The Journals of the Lewis and Clark Expedition Online project team created a prototype site using approximately 200 pages of content selected by editor Gary E. Moulton from volume 4 of the Journals. The team sought to identify and resolve issues related to use of a text markup language of sufficient robustness to capture important textual features found in the source documents. Items that needed to be addressed included naming the author of the entry, creating/attaching hyperlinks to footnotes, and allowing navigation between journal entries.
The project team selected eXtensible Markup Language (XML), a powerful and flexible text markup language published as a World Wide Web Consortium Recommendation in 1998. XML is a platform-independent and highly adaptable format that is well suited to working with many kinds of data, including texts. The base tag set for the project markup was created by the Text Encoding Initiative (TEI), an international effort of humanities scholars and practitioners begun in 1987 and now maintained by the TEI Consortium.
The team created a Document Type Definition (DTD) file using Pizza Chef, a DTD-authoring tool developed by TEI. As the project progressed, the file was extended to permit encoding of newly encountered textual features. All TEI files were checked for encoding errors using the DTD file, and corrections were made as needed to bring each file into full conformance to the parameters of the project.
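The checking step can be pictured with a short Python sketch. It is only a simplified stand-in for DTD validation: it confirms that a file is well-formed XML and that every element name appears in an allowed set. The set `ALLOWED_ELEMENTS`, the function `check_file`, and the sample markup are all assumptions made for illustration, not the project's actual DTD or tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of element names a project DTD might allow.
ALLOWED_ELEMENTS = {"text", "body", "div1", "head", "p", "name", "note"}


def check_file(xml_text):
    """Return a list of problems found in one encoded file (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The file is not even well-formed XML, so stop here.
        return [f"not well-formed: {err}"]
    problems = []
    for element in root.iter():
        if element.tag not in ALLOWED_ELEMENTS:
            problems.append(f"unexpected element <{element.tag}>")
    return problems


# Invented sample files: the second uses an element outside the allowed set.
good = "<div1><head>Sample date</head><p>Sample text</p></div1>"
bad = "<div1><head>Sample date</head><para>Sample text</para></div1>"

print(check_file(good))
print(check_file(bad))
```

A real DTD also constrains nesting, ordering, and attributes, which this sketch does not attempt to check.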
Team members authored the TEI files in NoteTab, a program that permits customizable automation of text markup, including XML and TEI. The team made extensive use of NoteTab's built-in clip programming language to automate routine tasks, such as making corrections to hundreds of files.
To display TEI files in a web browser, the project team selected the eXtensible Stylesheet Language for Transformations (XSLT). XSLT, published as a World Wide Web Consortium Recommendation in 1999, was designed specifically to transform XML files into other formats for display and for other purposes. XSLT permits the mapping of XML features into HTML. The project uses XSLT to generate all aspects of web display, including page design, the appearance of fonts and images, and navigation.
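The idea of mapping XML features into HTML can be sketched in a few lines of Python. This is a stand-in for the XSLT approach, not the project's stylesheet: the mapping table `TAG_MAP`, the function `to_html`, and the sample entry are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from TEI element names to HTML element names.
TAG_MAP = {"div1": "div", "head": "h2", "p": "p", "hi": "em"}


def to_html(tei_element):
    """Recursively copy a TEI element into an equivalent HTML element."""
    # Elements without a mapping fall back to a neutral <span>.
    html = ET.Element(TAG_MAP.get(tei_element.tag, "span"))
    html.text = tei_element.text
    html.tail = tei_element.tail
    for child in tei_element:
        html.append(to_html(child))
    return html


# Invented sample entry, not text from the Journals.
entry = ET.fromstring(
    "<div1><head>Sample date</head><p>Sample <hi>emphasized</hi> text.</p></div1>"
)
html_text = ET.tostring(to_html(entry), encoding="unicode")
print(html_text)
```

An XSLT stylesheet expresses the same mapping declaratively, with one template rule per source element, which keeps the encoded text separate from decisions about its display.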
After the project received a grant from the National Endowment for the Humanities, the site was expanded to include the entire text of the Journals. Approximately five thousand pages of the Journals were encoded, all in conformance with XML and TEI standards.
An example of the power of using XML/XSLT for the project can be seen in the conflation of the Journal texts. In order to permit the chronological conflation of all the entries in the Journals, the team developed an apparatus within the XML markup that indicates the date of the current file, the date of the file immediately preceding the current file, and the date of the file immediately following the current file. The XSLT stylesheet then transforms this information into the "previous" and "next" links found on the top and bottom of every page. The user is thus able to seamlessly follow the progress of the expedition by clicking a hyperlink.
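A minimal Python sketch of this apparatus follows. It assumes, purely for illustration, that each file records its neighbours in `prev` and `next` attributes on the root element and that pages are named after their dates; the actual attribute names, file naming scheme, and link text used by the project are not stated here.

```python
import xml.etree.ElementTree as ET


def navigation_links(entry_xml):
    """Build "previous" and "next" hyperlinks from dates stored in the markup."""
    root = ET.fromstring(entry_xml)
    links = []
    for attribute, label in (("prev", "previous"), ("next", "next")):
        date = root.get(attribute)
        if date:  # the first and last entries have only one neighbour
            links.append(f'<a href="{date}.html">{label}</a>')
    return " | ".join(links)


# Hypothetical markup for a middle entry and for the first entry in a sequence.
middle = '<div1 id="1805-05-14" prev="1805-05-13" next="1805-05-15"/>'
first = '<div1 id="1805-05-13" next="1805-05-14"/>'

print(navigation_links(middle))
print(navigation_links(first))
```

Because the neighbouring dates live in the markup itself, the same links can be regenerated whenever the display changes, without touching the encoded text.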
The robustness of the project team's technical choices can also be seen in the handling of unusual spellings. Lewis, and especially Clark and the enlisted men, regularly used various spellings for the same word. A good example is the name of Corps member George Drouillard, which was spelled Drewyer, Drewyear, Drewer, Drurer, and Druier, among others. Using NoteTab's clip capabilities, the project team added TEI encoding to the various spellings in order to regularize the name to Drouillard, George. This encoding does not replace the original spelling, but rather enriches it. The user can therefore find all references to George Drouillard using the online index or search capabilities, regardless of how the journalists spelled his name.
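The effect of this kind of encoding can be sketched in Python. The sketch assumes that each variant spelling is wrapped in a `name` element carrying the regularized form in a `reg` attribute; that attribute choice, the function `build_name_index`, and the sample sentence are assumptions for illustration and may differ from the project's actual markup.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_name_index(entry_xml):
    """Map each regularized name to the original spellings found in the text."""
    index = defaultdict(list)
    for name in ET.fromstring(entry_xml).iter("name"):
        regular = name.get("reg")
        if regular:
            # The original spelling stays in the text; the attribute adds to it.
            index[regular].append(name.text)
    return dict(index)


# Invented sentence using two of the spellings listed above.
sample = (
    '<p><name type="person" reg="Drouillard, George">Drewyer</name> went out; '
    '<name type="person" reg="Drouillard, George">Drewer</name> returned.</p>'
)

name_index = build_name_index(sample)
print(name_index)
```

An index or search built on the regularized value returns every passage, while the page still displays each spelling exactly as the journalist wrote it.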
To make the Journals as accessible as possible, the project team decided to implement a search feature, using the Tamino XML database to do so. By querying the TEI-encoded Journal files with XQuery (a query language drafted by the World Wide Web Consortium), the site now provides flexible text searching that takes full advantage of up-to-date XML technology.