3.1 Intermediate Format

Annotating individual language versions of the 66 books of the Bible (or, in some cases, only the New Testament) requires only a simple three-level hierarchy of text elements: book, chapter, and verse. In our initial pass through the annotation process (see below), we label these elements b (book), c (chapter), and v (verse), producing an intermediate representation that captures the major structural levels without conforming to any particular DTD. The following examples show a single verse, Matthew 1:7, in 9 languages:

ENGLISH: <v id="MAT:1:7">And Solomon begat Roboam; and Roboam begat Abia; and Abia begat Asa;</v>
FRENCH: <v id="MAT:1:7">Salomon engendra Roboam; Roboam engendra Abia; Abia engendra Asa;</v>
DANISH: <v id="MAT:1:7">og Salomon avlede Roboam; og Roboam avlede Abia; og Abia avlede Asa;</v>

FINNISH: <v id="MAT:1:7">Salomolle syntyi Rehabeam, Rehabeamille syntyi Abia, Abialle syntyi Aasa; </v>
GREEK: <v id="MAT:1:7">solomwn de egennhsen ton roboam roboam de egennhsen ton abia abia de egennhsen ton asa</v>
LATIN: <v id="MAT:1:7">Salomon autem genuit Roboam Roboam autem genuit Abiam Abia autem genuit Asa </v>
SWEDISH: <v id="MAT:1:7">Salomo födde Roboam, Roboam födde Abia. Abia födde Asaf;</v>
SPANISH: <v id="MAT:1:7">Salomón Engendró a Roboam; Roboam Engendró a Abías; Abías Engendró a Asa;</v>
VIETNAMESE: <v id="MAT:1:7">Salomôn sinh Roboam, Roboam sinh Abya, Abya sinh Asa, </v>

In all these cases, the intermediate encoding of the book and chapter elements is identical:

  <b id="MAT">
  <c id="MAT:1">
The labels (id attributes) include the book and chapter, so each verse can be identified independently of its context, e.g. "GEN:1:1" for Genesis, chapter 1, verse 1. This allows users to rely on simple tools such as Unix 'grep' for day-to-day manipulation (for example, looking up a particular verse) while still being able to use more powerful SGML-based tools.
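
As a rough illustration of the kind of lightweight lookup these labels permit, the following Python sketch finds a verse by its id and splits the id into its parts. It assumes, as in the examples above, that each v element sits on a single line; the names VERSE_RE, parse_id, and find_verse are hypothetical and not part of any tool described here. A command-line equivalent would be a grep for the string id="MAT:1:7" in the annotated file.

```python
import re

# Assumption: one <v id="BOOK:CHAPTER:VERSE">text</v> element per line,
# as in the examples above. The pattern is illustrative only.
VERSE_RE = re.compile(r'<v id="([^"]+)">(.*?)</v>')


def parse_id(verse_id):
    """Split a label such as 'MAT:1:7' into (book, chapter, verse)."""
    book, chapter, verse = verse_id.split(":")
    return book, int(chapter), int(verse)


def find_verse(lines, verse_id):
    """Return the text of the verse labeled verse_id, or None if absent."""
    for line in lines:
        match = VERSE_RE.search(line)
        if match and match.group(1) == verse_id:
            return match.group(2).strip()
    return None


# Sample input built from the English example in this section.
sample = [
    '<b id="MAT">',
    '<c id="MAT:1">',
    '<v id="MAT:1:7">And Solomon begat Roboam; and Roboam begat Abia; '
    'and Abia begat Asa;</v>',
]

if __name__ == "__main__":
    print(parse_id("MAT:1:7"))
    print(find_verse(sample, "MAT:1:7"))
```

Because the full book and chapter appear in every label, the lookup needs no knowledge of the enclosing b and c elements, which is what makes line-oriented tools sufficient for this task.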

Philip Resnik
Tue Oct 21 19:23:13 EDT 1997