Received: by UICVM (Mailer R2.03B) id 6993; Mon, 02 Oct 89 04:45:56 CDT
Date: Mon, 2 Oct 89 09:51:00 LCL
Reply-To: Text Encoding Initiative - Text Documentation Committee list , "on GEC 4190 Rim-B at UCL Wujastyk"
Sender: Text Encoding Initiative - Text Documentation Committee list
From: "on GEC 4190 Rim-B at UCL Wujastyk"
Subject: Sample document header
To: Michael Sperberg-McQueen

My colleague Peter Schreiner recently composed a header to go on his electronic text of the Brahmapurana, a long Sanskrit text on myth, which he is depositing in the Oxford Archive. Here is the header, for general perusal:

[The text transliterated is taken from the following two (printed) editions of the Brahmapur-a.na:

Mah-amuni--'sr-imad--vy-asa--pra.n-ita.m Brahmapur-a.nam, ed. Hari N-ar-aya.na -Ap.te, (abbreviated ASS), -Anand-a'srama--sa.msk.rta--granth-avali.h No. 28, 1895.

Brahmapur-a.nam, ed. Ra;ngan-atha S-uri; 2 parts. Bombay, Ve;nka.te'svara Press 1906 (abbreviated VePr).

1.1. Transliteration

Diacritical marks: Punctuation marks are used to code diacritics. All diacritics are typed in front of the letter to which they belong (which imitates the traditional "layout" of typewriters where accents etc. are placed on dead keys and need to be typed before the character is typed).

    .  = subscript dot               (e..g. k.rta.h)
    ;  = superscript dot             (e..g. a;nga)
    ?  = tilde (superscript)         (e..g. praj?n-a)
    -  = superscript hyphen (macron) (e..g. -atm-a)
    '  = aigu (superscript)          (e..g. 's-astra)

(Where, as in this introductory document or in comments to variant readings within the text, the use of punctuation marks in their proper function has to alternate with their function as diacritical marks, their use as punctuation marks is distinguished by a following blank (or other signs of punctuation including parentheses) or by doubling where a blank is not possible (e..g. in abbreviations). Thus, the dots in ".r.s.i" are diacritics, but the dot after ".r.si. " is a full stop.
Similarly, since no blank can be inserted after a hyphen, the actual hyphen is written by doubling it ("--"). The exclamation mark is used for the single quotation mark (!quote!) and the apostrophe ("author!s"). In text format it is also used as avagraha (e..g. so !ham, which would be "so¤ ¤aham" in input format).)

The quarter of a verse (p-ada) is marked by a | (vertical bar). This bar (da.n.da) is not followed by a blank before verse quarters 2 and 4 (b and d) in anu.s.tubh metre ('sloka). After quarters 2 and 4 a new line begins. Longer metres (longer than the 'sloka, that is) are typed in such a way that each p-ada gets a different line.

Lacunae are indicated by using the letter x one time per missing ak.sara; the x--es are put in pointed parentheses (e..g. <xxx> for a lacuna of three syllables).

The full reference (part, chapter and verse) is given at the end of the verse to which it refers. (While transliterating, the full reference needs to be typed only for the first verse of each chapter.) For ed. VePr, which is divided into two parts, the first segment of the reference includes the number for the book or part according to the formula:

    book times 1000 plus chapter [full stop] verse

The beginning of a reference is marked by a double bar and the end is marked by a single bar. A new line always begins after a reference.

1.2. Sandhi

The "principle of transliteration" has been that the input format should reproduce the letters of the printed text as closely as possible, i..e. that one types what one sees. However, to what is printed (in Devan-agar-i) markers are added (in the transliteration) to mark sandhi changes. A sandhi change is defined with regard to the "pausa form" of a word, i..e. the form a word would take at the end of a line or out of context (vigraha). Note that this pausa form need not be identical with the stem which would be entered in a dictionary.
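As an aside on the reference formula given under 1.1 (book times 1000 plus chapter, full stop, verse), here is a minimal Python sketch of how such a reference could be built and taken apart again. The function names are hypothetical and are not part of the TUSTEP programs. The sketch assumes that chapter numbers stay below 1000 and that a reference without a book number (as in ed. ASS) can be treated as book 0; neither assumption is stated in the header itself.

```python
from typing import Tuple


def format_reference(book: int, chapter: int, verse: int) -> str:
    """Build a reference string: (book * 1000 + chapter) '.' verse.

    Assumption: chapter is in 1..999, otherwise the book number
    could not be recovered from the first segment.
    """
    if not 0 < chapter < 1000:
        raise ValueError("chapter must be between 1 and 999")
    return f"{book * 1000 + chapter}.{verse}"


def parse_reference(ref: str) -> Tuple[int, int, int]:
    """Split a reference string back into (book, chapter, verse)."""
    first_segment, verse = ref.split(".")
    # Integer division by 1000 recovers the book, the remainder the chapter.
    book, chapter = divmod(int(first_segment), 1000)
    return book, chapter, int(verse)
```

For instance, under these assumptions book 2, chapter 15, verse 3 would be written as 2015.3, and a reference such as 28.4 would parse as chapter 28, verse 4 with no book number.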
Thus, consonants which have undergone a sandhi change in the text are marked by ¤ (this sign is inserted during the processing of the input; * is what had been typed). Similarly, final vowels which have changed due to sandhi are marked by ¤ (e..g. -as-id¤ r-aj-a nalo¤ n-ama).

In case of vowel sandhi the above--mentioned principle of transliteration suffers an exception: Vowel sandhi is dissolved and marked (e..g. na¤asti, ca¤eva). Similarly, avagraha is reconstituted, the originally omitted initial "a" being marked as sandhi vowel (e..g. devo¤ ¤api).

In some special cases the marking of sandhi has to be extended to include some disambiguating information:

-- to half--vowels which substitute for a long vowel the diacritic for "long vowel" (-) is added (e..g. devy-¤ api);
-- if final --a in sandhi does not stand for --a.h (with visarga), then the original vowel which has been substituted by the --a is added (e..g. lokae¤ eva, where "loka eva" is printed, which is the sandhi form for "loke eva").

In case of "double sandhi" the sandhi is marked by double ¤¤ (or double ¤¤, e..g. sa¤¤eva in case of "saiva" instead of "sa eva").

A blank is inserted between words wherever this is possible in transliteration (but not necessarily in Devan-agar-i), e..g. "hy¤ api, nalo¤ ¤api".

1.3. Separation of compounds

Separation of compounds is marked by inserting + between the members of a compound (e..g. brahma+pur-a.na). In case of sandhi, the + functions also as sandhi--marker, i..e. no additional sandhi--marker is added (e..g. tapo+vane, mah-a+-atmana.h). Separation of compounds is restricted to nominal compounds (including upap-ada--compounds like ura+ga, go+p-i) and does not include grammatical analysis.

For details, special cases etc. see the introduction to Sanskrit Indices and Text of the Brahmapur-a.na, Wiesbaden 1987, pp. xvi--xvii, by P. Schreiner and R. Soehnen.

1.4. Variant readings

The beginning of the passage for which a variant exists is marked by an opening parenthesis.
In deciding about the extent of the text thus marked, the changes generated for the text format had to be taken into consideration. This meant that occasionally words which are identical in the base text and in the variant are included in the parentheses.

The beginning of the variant is marked by a siglum, i..e. by a single capital letter. Sigla are separated by a comma (no blank). There is no blank between the siglum and the variant. If there are several variants for the same passage of the base text, they are listed sequentially. The variant (or the last variant if there is more than one) is closed by the closing parenthesis. The blank before the next word is considered to belong to the variant and is put inside the parentheses. The continuation of the base text follows without intermediate blank.

Schematic pattern:

    (... A... )... (... A,B... )... (... A... B... )...

1.5. Interpolations

Interpolations are treated as "variants without base text", i..e. the siglum follows immediately upon the opening parenthesis. The siglum is repeated before the closing parenthesis which marks the end of the interpolation. This allows for the input of variants within interpolations which are attested in more than one source. Long interpolations may be entered as a sequence of separate interpolations (e..g. verse by verse).

1.6. Omissions

Passages from the base text which are omitted in any of the variant texts are marked by double parentheses plus siglum enclosing the omitted passage (which may also be individual words).

Schematic patterns:

    ((S... S)) ... ((S... S))...

1.7. Editorial additions

Annotations, remarks etc. by the editor of the transliteration are enclosed in square brackets (like this introduction which precedes the actual transliteration of the text). Annotations by the editor(s) of the edition which served as source of the transliteration (e..g. conjectures, markers for lacunae etc.) which are part of the printed edition are enclosed in pointed parentheses.

1.8. Colophons

Colophons which are part of the printed edition are enclosed by triple square brackets.

2. Processing

The input and processing of the transliterated text have been done with TUSTEP, the Tuebingen System of Text--Processing Programs. The TUSTEP format includes a reference number in front of every record; this machine reference has been calculated in such a way that it agrees with the textual reference. In the ASCII--format of the input file this machine reference is lost.

Some of the tools for textual analysis which were produced from the input format have been published:

Peter Schreiner, Renate Soehnen: Sanskrit Indices and Text of the Brahmapur-a.na. Wiesbaden: Otto Harrassowitz, 1987.

The following list gives a survey of programs (German names in parentheses) developed for the processing of our input:

Any of the transliterated versions can be extracted. We used ed. ASS as our basic text ("Grundtext"); variant versions are ed. VePr and any of the manuscripts used for the ASS edition. (GRUNDTEXTKOP, VARTEXTKOP)

The machine references in TUSTEP are calculated from the references in the text (REFRECHNEN).

The text format (i..e. the conventionally transliterated text without markers; with compounds and sandhis reconstituted) can be generated (TEXTFORM). This version can be processed for output even in Devan-agar-i with programs which work on the basis of transliterated input (e..g. TeX).

The pausa format of the text is generated by changing all the characters marked by ¤ or + according to the sandhi rules of Sanskrit grammar. Each word appears in the phonetic form which it would assume at the end of a line (e..g. -adibhir¤, -adibhi.s¤, -adibhi's¤, -adibhis¤ all become -adibhi.h). Members of compounds are separated.
(PAUSAFORM)

Indexes:

-- KWIC--index (from modified input format)
-- P-ada--index (from modified text format)
-- wordforms (from pausa format)
-- reverse index of wordforms (from pausa format)

All indexes are sorted according to the Devan-agar-i alphabet and may include frequencies (absolute and relative) and formatting commands for the output.

Those interested in any version or output other than the transliterated input format with variants should contact any of the authors:

Renate Soehnen, Department of Indology, School of Oriental and African Studies, Thornhaugh Street, Russell Square, London WC1H 0XG, U..K.

Peter Schreiner, Indologisches Seminar, Universitaet Zuerich, Florhofgasse 11, CH--8001 Zuerich, Switzerland ]
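For readers who want to see the diacritic coding of section 1.1 in action, the following is a minimal Python sketch that turns the punctuation codes into Unicode letters with diacritics. It is an illustration under simplifying assumptions, not a description of how TUSTEP handles the text: the name `decode` and the mapping table are hypothetical, a code is treated as a diacritic only when it stands directly before a letter, a doubled code is read as the literal punctuation mark, and the exclamation mark, the sandhi markers, the compound marker + and stacked diacritics are not handled.

```python
import unicodedata

# Punctuation codes from section 1.1 mapped to Unicode combining marks.
COMBINING = {
    ".": "\u0323",  # subscript dot
    ";": "\u0307",  # superscript dot
    "?": "\u0303",  # tilde
    "-": "\u0304",  # macron
    "'": "\u0301",  # acute
}


def decode(text: str) -> str:
    """Convert punctuation-coded diacritics to precomposed Unicode letters."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in COMBINING and nxt == ch:
            # Doubled code: a literal punctuation mark (e.g. "--" is a hyphen).
            out.append(ch)
            i += 2
        elif ch in COMBINING and nxt.isalpha():
            # Code typed in front of a letter: attach it as a diacritic.
            out.append(unicodedata.normalize("NFC", nxt + COMBINING[ch]))
            i += 2
        else:
            # Followed by a blank, other punctuation, or end of text.
            out.append(ch)
            i += 1
    return "".join(out)
```

Under these assumptions, the examples from the table in 1.1 (k.rta.h, a;nga, praj?n-a, -atm-a, 's-astra) decode to the usual romanized forms, and an abbreviation with a doubled dot is restored to a single dot.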