Tito Orlandi

THE "CORPUS DEI MANOSCRITTI COPTI LETTERARI"

I

The enterprise now called Corpus dei Manoscritti Copti Letterari started in 1968 as a conventional archive, mainly of photographs of Coptic literary manuscripts. Its aim was the reconstruction of the Coptic codices from the library of the White Monastery in Upper Egypt, which had been dispersed among libraries throughout the world, and the study and publication of their contents. The program was later expanded to include the whole field of Coptic literature, and in the process unique archives of data and photographs were gathered.

The work concentrated on the following items:

1. Photographic archive;
2. Catalogue of manuscript collections;
3. History of the manuscripts;
4. Catalogue of Coptic literary texts;
5. Reconstruction of the White Monastery Library;
6. Bibliography of Coptic literature, then general Coptic bibliography;
7. Publication of texts with introduction and translation.

In 1980 a new project was started to transfer the data of the archives into computer memory, so that they could be processed automatically. We used a Data Base Management program ("OMNIDATA") running on the SPERRY 1100/80 mainframe of the Centro di Calcolo of the University of Rome, and several files of data were created:

1. Description of manuscripts;
2. Coptic Bibliography;
3. Inventory of the collections of manuscripts;
4. Catalogue of literary works in Coptic.

After the experience of the following years, with different programs both for Data Base management and (later) for the manipulation of texts, the organization of the work was changed.
Presently, the Corpus dei Manoscritti Copti Letterari is AN ENTERPRISE WHOSE AIM IS TO TREAT IN A COMPREHENSIVE BUT ALSO ARTICULATE AND FLEXIBLE WAY, AND - WHERE CONVENIENT - AUTOMATICALLY, THE REPRODUCTION OF THE COPTIC LITERARY MANUSCRIPTS AND ALL INFORMATION ON THEM AND ON SUBJECTS RELATED TO THEM (scribes, authors, texts, production, readers, collections, scholars), AND TO DISSEMINATE THE SCIENTIFIC RESULTS THEREOF BY THE TECHNICAL MEANS WHICH IN TURN ARE THOUGHT TO BE THE MOST APPROPRIATE.

The functions of the enterprise are organized around two main tasks: DESCRIPTION and REPRODUCTION of manuscripts, works, etc.

The description is obtained by connecting in various ways the information about Coptic manuscripts and literature, in order to form a consistent and, as far as possible, complete picture of the Coptic literary world. For this the technique of Data Base Management is employed.

The reproduction is obtained in "analogical" form through different photographic systems (mainly microfilm and microfiche), and in "encoded" ("digital") form suitable for various kinds of automated Text Processing.

The problems posed by this approach have by now been partially answered, allowing for the fact that technical and theoretical progress keeps matters in movement. We should especially mention the recent choice of UNIX (and above all of the UNIX "philosophy") as the preferred environment in which the computerized work is done. This has brought a series of invaluable improvements to every step of the organization.

The problems which CMCL tries to cope with are:

PROCEDURAL ARRANGEMENTS:
- Identification of the different archives, through the definition of uniform characteristics of the objects taken into consideration;
- Organization of the information to be put in the records forming the archives;
- Relations between the archives and cross references.
COMPUTER-RELATED PROPERTIES:
- Portability of the files, so that they may be processed on different machines and by different programs;
- Central updating of the files, with simultaneous correction of data wherever necessary;
- Visualization of the files in the form convenient for each step of the management activity (screen, paper or microfiche; Coptic or Latin characters; etc.).

II

Having all this in mind, the work in the Corpus dei Manoscritti Copti Letterari is now carried on using nearly all kinds of machines available, from portable computers to mainframes (with their peripherals), and five basic kinds of programs: editor, word processor, text formatter, text analyzer (mainly concordance producer) - in the different steps of the processing of information and texts.

On the other hand, all information (texts, bibliography, archival data, etc.) is stored only in the form of pure ASCII files, without any of the interspersed codes that certain packages (notably word processors and data managers) produce and require, but that are unintelligible to other programs. The existing packages are used insofar as they neither require nor insert such codes, except for certain particular (generally final) purposes, or at particular moments of the process, after which the texts are again freed of non-ASCII codes.

Therefore the products of the Corpus will be available to scholars not only in the more or less conventional forms of printed texts or microform, but also as files suitable for handling with most of the machines and software which they normally use.
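The requirement that files remain pure ASCII can be illustrated with a minimal sketch. It is not a CMCL program: the function names and the set of permitted control characters (tab, line feed, form feed, carriage return) are assumptions made for this example only.

```python
from typing import List

# Assumption for this sketch: these control characters count as layout,
# not as package-specific codes (9 = tab, 10 = LF, 12 = FF, 13 = CR).
ALLOWED_CONTROLS = {9, 10, 12, 13}


def suspicious_positions(data: bytes) -> List[int]:
    """Return the offsets of bytes that are not printable ASCII
    and not one of the allowed layout characters."""
    return [
        i
        for i, b in enumerate(data)
        if b > 126 or (b < 32 and b not in ALLOWED_CONTROLS)
    ]


def is_pure_ascii(data: bytes) -> bool:
    """True when the data contains no interspersed non-ASCII codes."""
    return not suspicious_positions(data)
```

A file that passes such a check can be handed to an editor, a formatter or a concordance program alike; a file that fails it still carries codes left behind by some package.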
FILES OF TEXTS

The files containing Coptic texts do not "reproduce" the physical shape of the text in any given manuscript, but are seen as a "kilometric" text (using the ASCII character Decimal 12 for practical reasons in the visualization) in which the ASCII characters are adapted to ENCODE ALL PHENOMENA found in the manuscript in question which are RELEVANT TO THE PRESERVATION OF THE TEXT in a magnetic memory.

The CODIFICATION ratio, i.e. the correspondence between Coptic and other necessary special characters on the one side, and the "numbers" (sequences of bits) stored in the memory of the computer on the other, is INDEPENDENT "per se" of the characters actually used on the keyboards and in the printing devices. This has been done because the keyboards and printing devices generally in use do not share exactly the same systems. It is true, however, that the systems in use are rather similar.

The CODIFICATION system has been designed to facilitate the INPUT of the texts by scholars through the keyboards normally in use today.

It is understood that the encoder of a manuscript, or of part of it, shall not change the Coptic text in any way, not even by separating words. He will only read what is surely readable, encoding the Coptic text as it appears in the manuscript in its present state of conservation, and encoding other relevant information according to the chosen rules.

The phenomena selected to be encoded are:

Coptic text: each Coptic letter corresponds to one ASCII character, which on "normal" keyboards and printing devices corresponds to one alphanumeric or special character.

Inside the text, the following other relevant information appears:
- End of line;
- End of column;
- Beginning of page, with its numbering if any;
- Punctuation (in several forms);
- Majuscule in the margin;
- Physical lacunae;
- Illegible letters;
- Separator (a special Coptic orthographic feature);
- Raised dot;
- Apostrophe;
- Blank;
- Marginal glosses.
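The principle of one ASCII character per Coptic letter, with further characters reserved for the other phenomena, can be illustrated as follows. The marker characters below are hypothetical: the actual CMCL code table is not given here, and only a few of the listed phenomena are covered.

```python
from typing import List, Tuple

# Hypothetical marker assignments, chosen only for this illustration.
MARKERS = {
    "/": "END_OF_LINE",
    "|": "END_OF_COLUMN",
    "#": "BEGINNING_OF_PAGE",
    ":": "PUNCTUATION",
    "*": "LACUNA",
    "?": "ILLEGIBLE",
    "_": "BLANK",
}


def tokenize(encoded: str) -> List[Tuple[str, str]]:
    """Classify every character of the continuous encoded text as either
    a letter or one of the recorded manuscript phenomena."""
    return [(MARKERS.get(ch, "LETTER"), ch) for ch in encoded]


def letters_only(encoded: str) -> str:
    """Drop all markers, leaving the bare sequence of letters."""
    return "".join(ch for kind, ch in tokenize(encoded) if kind == "LETTER")
```

Because layout is carried by markers inside one continuous stream, the same file can yield either a line-by-line diplomatic transcription or a running text, depending on which markers a program chooses to honour.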
After the Coptic text has been encoded, it is submitted to different procedures, according to the different steps and goals of study and publication. Some of the programs used in these procedures have been written specially within the CMCL (in BASIC); the above-mentioned packages are also used, when possible and convenient; some steps require an intervention by the scholar, of course through an "editor" program.

First, the text can be automatically printed in Coptic characters in the shape of a "diplomatic" transcription.

If another style of publication is (also) envisaged, the text is automatically divided into numbered paragraphs, according to the original punctuation of the manuscript (one or more punctuation signs may be selected). In this shape, it is passed through a concordance program. The result is used to check the transcription (unusual spellings, for example, stand out in a concordance, which is very helpful), and to normalize the orthography, if such are the criteria of the edition.

At this stage the editor makes a first attempt to fill the lacunae, and to improve the text of the manuscript where there are manifest mistakes. After that, he prepares the translation using a program which checks each word in a dictionary (a sort of self-augmenting file), presents the editor with a choice of translations, and registers in another, "vertical", file the Coptic words and the chosen Italian equivalents.

This will lead to other modifications of the text itself, of the division of the paragraphs, etc., which are made by the editor on a copy of the original encoded file. This copy represents the correct form of the text as the editor sees it, and is used to produce the "final" concordance, the formatted text for print, and the "final" translation, unless there are other manuscripts to collate. In that case, every manuscript is first treated individually, and then their texts are collated.
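The division into numbered paragraphs and the subsequent concordance can be sketched as below. This is an illustration, not the BASIC programs of the CMCL: the function names are invented, the default punctuation sign is arbitrary, and the sketch assumes that words are already separated by blanks, which is not true of the first encoding of a manuscript.

```python
import re
from collections import defaultdict
from typing import Dict, List


def split_paragraphs(text: str, signs: str = ":") -> List[str]:
    """Divide the text at each of the selected punctuation signs.
    The position of a paragraph in the result (counting from 1)
    serves as its paragraph number."""
    if not signs:
        return [text.strip()] if text.strip() else []
    pattern = "[" + re.escape(signs) + "]"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


def concordance(paragraphs: List[str]) -> Dict[str, List[int]]:
    """Map every word form to the numbers of the paragraphs in which
    it occurs, one entry per occurrence."""
    index: Dict[str, List[int]] = defaultdict(list)
    for number, paragraph in enumerate(paragraphs, start=1):
        for word in paragraph.split():
            index[word].append(number)
    return dict(index)
```

In a listing of this kind a form that occurs only once, next to a frequent near-identical form, is an obvious candidate for a transcription error or an unusual spelling.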
FILES OF DATA

The files of data are conceived as lists of material (catalogues, bibliographies, descriptions of manuscripts, etc.), stored in such a way that they can easily be transformed into whatever kind of "active" Data Base different scholars may choose. Therefore they are stored in the form of a common, simple text file, divided only into "lines" (= portions of text delimited by one so-called CR = "Carriage Return"). But the file is organized so as to be read easily by any normal "Data Base Program", whether available on the market or personally conceived.

At first, the data were stored according to the principles of hierarchical data bases; now they are being modified according to the relational theory.

The data are stored in Records divided into a number of Fields. The size of each Field is not fixed. There are codes (always in plain ASCII) indicating the separation of the different records, and other codes indicating the separation of the different fields within one record.

Consequently, all we need to define, in order to obtain a FILE useful for automatic manipulation, is:

1. Content of the file.
2. Number and order of the fields.
3. Markers indicating the separation of the records.
4. Markers indicating the separation of the fields.

As for the CONTENT, there are 5 types of FILES, namely for:

CODICES, ENTIRE OR RECONSTRUCTED.
COLLECTIONS OF MANUSCRIPTS.
CLAVIS COPTICA.
CODICOLOGICAL AND PALAEOGRAPHICAL DESCRIPTION.
BIBLIOGRAPHY.

CODICES: Each RECORD represents one codex, which is either preserved more or less complete, or can be reconstructed from a sufficient number of scattered leaves.

FIELD 1. Conventional call-number.
FIELD 2. Call number given by the owner, or (for the reconstructed codices) the key-word FRAGMENTS, which refers to Field 8.
FIELD 3. Dialect.
FIELD 4. Provenience.
FIELD 4. Editions.
FIELD 5. Available reproductions.
FIELD 6. Other bibliography.
FIELD 7. List of the content.
FIELD 8. List of the fragments.
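A file of this kind, with variable-length fields and plain-ASCII separators, can be read by a very small program. The sketch below is only an illustration: the two marker strings are hypothetical (the actual markers are not specified here), and the field names are supplied by the caller, since "number and order of the fields" is part of the definition of each file.

```python
from typing import Dict, List

RECORD_MARK = "%%"  # hypothetical marker closing each record
FIELD_MARK = "@"    # hypothetical marker between two fields


def parse_records(text: str, field_names: List[str]) -> List[Dict[str, str]]:
    """Split a plain-text data file into records and name the fields
    of each record according to the given order."""
    records = []
    for chunk in text.split(RECORD_MARK):
        if not chunk.strip():
            continue  # skip the empty remainder after the last marker
        values = [value.strip() for value in chunk.split(FIELD_MARK)]
        # Fields missing at the end of a record are treated as empty.
        values += [""] * (len(field_names) - len(values))
        records.append(dict(zip(field_names, values)))
    return records
```

For example, with invented placeholder values for the first three fields of a CODICES record:

```python
sample = "C1@Owner 5@S\n%%\nC2@FRAGMENTS@B\n%%\n"
rows = parse_records(sample, ["call_number", "owner_call_number", "dialect"])
# rows[1]["owner_call_number"] is the key-word "FRAGMENTS"
```

Since nothing in the file depends on a particular package, the same records can be loaded into a hierarchical or a relational data base without retyping.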
COLLECTIONS: Each RECORD represents one manuscript, kept in the Collection which gives its name to the single file.

FIELD 1. Call number given by the owner.
FIELD 2. Catalogue number.
FIELD 3. Conventional call-number of the reconstructed codex (if any).
FIELD 4. Dialect.
FIELD 5. Content.
FIELD 6. Provenience.
FIELD 7. Editions.
FIELD 8. Bibliography.
FIELD 9. Previous owners.
FIELD 10. Complementary fragments.

CLAVIS COPTICA: Each RECORD will represent one work of Coptic literature.

FIELD 1. Number of "access" in the list.
FIELD 2. Number of the Clavis Patrum Graecorum.
FIELD 3. Number of the Bibliotheca Hagiographica Graeca.
FIELD 4. Author or literary genre (in the case of obviously anonymous works: Passio; Acta Conc.; Canones; etc.)
FIELD 5. Title.
FIELD 6. Manuscripts.
FIELD 7. Bibliography.
FIELD 8. Abstract.

CODICOLOGICAL AND PALAEOGRAPHICAL DESCRIPTION: A great number of fields has been conceived, to contain detailed information on all the characteristics of the manuscripts.

BIBLIOGRAPHY: It consists of 4 interrelated files. Each listed publication has an identification number which is the same in all files. Each file concerns one aspect of the publications.

FILE 1: Description of the publication.
FIELD 1. Author.
FIELD 2. Title.
FIELD 3. Periodical.
FIELD 4. Miscellany.
FIELD 5. Editor.
FIELD 6. Collection.

FILE 2: Subject. Only one Field, but there may be many records for each publication.

FILE 3: Manuscripts published (id.).

FILE 4: Reviews (id.).

III

The CMCL has four series of publications:

- The COPTIC BIBLIOGRAPHY, which is published every year as a brochure with a set of microfiche;
- A series of printed books containing editions of texts, translations and studies;
- A series of preliminary editions of single manuscripts, each as a brochure with a set of microfiche reproducing the manuscript;
- A series of Catalogues of the collections of Coptic manuscripts.
In conclusion, the work done by the Corpus dei Manoscritti Copti Letterari should produce three advantages in the field of Coptic Studies:

1. To increase the amount of information available about Coptic literature, and to accelerate the publication of texts and of tools for scholars.
2. To facilitate the subdivision of the work to be done on each particular text (linguistic, philological, historical, theological analysis), because each specialist may exercise his particular competence on one part of a set of uniformly organized materials.
3. To identify the sectors of Coptology where the contribution of scholars is most urgent.