Tito Orlandi


                  THE "CORPUS DEI MANOSCRITTI COPTI LETTERARI"



                                        I

        The enterprise,  which is now called Corpus dei Manoscritti Copti
        Letterari,  started in 1968 as a conventional archive,  mainly of
        photos   of  Coptic  literary  manuscripts.   Its  aim  was   the
        reconstruction  of the Coptic codices coming from the library  of
        the  White  Monastery in Upper Egypt,  which  were  dispersed  in
        libraries throughout the world,  and the study and publication of
        their  contents.

        The program was later expanded to cover the whole field of
        Coptic literature, and in the process unique archives of data
        and photographs were assembled. The work concentrated on the
        following items: 1. Photographic archive; 2. Catalogue of
        manuscript collections; 3. History of the manuscripts;
        4. Catalogue of Coptic literary texts; 5. Reconstruction of the
        White Monastery Library; 6. Bibliography of Coptic literature,
        later extended to a general Coptic bibliography; 7. Publication
        of texts with introduction and translation.

        In 1980 a new project was started to transfer the data of the
        archives into computer memory, so that they could be manipulated
        by automatic processes. We used a Data Base Management program
        ("OMNIDATA") running on the SPERRY 1100/80 mainframe of the
        Centro di Calcolo of the University of Rome, and several files
        of data were created: 1. Description of manuscripts; 2. Coptic
        Bibliography; 3. Inventory of the collections of manuscripts;
        4. Catalogue of literary works in Coptic.

        In the light of the experience of the following years, with
        different programs first for Data Base management and later for
        the manipulation of texts, the organization of the work was
        changed. At present, the Corpus dei Manoscritti Copti Letterari is
        AN ENTERPRISE WHOSE AIM IS TO TREAT IN A COMPREHENSIVE BUT ALSO
        ARTICULATED AND FLEXIBLE WAY, AND, WHERE CONVENIENT, AUTOMATICALLY,
        THE REPRODUCTION OF THE COPTIC LITERARY MANUSCRIPTS AND ALL
        INFORMATION ON THEM AND ON SUBJECTS RELATED TO THEM (scribes,
        authors, texts, production, readers, collections, scholars), AND
        TO DISSEMINATE THE SCIENTIFIC RESULTS THEREOF BY WHATEVER TECHNICAL
        MEANS ARE THOUGHT MOST APPROPRIATE AT THE TIME.

        The functions of the enterprise are organized around two main
        tasks: DESCRIPTION and REPRODUCTION of manuscripts, works, etc.
        Description is achieved by linking the information about Coptic
        manuscripts and literature in various ways, so as to form a
        consistent and, as far as possible, complete picture of the Coptic
        literary world. For this, Data Base Management techniques are
        employed. Reproduction is achieved in "analogical" form through
        various photographic systems (mainly microfilm and microfiche),
        and in "encoded" ("digital") form suitable for various kinds of
        automated Text Processing.

        The problems posed by this approach have by now been partly
        solved, bearing in mind that technical and theoretical progress
        keeps these matters in constant movement. Particularly important
        is the recent choice of UNIX (and above all of the UNIX
        "philosophy") as the preferred environment for the computerized
        work. This has brought a series of invaluable improvements to
        every step of the organization.

        The problems which CMCL tries to cope with are:

        PROCEDURAL ARRANGEMENTS:

        - Identification   of  the  different   archives,   through   the
        definition  of uniform characteristics of the objects taken  into
        consideration;
        - Organization  of  the  information  to be put  in  the  records
        forming the archives;
        - Relations between the archives and cross references.

        COMPUTER-RELATED PROPERTIES:

        - Portability of the files, which may be processed on different
        machines and by different programs;
        - Central updating of the files, with simultaneous correction of
        the data wherever necessary;
        - Display of the files in forms suited to the different steps of
        the management activity (screen, paper or microfiche; Coptic or
        Latin characters; etc.).



                                       II

        With all this in mind, the work of the Corpus dei Manoscritti
        Copti Letterari is now carried out on nearly every kind of
        machine available, from portable computers to mainframes (with
        their peripherals), and with five basic kinds of programs:
        editor, word processor, text formatter, text analyzer (mainly a
        concordance producer), used in the different steps of
        information and text processing.

        On the other hand, all information (texts, bibliography, archival
        data, etc.) is stored only as pure ASCII files, free of the
        interspersed codes that certain packages (notably word processors
        and data managers) may produce and require, but which other
        programs cannot interpret.

        Existing packages are used only insofar as they neither require nor
        insert such codes, except for certain particular (generally
        final) purposes or at particular moments of the process, after
        which the texts are again cleared of all non-ASCII codes.
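
        A minimal sketch of the "pure ASCII" rule, in Python, might check a
        file and strip any byte outside a permitted set. The permitted set
        used here (printable ASCII plus carriage return, line feed and the
        form feed, decimal 12) is an assumption for illustration; the
        CMCL's own programs, written in BASIC, are not reproduced.

```python
# Assumed set of permitted byte values (not taken from the CMCL rules):
# printable ASCII 32-126, plus LF (10), form feed (12) and CR (13).
ALLOWED = set(range(32, 127)) | {10, 12, 13}


def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte of the file belongs to the permitted set."""
    return all(b in ALLOWED for b in data)


def strip_codes(data: bytes) -> bytes:
    """Drop every byte outside the permitted set, such as the control or
    8-bit codes a word processor may intersperse in a text."""
    return bytes(b for b in data if b in ALLOWED)


raw = b"abc\x02def\xe9\x0cghi\r\n"
clean = strip_codes(raw)
print(clean)                                     # b'abcdef\x0cghi\r\n'
print(is_pure_ascii(raw), is_pure_ascii(clean))  # False True
```

        Stripping is the final clean-up pass; checking first lets the
        scholar see whether a file needs it at all.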

        Therefore the products of the Corpus will be available to
        scholars not only in the more or less conventional forms of
        printed text or microform, but also as files that can be handled
        by most of the machines and software they normally use.


        FILES OF TEXTS

        The files containing Coptic texts do not "reproduce" the physical
        layout of the text in a given manuscript; they are conceived as a
        continuous ("kilometric") text (the ASCII character with decimal
        code 12 is used for practical reasons of display), in which the
        ASCII characters are adapted to ENCODE ALL PHENOMENA found in the
        manuscript in question that are RELEVANT TO THE PRESERVATION OF
        THE TEXT in magnetic memory.

        The CODIFICATION scheme, i.e. the correspondence between Coptic
        letters and other necessary special characters on the one hand,
        and the "numbers" (sequences of bits) stored in the memory of the
        computer on the other, is in itself INDEPENDENT of the codes
        actually used by keyboards and printing devices. This choice was
        made because the keyboards and printing devices in general use do
        not all share exactly the same system, although the systems in use
        are rather similar.
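
        The independence of the internal code from the devices can be
        sketched as two layers of lookup tables: one from Coptic letters to
        the stored ASCII characters, and one per device from stored
        characters to the codes that device expects. All letter assignments
        and device names below are hypothetical, not the CMCL's actual
        table.

```python
# Hypothetical internal code: each Coptic letter is stored as one
# ASCII character. These assignments are illustrative only.
INTERNAL = {
    "ⲁ": "a", "ⲃ": "b", "ⲅ": "g", "ⲇ": "d", "ⲉ": "e",
    "ϣ": "s", "ϥ": "f",
}

# Hypothetical device tables: stored character -> code a given device
# expects. The two printers agree on most letters but not on all.
DEVICES = {
    "printer_a": {"s": "S", "f": "F"},
    "printer_b": {"s": "$", "f": "#"},
}


def encode(coptic: str) -> str:
    """Convert Coptic letters into the internal ASCII code for storage."""
    return "".join(INTERNAL[ch] for ch in coptic)


def to_device(stored: str, device: str) -> str:
    """Translate stored text for one output device; characters the device
    table does not mention are passed through unchanged."""
    table = DEVICES[device]
    return "".join(table.get(ch, ch) for ch in stored)


stored = encode("ϣⲁⲃ")
print(stored)                          # sab
print(to_device(stored, "printer_a"))  # Sab
print(to_device(stored, "printer_b"))  # $ab
```

        With this arrangement, a new keyboard or printer needs only a new
        device table; the stored files themselves never change.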

        The CODIFICATION system has been designed to make it easy for
        scholars to INPUT texts through the keyboards normally in use
        today. It is understood that the encoder of a manuscript, or of
        part of one, shall not change the Coptic text in any way, not even
        by separating words. He will read only what is certainly legible,
        encoding the Coptic text as it appears in the manuscript in its
        present state of preservation, and encoding other relevant
        information according to the chosen rules.

        The phenomena selected to be encoded are:

        Coptic text: each Coptic letter corresponds to one ASCII character,
        which on "normal" keyboards and printing devices corresponds to one
        alphanumeric or special character. Within the text, the following
        other relevant information appears: End of line; End of column;
        Beginning of page, with its number if any; Punctuation (in several
        forms); Majuscule in the margin; Physical lacuna; Illegible
        letters; Separator (a special Coptic orthographic feature); Raised
        dot; Apostrophe; Blank; Marginal glosses.
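
        To illustrate how a continuous encoded text interleaves Coptic
        letters with the other phenomena just listed, the Python sketch
        below splits such a text into tokens and rebuilds the manuscript
        lines, as a diplomatic transcription needs. The marker characters
        (/ end of line, | end of column, ? illegible letter, : punctuation,
        [n] lacuna of n letters) are invented for the example; the source
        does not give the CMCL's actual assignments.

```python
import re

# Hypothetical marker characters; the CMCL's actual assignments are not
# given in the text. Any other character is taken as a Coptic letter
# already in its internal ASCII code.
MARKERS = {
    "/": "END_OF_LINE",
    "|": "END_OF_COLUMN",
    "?": "ILLEGIBLE_LETTER",
    ":": "PUNCTUATION",
}

# A physical lacuna is written "[n]", n = estimated number of lost letters.
TOKEN = re.compile(r"\[(\d+)\]|(.)", re.DOTALL)


def tokenize(encoded: str) -> list[tuple[str, object]]:
    """Split an encoded text into (kind, value) pairs."""
    tokens = []
    for m in TOKEN.finditer(encoded):
        if m.group(1) is not None:
            tokens.append(("LACUNA", int(m.group(1))))
        else:
            ch = m.group(2)
            tokens.append((MARKERS.get(ch, "LETTER"), ch))
    return tokens


def diplomatic_lines(encoded: str) -> list[str]:
    """Rebuild the manuscript lines for a diplomatic transcription,
    showing each lacuna as one dot per lost letter inside brackets."""
    lines, current = [], []
    for kind, value in tokenize(encoded):
        if kind in ("END_OF_LINE", "END_OF_COLUMN"):
            lines.append("".join(current))
            current = []
        elif kind == "LACUNA":
            current.append("[" + "." * value + "]")
        else:
            current.append(value)
    if current:
        lines.append("".join(current))
    return lines


print(diplomatic_lines("abg/de[3]f?:|"))  # ['abg', 'de[...]f?:']
```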

        After the Coptic text has been encoded, it is submitted to
        different procedures, according to the different steps and goals
        of study and publication. Some of the programs used in these
        procedures have been written specifically within the CMCL (in
        BASIC); the packages mentioned above are also used, when possible
        and convenient; some steps require intervention by the scholar,
        naturally through an "editor" program.

        First, the text can be printed automatically in Coptic characters
        as a "diplomatic" transcription. If another style of publication is
        (also) envisaged, the text is automatically divided into numbered
        paragraphs according to the original punctuation of the manuscript
        (one or more punctuation signs may be selected). In this form it is
        passed through a concordance program. The result is used to check
        the transcription (unusual spellings, for example, stand out in a
        concordance, which is very helpful) and to normalize the
        orthography, if the criteria of the edition call for it.
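
        The division into numbered paragraphs and the concordance step can
        be sketched as follows. This is an illustration only: it assumes a
        working copy in which words are separated by blanks, and the
        sample words are placeholders rather than real Coptic.

```python
from collections import defaultdict


def split_paragraphs(text: str, signs: str) -> list[str]:
    """Divide a continuous text into paragraphs, closing one after each
    occurrence of a selected punctuation sign (the sign is kept)."""
    paragraphs, current = [], []
    for ch in text:
        current.append(ch)
        if ch in signs:
            paragraphs.append("".join(current).strip())
            current = []
    rest = "".join(current).strip()
    if rest:
        paragraphs.append(rest)
    return paragraphs


def concordance(paragraphs: list[str], signs: str) -> dict[str, list[int]]:
    """Map each word form to the numbers (from 1) of the paragraphs where
    it occurs. Sorting the forms alphabetically tends to place variant
    spellings near the usual ones."""
    index = defaultdict(list)
    for number, paragraph in enumerate(paragraphs, start=1):
        for word in paragraph.split():
            word = word.strip(signs)
            if word and number not in index[word]:
                index[word].append(number)
    return dict(sorted(index.items()))


paras = split_paragraphs("abc def. ghi abc: jkl", ".:")
print(paras)                     # ['abc def.', 'ghi abc:', 'jkl']
print(concordance(paras, ".:"))  # {'abc': [1, 2], 'def': [1], 'ghi': [2], 'jkl': [3]}
```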

        At this stage the editor makes a first attempt to fill the
        lacunae and to emend the text of the manuscript where there are
        manifest mistakes. He then prepares the translation using a
        program which looks up each word in a dictionary (a sort of
        self-augmenting file), presents the editor with a choice of
        translations, and records the Coptic words and the chosen Italian
        equivalents in another, "vertical", file. This work leads to
        further modifications of the text itself, of the division into
        paragraphs, etc., which the editor makes on a copy of the original
        encoded file.
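
        The dictionary-assisted translation step might look like the sketch
        below. The interactive choice is replaced by a function argument
        (`choose`, hypothetical) so that the example runs unattended; the
        transliterated Coptic words and the tab-separated layout of the
        "vertical" file are assumptions.

```python
from typing import Callable


def translate(words: list[str],
              dictionary: dict[str, list[str]],
              choose: Callable[[str, list[str]], str]) -> list[tuple[str, str]]:
    """Offer the known equivalents of each word to the editor (through
    `choose`) and return the (word, equivalent) pairs. An equivalent not
    yet in the dictionary is added to it, so the dictionary grows."""
    pairs = []
    for word in words:
        candidates = dictionary.get(word, [])
        chosen = choose(word, candidates)
        if chosen not in candidates:
            dictionary.setdefault(word, []).append(chosen)
        pairs.append((word, chosen))
    return pairs


def vertical_file(pairs: list[tuple[str, str]]) -> str:
    """Lay the pairs out one per line: Coptic word, tab, Italian equivalent."""
    return "".join(f"{word}\t{italian}\n" for word, italian in pairs)


# Stand-in for the editor's interactive choice (hypothetical): take the
# first known equivalent, or supply a new one from a prepared list.
new_terms = {"rome": "uomo"}


def pick_first(word: str, candidates: list[str]) -> str:
    return candidates[0] if candidates else new_terms[word]


lexicon = {"noute": ["dio"]}
pairs = translate(["noute", "rome"], lexicon, pick_first)
print(vertical_file(pairs), end="")
print(lexicon)  # {'noute': ['dio'], 'rome': ['uomo']}
```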

        This copy represents the correct form of the text as the editor
        sees it, and is used to produce the "final" concordance, the
        formatted text for printing, and the "final" translation, unless
        there are other manuscripts to collate. In that case each
        manuscript is first treated individually, and then their texts are
        collated.
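
        The text does not say how the collation is performed. As one
        possible approach, and not the CMCL's procedure, Python's standard
        difflib.SequenceMatcher can align two word-divided witnesses and
        report the places where they differ:

```python
import difflib


def collate(base: list[str], witness: list[str]) -> list[tuple[str, str, str]]:
    """Compare two word-divided texts and list the places where they
    differ as (operation, base reading, witness reading)."""
    matcher = difflib.SequenceMatcher(a=base, b=witness, autojunk=False)
    variants = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            variants.append((op, " ".join(base[i1:i2]), " ".join(witness[j1:j2])))
    return variants


print(collate("a b c d".split(), "a x c d e".split()))
# [('replace', 'b', 'x'), ('insert', '', 'e')]
```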



        FILES OF DATA

        The files of data are conceived as lists of material, stored in
        such a way that they can easily be transformed into whatever kind
        of "active" Data Base different scholars may choose (Catalogues,
        Bibliographies, Descriptions of manuscripts, etc.). They are
        therefore stored as common, simple text files, divided only into
        "lines" (a line being a portion of text delimited by a so-called
        CR, "Carriage Return"). The files are nevertheless organized so as
        to be read easily by any normal "Data Base Program", whether
        commercially available or written by the scholar. At first the
        data were stored according to the principles of hierarchical data
        bases; they are now being reorganized according to relational
        theory.

        The data are stored in Records divided into a number of Fields.
        The size of each Field is not fixed. Some codes (always in plain
        ASCII) mark the separation between records, and other codes mark
        the separation between the fields of a record.

        Consequently,  all we need to define,  in order to obtain a  FILE
        useful for automatic manipulation, is:

        1. Content of the file.
        2. Number and order of the fields.
        3. Markers indicating the separation of the records.
        4. Markers indicating the separation of the fields.
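
        Once these four items are defined, a generic reader can turn any
        such file into records. In the sketch below the markers are
        hypothetical: a line "##" closes a field and a line "$$" closes a
        record. The field names play the role of item 2, and the call
        numbers in the sample are placeholders.

```python
def parse_file(text: str, field_names: list[str],
               record_marker: str = "$$",
               field_marker: str = "##") -> list[dict[str, str]]:
    """Split a plain-ASCII data file into records of named fields.

    A line equal to `field_marker` closes a field; a line equal to
    `record_marker` closes both the last field and the record. All lines
    in between belong to the current field, so fields have no fixed size.
    """
    records, fields, lines = [], [], []
    for line in text.splitlines():
        if line in (field_marker, record_marker):
            fields.append("\n".join(lines))
            lines = []
            if line == record_marker:
                if len(fields) != len(field_names):
                    raise ValueError(
                        f"expected {len(field_names)} fields, found {len(fields)}")
                records.append(dict(zip(field_names, fields)))
                fields = []
        else:
            lines.append(line)
    if fields or any(lines):
        raise ValueError("file does not end with a record marker")
    return records


sample = """XX.01
##
Sahidic
##
White Monastery
$$
XX.02
##
Bohairic
##
unknown
$$
"""
for rec in parse_file(sample, ["call_number", "dialect", "provenience"]):
    print(rec)
```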

        As for the CONTENT, there are 5 types of FILES, namely for:

                       CODICES, ENTIRE OR RECONSTRUCTED.
                       COLLECTIONS  OF  MANUSCRIPTS.
                       CLAVIS COPTICA.
                       CODICOLOGICAL AND PALAEOGRAPHICAL DESCRIPTION.
                       BIBLIOGRAPHY.

        CODICES: Each RECORD represents one codex, which is either
        preserved more or less complete or can be reconstructed from a
        sufficient number of scattered leaves.
                 FIELD 1. Conventional call-number.
                 FIELD 2. Call number given by the owner, or (for the
                  reconstructed codices) the key-word FRAGMENTS, which
                  refers to Field 9.
                 FIELD 3. Dialect.
                 FIELD 4. Provenience.
                 FIELD 5. Editions.
                 FIELD 6. Available reproductions.
                 FIELD 7. Other bibliography.
                 FIELD 8. List of contents.
                 FIELD 9. List of the fragments.

        COLLECTIONS: Each RECORD represents one manuscript kept in the
        Collection after which the individual file is named.
                 FIELD 1. Call number given by the owner.
                 FIELD 2. Catalogue number.
                 FIELD 3. Conventional call-number of the reconstructed
                  codex (if any).
                 FIELD 4. Dialect.
                 FIELD 5. Content.
                 FIELD 6. Provenience.
                 FIELD 7. Editions.
                 FIELD 8. Bibliography.
                 FIELD 9. Previous owners.
                 FIELD 10. Complementary fragments.
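
        The cross-reference in FIELD 3 is what allows leaves dispersed
        across several collections to be brought together under one
        reconstructed codex. A hypothetical sketch, with placeholder
        collection names, shelfmarks and call numbers:

```python
from collections import defaultdict


def group_by_codex(
        records: list[tuple[str, dict[str, str]]],
) -> dict[str, list[tuple[str, str]]]:
    """Gather the leaves of each reconstructed codex from many collections.

    `records` holds (collection, record) pairs; each record uses the
    hypothetical keys "shelfmark" (FIELD 1) and "codex" (FIELD 3). Records
    whose FIELD 3 is empty are not assigned to any codex."""
    codices = defaultdict(list)
    for collection, record in records:
        if record["codex"]:
            codices[record["codex"]].append((collection, record["shelfmark"]))
    return dict(codices)


leaves = [
    ("Collection A", {"shelfmark": "A 12", "codex": "XX.01"}),
    ("Collection B", {"shelfmark": "B 3", "codex": "XX.01"}),
    ("Collection B", {"shelfmark": "B 4", "codex": ""}),
    ("Collection C", {"shelfmark": "C 7", "codex": "XX.02"}),
]
print(group_by_codex(leaves))
# {'XX.01': [('Collection A', 'A 12'), ('Collection B', 'B 3')], 'XX.02': [('Collection C', 'C 7')]}
```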

        CLAVIS COPTICA: Each RECORD represents one work of Coptic
        literature.
                 FIELD 1. Number of "access" in the list.
                 FIELD 2. Number of the Clavis Patrum Graecorum.
                 FIELD 3. Number of the Bibliotheca Hagiographica Graeca.
                 FIELD 4. Author or literary genre (for obviously
                  anonymous works: Passio; Acta Conc.; Canones; etc.).
                 FIELD 5. Title.
                 FIELD 6. Manuscripts.
                 FIELD 7. Bibliography.
                 FIELD 8. Abstract.

        CODICOLOGICAL AND PALAEOGRAPHICAL DESCRIPTION: A large number of
        fields has been devised to contain detailed information on all the
        characteristics of the manuscripts.

        BIBLIOGRAPHY:  It consists of 4 interrelated files.  Each  listed
        publication has an identification number which is the same in all
        files. Each file concerns one aspect of the publications.
             FILE 1: Description of the publication.
                 FIELD 1. Author.
                 FIELD 2. Title.
                 FIELD 3. Periodical.
                 FIELD 4. Miscellany.
                 FIELD 5. Editor.
                 FIELD 6. Collection.
             FILE  2:  Subject.  Only  one Field,  but there may be  many
        records for each publication.
             FILE 3: Manuscripts published (id.).
             FILE 4: Reviews (id.).
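
        Because the same identification number recurs in all four files,
        they can be joined in the manner of a relational data base. The
        sketch below, with placeholder data and hypothetical names, lists
        the publications that FILE 2 assigns to a given subject:

```python
def publications_on(subject: str,
                    descriptions: dict[int, dict[str, str]],
                    subjects: list[tuple[int, str]]) -> list[tuple[int, str, str]]:
    """Join FILE 2 (subject records) to FILE 1 (descriptions) on the shared
    identification number and return (number, author, title) for each
    publication filed under `subject`."""
    numbers = sorted({number for number, subj in subjects if subj == subject})
    return [(n, descriptions[n]["author"], descriptions[n]["title"])
            for n in numbers if n in descriptions]


# Placeholder data, not real bibliography entries.
file1 = {
    1: {"author": "Author A", "title": "Title one"},
    2: {"author": "Author B", "title": "Title two"},
    3: {"author": "Author C", "title": "Title three"},
}
file2 = [(1, "Liturgy"), (2, "Hagiography"), (3, "Liturgy"), (1, "Manuscripts")]
print(publications_on("Liturgy", file1, file2))
# [(1, 'Author A', 'Title one'), (3, 'Author C', 'Title three')]
```

        Keeping one subject per record in FILE 2, rather than a list of
        subjects inside FILE 1, is what lets a publication carry any
        number of subjects without a fixed field size.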



                                       III

        The CMCL has four series of publications:

        The  COPTIC  BIBLIOGRAPHY,  which is published every  year  as  a
        brochure with a set of microfiche;

        A series of printed books of editions of texts,  translations and
        studies;

        A  series  of preliminary editions of single  manuscripts,  as  a
        brochure with a set of microfiche reproducing the manuscript;

        A series of Catalogues of the collections of Coptic manuscripts.



        In conclusion, the work done by the Corpus dei Manoscritti Copti
        Letterari should bring three advantages to the field of Coptic
        Studies: 1. an increase in the amount of information available
        about Coptic literature, and faster publication of texts and of
        tools for scholars; 2. an easier division of the work to be done
        on each particular text (linguistic, philological, historical,
        theological analysis), because each specialist can apply his
        particular competence to one part of a set of uniformly organized
        materials; 3. the identification of the sectors of Coptology where
        the contribution of scholars is most urgently needed.