Task of Modern Language Archives
The introduction of digital technology has fundamentally changed the way we produce, store and use language resources. Traditionally, textual linguistic descriptions formed the primary data for linguistic analysis. Now digital audio or video recordings can be available at every step of the scientific analysis process. This revolutionary change took place in less than two decades. As a result, it is now much easier and cheaper to make recordings and store them on computers, so we are faced with a gigantic increase in potentially valuable primary digital resources. The unprecedented growth of computing power and storage capacity creates the illusion for all participants in this process that an unlimited number of resources can be handled without additional effort.
However, experience at the MPI for Psycholinguistics and in the DOBES programme, among others, indicates that the individual researcher is increasingly dependent on sophisticated archive services. These services cover both storing the language resources and offering access to them. At the MPI and within the DOBES programme, about 60 teams carry out field research every year, leading to more than 1000 hours of recordings that will all be digitized and then partly annotated. In addition, large sign language and bilingualism projects, for example, have created a similar amount of recordings within no more than two years. These numbers indicate the enormous amount of resources that modern digital archives in the language resource area need to manage. The individual researcher cannot maintain this increasing amount of resources at the required level, as was pointed out by D. Schüller in the context of a UNESCO investigation.
Faced with this situation, a new definition of the tasks of digital language resource archives was needed. Traditional archives were, and still are, focused on storing physical objects so that they survive for as many years as possible. When archives moved on to storing digital objects as well, they fundamentally changed their view of what constitutes the object to be archived. For example, IASA concluded that the primary task is no longer to preserve the physical object (e.g., the tape), but to focus on the content. Yet the way access to the material is granted is still determined largely by traditional priorities that will not be sufficient for researchers. It was the Digital Library community that focused on the access aspect and came up with many useful suggestions.
Modern language resource archives need to find a combination of the traditional idea of preserving the content and modern ways of providing access to it, and a balance between the stability of archived material and the possibility of enriching it. They are centers that have to offer relevant services that make them attractive partners for researchers and that convince researchers not only to deposit their material, but also to rely on the archives for using and sharing it. They are also centers that have to provide services for the language communities, since these have an interest in maintaining their languages and in educating, for example, young children. The primary objective remains data survival. The secondary objective is to make the archived content attractive and accessible not only to interested researchers, but also to members of the language community, journalists, students and the general public. This can only be achieved, however, if the agreements on proper organization, encoding and structuring schemes based on open, neutral and widely accepted standards are not violated.
Since such digital archives contain many recordings and other material that touch on personal privacy, religious ceremonies and other sensitive events, legal regulations and agreements about ethical behavior have to be taken very seriously. Appropriate measures to protect sensitive data have to be taken and documented.
Task of the AAB
It is obvious that modern technology changes fast in all respects, ranging from metadata proposals for organizing and managing resources to the storage technology that holds the bit-streams. It is also obvious that there are tensions between the requirements of accessibility and long-term persistence. While the first is directed at user convenience, immediate presentation solutions, flexibility and data enrichment, the second has to ensure that archived data is stored in neutral and open data formats, that it remains unmodified and consistent, and that it is migrated to new technologies without losing information. Modern digital archives will be influenced by both aspects, and technologists on the archivist side may also hold views that could harm the persistence of the archived data. A digital archive therefore has to be very careful with its strategies for meeting the two goals, and these strategies should be evaluated at regular intervals by a board of competent experts.
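The requirement that archived data remain unmodified and survive migration without loss is commonly checked with a fixity manifest: a checksum is recorded for every file at deposit time and recomputed after each migration. The following is a minimal sketch of that general technique, not a description of the procedure actually used by the DOBES or MPI archive; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file below root (relative POSIX path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose content no longer matches."""
    problems = []
    for relative_name, expected in manifest.items():
        target = root / relative_name
        if not target.is_file():
            problems.append(f"missing: {relative_name}")
        elif sha256_of(target) != expected:
            problems.append(f"changed: {relative_name}")
    return problems
```

In such a scheme the manifest would be built once when material is deposited, stored separately from the data, and verified after every copy or storage migration; an empty result from the hypothetical `verify_manifest` means every recorded file is still present and bit-identical. A checksum only detects changes to the bit-stream; it says nothing about whether a format conversion preserved the content.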
Therefore, the MPI archiving team and the representatives of the Volkswagen Foundation have decided to establish an Archive Advisory Board (AAB) of archive and technology experts that can guide the future development of the DOBES and MPI archives. The AAB should in particular ensure that
- the archiving principles represent the technological state-of-the-art
- appropriate encoding and format standards have been chosen
- all decisions of the archivist about major changes are made explicit
- the needs of long-term preservation and short-term access are balanced
- possible extensions will not influence the stability and quality of the archive
- appropriate data protection measures are taken
While the AAB will take care of technological and archiving aspects, the already existing Linguistic Advisory Board (LAB) will take care of all questions of linguistic relevance. The AAB will meet every two years. Those responsible for the DOBES archive will report on their activities and plans. The members of the board will discuss these reports and give advice on further activities.
The current members of the Archive Advisory Board are:
- Bernard Comrie, MPI for Evolutionary Anthropology, Leipzig
- Peter Doorn, DANS, Den Haag
- Jost Gippert, University of Frankfurt, DOBES Programme Member
- Bernhard Neumair, GWDG, Göttingen
- Laurent Romary, CNRS, Paris/Nancy
- Dietrich Schüller, Phonogramm Archiv, Vienna
- Harald Suckfüll, General Administration MPG, Munich
Stephen Levinson and Peter Wittenburg will represent the MPI for Psycholinguistics, Nijmegen.
Vera Szöllösi will represent the Volkswagen Foundation and participate as a guest in the AAB meetings.