CLARIN: European and National Responsibilities

Presentation (revised version) by Koenraad De Smedt
CLARIN, Solstrand, Dec. 15, 2008

CLARIN goals

CLARIN is a project for creating e-Infrastructure, with the intention of supporting e-Science in the Humanities in all of Europe. In 2006 the project was selected in the first ESFRI roadmap for research infrastructure. The project was thereby recognized as a crucial pillar to strengthen the European Research Area, in particular for capacity building.

The roadmap identified important challenges that the humanities are facing. The complexity of the record of human languages — a record that is multilingual, historically specific, geographically dispersed, and often highly ambiguous in meaning — makes digitization difficult and expensive.

The task for CLARIN is not to generate new language resources, but to better utilize the vast amount of data and information that already exist or should be generated in Europe. Today the social sciences and humanities are hampered by the fragmentation of the scientific information space. Data, information and knowledge are scattered in space and divided by language, cultural, economic, legal, and institutional barriers.

Every other week, one of the remaining 6500 living languages dies. A surprising amount of language material is endangered as well. Irreplaceable recordings and notes on the shelves of individual researchers are inaccessible, and the majority of these materials will quickly become lost or unusable unless they are taken care of in a digital infrastructure.

Investments are necessary in digitization, migration to new standards and platforms, and proper documentation and annotation. User groups must get easier access across institutions and national boundaries.

While some doubt whether we can afford an infrastructure, one should also ask whether we can afford not to build one. The longer we wait, the more expensive it will get. So-called retrospective documentation of insufficiently documented materials is difficult, expensive and error-prone, because there may be limited access to the original researchers. Re-collection of lost and damaged data is also expensive and sometimes impossible. The re-use of properly managed existing materials for new purposes, and their use by new scientific and commercial user groups, is far more economical than letting each researcher re-discover such materials. The value for society consists of an enormous trigger for further research and technological innovation.

Responsibilities in CLARIN’s preparatory phase

In its preparatory phase (2008-2010), CLARIN is making plans for establishing computer centers, implementing federation solutions, making agreements on user licenses and setting standards for coding and metadata. It is launching pilot projects in the humanities, it is tackling issues related to access rights, and it will put together a pan-European agreement in which European and national responsibilities are clearly outlined.

But already in the current phase, there are clear national responsibilities. The project workplan calls for at least the following national CLARIN activities:

  1. Participation in Working Groups. Throughout this phase it has to be ensured that whatever is proposed (standards, tools, technologies, services, licences, etc.) will be adequate to serve all the language communities (large and small) and all potential user communities. To this end CLARIN has set up a number of working groups to gather information, to discuss standards, to adapt existing tools and resources to the CLARIN specifications, to conduct experiments with the prototype, etc. These tasks cannot be carried out by the consortium partners alone: the working capacity of the consortium is limited, whereas discussions about e.g. standards require broad participation and support. Working Groups are therefore open to participants from all CLARIN member sites (at this moment 147 institutions in 32 countries), not just consortium members. Because the project budget is very limited, the national funding agencies are asked to provide financial support to participants in CLARIN working groups. The main justification is that this would serve to protect the interests of the national language(s) and the national humanities research communities.
  2. Demonstrators: Projects that demonstrate services and applications are an excellent instrument to show the potential of the infrastructure for future users. These demonstrators can play a crucial role in both the promotion of the infrastructure and the discovery of user needs. CLARIN will launch a call for demonstrator projects. CLARIN will provide consultancy, but it is expected that the projects will be financed mainly by national funding agencies, in part to ensure that demonstrators not only fit CLARIN needs but also reflect national priorities.
  3. Preparing for specific roles. In this phase, countries need to plan the role they want to play in the European CLARIN infrastructure:
    • Would we want to host one of the main hubs in the future federation of archives?
    • Would we like to connect our existing archives to the infrastructure?
    • Would we want to host (or participate in) one of the centers of expertise that will be created?
    • Would we want to create a national or regional network of expertise?
  4. Essential resources: The CLARIN preparatory phase does not aim at the creation of new resources or technologies. While some languages already have extensive national language resources, others might benefit from launching projects or programmes at the national level that would run in parallel with CLARIN. Building national language resources and creating digital content must be done with national research funding.
  5. Events: At the national level, events such as conferences, strategic meetings and awareness events are instruments to bring national players together, including providers (such as language and speech technologists) and users (such as humanities and social sciences scholars).

Present and future funding of national CLARIN activities

  1. Participation in Working Groups and Events. The Research Council of Norway has kindly provided 100 000 Norwegian kroner for the second half of 2008, which has been used for two kinds of national activities. The first is the participation of the Universities of Tromsø and Oslo in CLARIN working group activities, in particular attendance at a working group meeting in Berlin. The second is the organization of the present meeting at Solstrand (Dec. 15 and 16, 2008), in cooperation with the Language Council. This funding has been useful. There are signals that it will be possible to apply for funding of similar activities in the remainder of the preparatory phase (2009–2010), although unfortunately there will not be funding during the first half of 2009.
  2. Demonstrators: It is hoped that projects that include CLARIN demonstrators will be funded through national funding programs. To that end, CLARIN will ask the national funding agencies to point out the possibility of consultancy and technical support by CLARIN in relevant calls.
  3. Preparing for specific roles. In order to investigate the basis for assuming a role as a CLARIN center, the Research Council will be asked to support small pilot projects testing state-of-the-art archiving services and grid solutions.
  4. Essential resources: Until now, there has not been adequate funding for the construction of large Norwegian language resources serving the humanities. CLARIN welcomes such activities, but funding of national resource construction and digitization remains a national responsibility. A plan dating from 2002 calls for a national investment of 100 M NOK to build up a good collection of basic language resources in Norway. Until now, this money has not materialized through Språkbanken, but perhaps there are other possibilities to build up a necessary core of Norwegian language resources, which can subsequently be harvested and made accessible through CLARIN.