Documentation and metadata

Metadata is crucial for making shared information discoverable and accessible to others. In short, metadata is structured information that describes a resource, such as research data or a publication. Metadata also clarifies and contextualizes documents and collections, making them searchable. It is therefore important to set aside time to fill in the metadata fields when research data is published, so that the information can be located and reused. The purpose of metadata is to facilitate automatic management and categorization of information. For this to work, the metadata must follow established standards. Unlike documentation, which is written for human readers, metadata is written to be readable by computer software.

What are the different kinds of metadata?

  • Descriptive: Information concerning contents and context. This is used to enable others to cite the information using scholarly notation. Examples of descriptive metadata include: titles, authors, subject, keywords, abstracts, methodology, etc.
  • Administrative: Information that allows the data to be categorized and correctly managed. Examples of administrative metadata are: file format, rights/licenses/copyright, preservation, etc.
  • Structural: Structural metadata is necessary to organize the previous two categories. Examples of structural metadata are: persistent links (e.g. DOI or URN), relational data as to how separate files are associated with one another, etc.
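
As a rough illustration of the three categories, the sketch below groups a metadata record into descriptive, administrative and structural parts and serializes it to JSON so that software can read it. All field names, values and the identifier are hypothetical placeholders chosen for this example; a real repository will prescribe its own metadata standard.

```python
import json

# Hypothetical metadata record. Every field name and value is an
# illustrative assumption, not a prescribed schema.
record = {
    "descriptive": {
        "title": "Example survey dataset",
        "authors": ["A. Researcher"],
        "keywords": ["survey", "example"],
        "abstract": "Placeholder abstract describing the dataset.",
    },
    "administrative": {
        "file_format": "text/csv",
        "license": "CC-BY-4.0",
    },
    "structural": {
        # Placeholder identifier, not a registered DOI.
        "persistent_identifier": "doi:10.1234/example",
        # Relational data: which file documents which.
        "files": [
            {"name": "responses.csv", "described_by": "codebook.txt"},
        ],
    },
}


def to_json(rec):
    """Serialize the record to JSON text with stable key order."""
    return json.dumps(rec, indent=2, ensure_ascii=False, sort_keys=True)


print(to_json(record))
```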

Metadata can be used for several purposes, such as:

  • Citations: Makes it possible to credit and recognize those who created the content.
  • Reusability: To build upon research, others need to be able to easily understand how the information has been structured. There must be enough metadata for another researcher to understand, for example, how data collection was performed and what the different variables mean.
  • Searching/Finding: Allows others to find the information and verify that it is correct. Metadata needs to answer the questions: Who? What? Where? When? Why? How?
  • Interpretation of data: By making it possible to understand how information was structured and collected, metadata allows the data to be interpreted from different perspectives and the results to be evaluated more thoroughly. It is often also a great help for authors who want to reuse their own data months or years later.

Data that cannot be shared

For legal, ethical or commercial reasons, not all data can be shared openly. It may, however, be possible to make the information searchable without granting access to the raw data.

When can research data not be shared openly?

  • If the research data contains sensitive personal details or other sensitive information. Bear in mind that material containing non-sensitive personal details can be published if it has been anonymized.
  • If the participants in a study have not given written consent to open publication of the results (documentation of consent is required).
  • If it includes materials to which someone else owns the copyright.
  • If the material reveals proprietary or financial information.
  • If the material has not undergone ethical vetting when such vetting is necessary.

Even if the data falls into one of the categories above, it is still possible to publish information stating that the research data has been collected. Many data repositories offer the option to register only information about the data, using keywords and a description. It is recommended that contact information is included when registering the metadata so that users can send inquiries about the data that has been made searchable.
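
A metadata-only registration of this kind could be sketched as follows. The function name, the field names and the "restricted" access label are assumptions made for illustration and do not correspond to any particular repository's interface. The sketch also treats contact information as mandatory, which is stricter than the recommendation above.

```python
def metadata_only_entry(title, description, keywords, contact_email):
    """Build a hypothetical registry entry that describes data without exposing it."""
    if not contact_email:
        # Design choice of this sketch: refuse entries nobody can ask about.
        raise ValueError("contact information is needed so users can send inquiries")
    return {
        "title": title,
        "description": description,
        "keywords": sorted(set(keywords)),  # de-duplicated, stable order
        "access": "restricted",             # the raw data is not published
        "contact": contact_email,
    }


entry = metadata_only_entry(
    title="Interview study (placeholder title)",
    description="Transcripts exist but contain personal details.",
    keywords=["interviews", "qualitative", "interviews"],
    contact_email="researcher@example.org",
)
print(entry["keywords"])
```

Note that the entry contains no file references at all, so the dataset becomes searchable while access is handled on request.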

Information security

All employees of Stockholm University must work actively, efficiently and continuously with information security – that is to say, how different types of information are handled in different contexts.

The University’s information security procedures are coordinated by IT Services. The work is based on four principles: confidentiality, accuracy, traceability and accessibility.

  • Confidentiality - means that no unauthorized party will have access to the information. This is of particular importance for researchers who manage research data containing personal details.
  • Accuracy - means that the information cannot be changed by unauthorized persons, whether intentionally or unintentionally. It is important that researchers are able to guarantee the accuracy of their data.
  • Traceability - means that it is possible to trace who did what within a system. This can be particularly important if a researcher handles sensitive personal details or other confidential information.
  • Accessibility - means that the information always exists and can be reached by users when needed, either via the internet or by request. This is necessary in order to fulfill the ideal of open research data.
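
One common way to support the accuracy of data is to record a checksum for each file and compare it later. The sketch below uses SHA-256 from the Python standard library; the function names are hypothetical. A checksum only detects that a file has changed. It does not prevent changes, and it is not presented here as a procedure prescribed by the University.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_checksum):
    """Compare the file's current checksum with one recorded earlier."""
    return sha256_of(path) == recorded_checksum
```

A typical use would be to store the digest alongside the data at the time of deposit and to call is_unchanged before the data is reused.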

Accessibility and long-term storage

When research data is made digitally accessible, it is important to consider the file format in which the information has been saved so that others can reuse the material. Every digital file format runs the risk of becoming obsolete and thereby unreadable in the future. If this occurs, valuable research data may be lost.

The most important points concerning file formats to be considered by researchers are:

  • To use a preservation format right from the beginning if it is possible.
  • To use a file format which is not proprietary (e.g. .csv for tabular (spreadsheet) data and .txt or .odt/.odf for text).
  • To use a file format which follows an open standard, such as those developed by OASIS.
  • To use a file format which is commonly used.
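
As a minimal sketch of the second point, tabular data can be written as a plain .csv file with the Python standard library instead of being kept in a proprietary spreadsheet format. The function name, column names and file name are hypothetical, and UTF-8 encoding is an assumption of this example.

```python
import csv


def write_csv(path, header, rows):
    """Write a header and data rows to a UTF-8 encoded CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__":
    write_csv(
        "measurements.csv",
        ["participant_id", "age", "score"],
        [[1, 34, 7.5], [2, 41, 6.0]],
    )
```

The resulting file can be opened in any text editor or spreadsheet program, which is the property that makes non-proprietary formats suitable for reuse.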

The Swedish National Data Service (SND) has evaluated a number of file formats that it considers suitable for the handling, long-term storage and availability of research data. However, these recommendations may change over time as technology develops.

Regulation RA-FS 2009:2, issued by the Swedish National Archives, specifies which file formats are accepted for preservation. The file format to be used depends on the original usage of the data. For example, databases and registers can be stored as sequential files or XML files, while office documents can be stored as PDF/A files. Unfortunately, there are currently no recommended preservation formats for digital sound or image files.

Archiving research data

As a state body, Stockholm University is responsible for archiving its official documents. Research data created in connection with research projects is an important part of the university archives and must be archived together with other documents from the projects. In accordance with the Archives Act, research materials must always be archived at their own institution. Raw data files, ethics permissions, research documentation and published results should be archived in accordance with university regulations (in Swedish). It is therefore important that researchers delete drafts and cull documents of incidental or limited importance after a project is finished. However, certain types of documents should be retained, as per the National Archives regulation RA-FS 1999:1. According to the Public Access to Information and Secrecy Act, state bodies are required to release official documents.

It is also important to preserve information that allows others to understand what occurred during the research project and thus how the material should be interpreted. Research material should be submitted to the Archives and Records Office. When a research project is delivered to the archives, the files are converted to archival formats. Archival formats are formats that do not depend on a specific program or piece of software, which minimizes the risk that the information becomes unreadable if that software is no longer available.