Publish research data

The general rule at Stockholm University is that research data is published with open access and in line with the FAIR principles to promote data reuse. Publishing your research data, or the metadata describing it, in a data repository makes the data more FAIR and lets you make it accessible "as open as possible and as closed as necessary".


Publish in a data repository
Select data repository
Data repositories curated by Stockholm University
Checklist for publishing
Advice for publishing software
Certain data cannot be published open access...
Reuse data
Contact

Publish in a data repository

There are major advantages to publishing research data formally in a data repository rather than sharing it informally on your webpage, with colleagues, or as an appendix to a published article. Research data that is published in a data repository is a publication in its own right. It is:

described and documented in an interpretable and standardized way, so that it can be correctly understood and reused.
given a persistent identifier (PID, usually a digital object identifier, DOI), so that it can be persistently found, retrieved and linked to other publications.
given the usage license you deem appropriate (for instance CC-BY), to let others know what they are allowed to do with your research data and how you want to be cited.

Connect your research data and scholarly publication. This gives you a citation advantage and makes your work visible in statistical overviews.

Connect data and article. Mention the DOI of the data in the Data Availability Statement in the article.

If you make your article and data refer to one another with persistent identifiers (e.g., DOIs) they link to each other and become more FAIR – and give your article a "citation advantage".

All in all, a data repository helps you make your research data more FAIR: Findable, Accessible, Interoperable and Reusable. This makes it easier to find, download, understand, handle and reuse them - and later to archive your entire research project.

Publish open access and in line with the FAIR-principles

Select data repository

Investigate which data repositories are commonly used in your field and are appropriate for your research data. The metadata fields in domain-specific data repositories can be more detailed and use domain-specific vocabularies that improve the description of the material. General-purpose data repositories can have greater cross-disciplinary reach.

The registry Re3data lists data repositories and can be of help in the process of choosing a data repository.

Re3data, a registry of data repositories

Data repositories curated by Stockholm University

Currently, Stockholm University offers curation and support when you publish in the following data repositories:

The curation entails that the Research Data Management Team reviews and suggests improvements to your metadata, to make your research data more FAIR and to enable automatic archiving of the published material. You can anticipate the team's suggestions and speed up your data publishing process by following the checklist below.

Checklist for publishing

When you have chosen a data repository for your research data (or metadata, if the research data cannot be published open access) you can make the research data as open access and FAIR as possible upon publication.

Please fill out all the relevant metadata fields in the web form as completely as possible. An additional document, such as a well-structured README text file (.txt), is helpful for future understanding of the research data.
File names. You improve the accessibility and sustainability of your research data if you name your files wisely before publication and preservation. Decide on a consistent structure for how you name your files. File names should be informative and descriptive, so that the files can be found and understood in a cross-disciplinary setting. File names ought to include a date stamp. File names must not contain any forbidden characters or white space. The only permitted character set is A-Za-z0-9_-. Preferably use a dot (.) only once, to separate the file extension. These recommendations improve machine-readability and findability. DataCarpentry, DataOne, Dryad and Stanford offer guides to best practices in file naming.
File formats. You improve the accessibility and sustainability of your research data if you save it in common, open file formats before publication and preservation. This makes the research data accessible to more users and for longer. The Swedish National Data Service offers more information about the file formats best suited for long-term preservation and accessibility. When proprietary formats offer important functionality and layout options (e.g. an Excel workbook with several sheets, embedded diagrams, images etc.), you should of course publish and preserve the research data in that format, but please consider also adding a version of the research data in an open, non-proprietary file format. It is important to describe the file formats used as accurately as possible, including references to the software (if possible with the version used) with which they were produced and the preferred software needed to open the files. This is particularly important when the item contains .zip or .tar archives containing several different file formats.
Variables (column headings). Are your variables understandable and possible to interpret correctly, even by yourself in 5-10 years' time? Or by someone from another discipline? Is the unit of measurement noted clearly for every variable? Is there a need for any additional documentation, such as a README text file (.txt) or a separate codebook, to provide these details?
Standards and authorities. If there are standards or authorities (vocabularies, ontologies or other) that help describe and interpret your research data, link to these in both metadata and the actual data files. Authorities improve machine-readability and help make your research data more FAIR. Learn more about standards that enrich cultural heritage research from the Swedish National Heritage Board.
References. Please check that all links work properly.
Publication. Give full reference, including DOI(s), to the publication(s) that are based on the research data you are about to publish. If the DOI is unknown, e.g. because the article is not yet accepted for publication, a “dummy” entry can be made and amended later. Metadata can always be amended, even after a dataset has been published. Changes to data files and file names, however, result in a new version of the record (and a new DOI).
Please connect your ORCID to your personal account in the data repository. If you do not have an ORCID, you can register one and associate it with your university account.
Affiliation. State your affiliation correctly in the metadata. If the information has to be typed, please copy-paste the name of your department/institution from these lists: English/Swedish.
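
The file name recommendations in the checklist above can be checked automatically before upload. The following is a minimal Python sketch, not an official tool. The function name check_filename and the accepted date stamp formats (YYYY-MM-DD or YYYYMMDD) are assumptions made for this example, and the sketch treats the "preferably one dot" recommendation as a strict rule.

```python
import re

# Name part: only A-Z, a-z, 0-9, underscore and hyphen,
# followed by exactly one dot and an alphanumeric file extension.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]+")

# Assumption for this sketch: a date stamp looks like 2023-12-04 or 20231204.
DATE_PATTERN = re.compile(r"(?<!\d)\d{4}-?\d{2}-?\d{2}(?!\d)")


def check_filename(name: str) -> list[str]:
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append(
            "use only A-Za-z0-9_- and a single dot before the file extension"
        )
    if not DATE_PATTERN.search(name):
        problems.append("no date stamp found")
    return problems


if __name__ == "__main__":
    # Hypothetical file names, used only to illustrate the checks.
    for candidate in [
        "seawater_temp_2023-12-04.csv",
        "final data (v2).xlsx",
        "results.v2.2023-12-04.csv",
    ]:
        print(candidate, check_filename(candidate))
```

A file name that passes both checks returns an empty list. Whether a name is also informative and descriptive still has to be judged by a person.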

Advice for publishing software

When publishing software, it is useful to follow the advice below to make the software as FAIR as possible.

Describe clearly in the metadata and the README file the programming language(s) of your scripts (e.g. C#, Go, JavaScript, Python, R), if applicable also with version.
Do not put the README file together with scripts (or data files) in a zip file, but keep it separate (as .txt or .md, i.e. Markdown), so that it is displayed directly in the repository interface and (re-)users can evaluate the content without first downloading the whole package.
Place a brief explanatory comment at the start of every program (and possibly its version history), including a good example of how the program is used. [1]
Decompose programs into functions. A function is a reusable section of software. Name functions, list their input parameters, and describe what information they produce. Functions make it easier to test and troubleshoot when things go wrong. [1]
Avoid duplication. Write and re-use functions instead of copying and pasting code, and use data structures like lists instead of creating many closely-related variables, e.g. create "score = (1, 2, 3)" rather than "score1", "score2", and "score3". [1]
Document software dependencies and requirements explicitly, and make sure that mechanisms to access them exist. [1,2]
Provide a simple example or test data set that users (including yourself) can run to determine whether the program is working and whether it gives a known correct output for a simple known input. [1]
Submit code/scripts to a reputable DOI-issuing repository, just as you do with data. Your software is as much a product of your research as your papers, and should be as easy for people to credit. DOIs for software are provided e.g. by Figshare and Zenodo, both integrating with GitHub. [1] For software code/scripts specifically related to climate research the Bolin Centre at Stockholm University has a local GitLab code repository instance that will issue DOIs on demand for fixed releases of submitted software scripts. See the Bolin Centre support site for information and help.
We encourage all software produced in research projects to be published under an open source license. Examples are found in this list: https://spdx.org/licenses/ [3]
To benefit fully from tab completion, make all variable, directory and file names into unique strings with distinct beginnings (so that no name is a substring of another in the same context). For directory and file names, use only the restricted character set [A-Za-z0-9-_.], with no white space inside.

[1] Wilson et al. (2017): Good enough practices in scientific computing.
[2] Lamprecht et al. (2020): Towards FAIR principles for research software.
[3] Akhmerov et al. (2019): Raising the Profile of Research Software.
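
Several of the recommendations above (an explanatory comment with a usage example at the start of the program, small named functions with documented inputs and outputs, a list instead of many closely related variables, and a simple known input with a known correct output) can be illustrated in one short script. The following Python sketch is only an illustration. The function names mean and summarise and the example scores are assumptions made for this example and do not come from the cited papers.

```python
"""Summarise a list of scores (illustrative example).

Example of use:
    >>> summarise([1, 2, 3])
    {'n': 3, 'mean': 2.0}

Dependencies: Python standard library only.
"""


def mean(scores):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(scores) / len(scores)


def summarise(scores):
    """Return the number of scores and their mean as a dictionary."""
    return {"n": len(scores), "mean": mean(scores)}


def self_test():
    """Run a simple known input and compare with the known correct output."""
    scores = [1, 2, 3]  # one list rather than score1, score2, score3
    assert summarise(scores) == {"n": 3, "mean": 2.0}


if __name__ == "__main__":
    self_test()
    print("self-test passed")
```

Keeping the self-test in the same file means that anyone who downloads the script can confirm that it works before applying it to their own data.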

Certain data cannot be published open access...

Research data containing personal data or sensitive personal data, data that is protected by secrecy in accordance with the Public Access to Information and Secrecy Act (2009:400), or data that is limited by proprietary right or copyright is not to be published open access.

... but must still be made available "as open as possible and as closed as necessary"

Research data that cannot be published with open access is made available "as open as possible and as closed as necessary" at Stockholm University by publishing the metadata (a description of the research data) openly in SND, without uploading any data files. The Research Data Management Team offers secure storage for the data files, with a permanent link between the SND metadata record and the data storage. The Research Data Management Team is also responsible for ensuring that the data files are preserved and made available, if requested, but only after secrecy examination and only to authorized persons.

If you publish metadata about research data with sensitive information in a data repository other than SND, then you yourself are responsible for preserving and keeping the data files available. You must be able to deliver them if requested, but only after secrecy examination and only to authorized persons.

Truly anonymized personal data, i.e. data that can no longer be linked to a person, can be published open access.

Contact the Research Data Management Team for advice on how to handle your research data and make it accessible as open as possible and as closed as necessary.

Swedish National Data Service, SND
Documents, public documents (only in Swedish)
Secrecy regulations at Stockholm University (only in Swedish)

Reuse data

There can be great scientific and socio-economic benefits from working with existing data, such as collected and published research data, register data or openly available authority and cultural heritage data.

When you reuse data, you must pay attention to any license under which the data was made available, since it determines how the data may be reused and how it must be cited. You also need to be able to reference where the primary data is stored, in case someone wants to review or repeat your study (in the event that your processed secondary data cannot be published).

For some data, the conditions for reuse are clearly defined (and may, for example, require ethical review and special information security measures). Other data can be freely downloaded and reused without restriction.

When you reuse data, you need to create a data management plan in cases where you process the data to such an extent that it can be considered a new data set. You are also recommended to make the new data set available (to the extent that the original source allows) and to preserve it together with the project's other research information after the end of the project.

Contact

Research Data Management Team
For questions on research data management, publication and preservation.
E-mail: opendata@su.se

Last updated: December 4, 2023
Source: Research Data Management Team
