Module 7 Handbook

Site: CABI Academy
Course: Data Sharing Toolkit Learning Materials
Book: Module 7 Handbook
Date: Friday, 26 April 2024, 3:56 AM

Introduction

This handbook is designed to help you to answer the Module 7 activity questions.

Sustainable access to data is critical to ensure that data remains FAIR and safeguarded over the long term. 

This module will enable you to:

  • plan for sustainable access to data
  • recognise aspects that help ensure sustainable access to data
  • choose the right data storage and management solution
  • understand the role of a data management plan

Why and when?

Why

You need sustainable access to data:

  • so that existing research can be validated and built upon
  • to support digitally enabled services that rely on sustained access to relevant data
  • in a form that enables integration, analysis and use
  • to reduce risk and improve decisions
  • to support data ecosystems (sustained access gives organisations the confidence to invest in using data to develop new products, services and research)

When

You need to plan for sustainable access to data from the very beginning, starting at the grant proposal stage.

As part of this, you will need to consult the people and organisations impacted by access to the data.

Ensuring sustainable access to data will incur costs, so you should budget for them at the start of the project.

Aspects that help ensure sustainable access to data

Three key factors will help you ensure sustainable access to data: policies and processes, people, and technology.

Policies and processes

Policies

Many research organisations and publishers have policies to ensure research is reproducible. For example, the Bill & Melinda Gates Foundation (BMGF) has its own Open Access Policy, which requires that the data underlying published research results be immediately accessible and open.

You will find that many governments also have policies requiring the publication and sharing of data that is essential for delivering societal, environmental and/or economic benefits. Such public sector data often includes:

  • geographic data
  • environmental data
  • demographic data
  • financial data

Processes

You should ensure that your processes, including data management plans, cover the requirements for longer-term access to data.

People

You need to have a clear approach to data stewardship. This means:

  • assigning roles so it is clear who makes decisions about, and invests in, each dataset to keep it sustainable
  • ensuring roles are suitably resourced with appropriate time committed to managing and sustaining data

Technology

You should use technology to curate and control access to data. This will support the people, policies and processes.

Some of the methods you may wish to examine are:

  • open access platforms
  • non-open data platforms allowing access over time to appropriate users

The Gates Open Research platform provides guidance on preparing data for publication and a non-exhaustive list of repositories approved by the Foundation.

Storing and managing data

You will find it helpful to ask yourself the following questions when choosing a data storage and management solution:

  1. What are the legal and contractual obligations?
  2. Who are the data users and what are their goals?
  3. What is the scope of the data reuse?
  4. Is the chosen solution suited to the capabilities of both current and potential users?
  5. How will sustainable access be funded?

Each question is discussed in more detail below.

Legal and contractual obligations

Are you legally required to retain the data?
Check whether the relevant governments require specific commitments or actions.

For example, the Ethiopian Government requires soil and agronomy data to be maintained in a database system by the Ministry of Agriculture, so that the data remains reliably accessible to the research community in the future.

Do policies and mandates require you to publish the data? Under what conditions?
For example, the BMGF Open Access Policy requires that the data underlying published research results be immediately accessible and open.

How long do you need to make the data available for?
Does it need to be removed after a certain period of time (for example, for privacy reasons)?

Data users and their goals

Your users may have specific requirements for accessing or using the data over time, or strategic goals related to access provision. These users may be:

  • partners
  • funders
  • customers
  • users of a service

You may find the data ecosystem mapping tool helpful for identifying existing and potential re-users of data and defining requirements for sustainability.

Scope of data reuse

You can ask these questions to establish the scope of data reuse:

  • Is the data collected solely so that it remains available to back up research findings?
  • Is the data suitable for use in digitally enabled services, requiring ongoing updates and maintenance to ensure relevance? 

When choosing a repository, look for one that enables:

  • access to the dataset
  • dataset persistence
  • dataset stability
  • searching and retrieval of datasets

Capability of both current and potential users

You will need to consider the capability of others in your ecosystem.

In addition to making data FAIR, consider whether you should:

  • document your data: describe its attributes, features and limitations
  • showcase existing ways to use the data
  • become an active member of the community, allowing others to reach out to you for help or to make suggestions

As a data publisher, you can use self-assessment tools to show that you follow best practices that enable re-users to use your data with confidence.

Funding sustainable access

If the current data holders are expected to provide sustained access, you will need to ask how this will be funded.

Alternatively, you could deposit data with a third party. In this case, you need to examine their approach to sustainable access to data, including:

  • funding
  • conflicting interests
  • commercial interests

For example, an agreement between the research repository FigShare and the LOCKSS (Lots Of Copies Keep Stuff Safe) alliance means that if a “trigger event” occurs at FigShare (e.g. it ceases to exist, or a commercial interest changes access permissions), the controlled copy held in the LOCKSS archive can be released.

Types of data repositories

You will find that each type of data repository fulfils a different need and set of requirements.

Research data repositories

There are two types of research data repository, both of which broadly allow anyone to contribute data:

1: Discipline-specific

Examples include:

2: Interdisciplinary

Examples include:

Many are backed by large communities of funders as well as academic journals.

Government data platforms

These platforms provide access to government or country-specific data.

Governments often have clauses requiring anyone working with specific types of data, whether a government department or a third-party organisation, to make the data available via the official government platform.

Examples include:

Curated data repositories

These repositories:

  • provide sustainable access to carefully managed and curated datasets
  • offer data services (such as direct access to the data via an API rather than just file downloads)
  • are sustained through country memberships and donor contributions
  • provide the most flexibility for specific data services

If you are considering a curated data repository, be aware that establishing and sustaining one is challenging, and many cannot be sustained in the long term.

Examples include:

Code and data platforms

These hybrid platforms sit somewhere between research data repositories and curated data repositories. They can offer you management features such as:

  • version control
  • ingest pipelines
  • validation services

One of the most popular platforms for both code and data is GitHub.

Using multiple data repositories and platforms

You could use several solutions together to gain multiple benefits. This approach means that:

  • data can be replicated and transformed into multiple versions simultaneously
  • you can serve more users if the legally required solution is not suitable for them
  • you must plan early where to deposit your data

Data management plans

A data management plan will help you:

  • outline how data is handled during a project
  • outline how data is handled on completion of a project
  • consider data management before project commencement
  • ensure data is safeguarded and widely shared

You should involve all stakeholders, including financial donors, in creating a data management plan. It is a collaborative, iterative process.

You can recognise a good data management plan because it will include:

  • a data inventory that lists the data and identifies any third-party rights in the data
  • a list of platforms and agreements that support the sustainability of the data (e.g. in government data platforms)
  • a set of roles and responsibilities related to the continued management of data
  • a clear and realistic budget
  • any training or capacity development needs

You can use the Developing a data management plan checklist to put an effective data management plan in place.

Tools and guides

You can use the following tools and guides to support data sharing:

Summary

You can find all the key points from this Module in the Cheat Sheet: Ensuring sustainable access to data.

Don't forget to complete the Module 7 activity questions to review your knowledge of this topic.