Module 3 handbook

Site: CABI Academy
Course: Data Sharing Toolkit Learning Materials
Book: Module 3 handbook

Introduction

This handbook is designed to help you to answer the Module 3 activity questions.

Data for digitally-enabled services is often required from multiple third parties, such as:

  • geographic data
  • population and socioeconomic data
  • sample data from previous projects

In this module you will explore barriers to and techniques for reusing such data.

This module will enable you to:
  • understand rights and permissions in data
  • identify open, shared and closed data
  • obtain rights to reuse third party data
  • consider compatibility with existing country and funder policies
  • consider reliability and interoperability of data from third party sources
  • make decisions for using data from third party sources
  • understand stakeholder roles in the data ecosystem
  • map a data ecosystem
Graphic with icons representing data

Reusing data from third party sources

Benefits

1: Time saved

  • You can build on existing data, saving the significant time it would take to collect data that already exists
  • You should encourage well-maintained, good-quality data, so that you need less of your own time to prepare data before reuse

2: Taking advantage of existing activities and communities

You can build good relationships with data holders so you can take advantage of existing activities and communities around the data to support your aims.

3: Return on Investment

You can reuse data to reduce costs and increase returns from previous investment in data collection and platforms.

Constraints

You will find that data holders often do not have clear policies and processes in place to support the sharing and reuse of data.

This is commonly due to concerns about risks and impacts related to the sharing of data.

Solutions

You should work with data holders early in proposal development to ensure that appropriate mechanisms are put in place to support access to and sharing of data.

Data graphic with various symbols indicating collection and reuse


A spectrum of data permissions

To help you understand the complex landscape of data access and associated rights and permissions, the Open Data Institute has produced the Data Spectrum.

This chart neatly illustrates the different types of data, from closed, through shared to open.

You can discover more about each type of data next.

Data Spectrum graph illustrating different types of data


Closed / Limited data

You can recognise closed data by the following characteristics:

  1. Closed data is only available to a few individuals
  2. Closed data is often closed to minimise harmful impacts
  3. Closed data permissions for reuse are likely to be very restrictive

Examples of closed data:

  • sensitive legal documents
  • employment contracts
  • personal health records
  • smallholder finances
  • pesticide application permits
Graph showing closed/limited data spectrum

Shared data

You can recognise shared data by the following characteristics:

  1. Shared data permissions for access and use are often restricted (e.g. non-commercial use only)
  2. Shared data can be restricted to minimise harmful impacts
  3. Shared data can be restricted to preserve commercial advantage
  4. If re-used, shared data comes with the same restrictions and reuse limitations for subsequent sharing

Shared data is the largest category of data.

Examples of shared data:

  • reports of crop disease
  • medical research data
  • social network data
Graph showing shared data section of data spectrum

Public data

You can recognise public data by the following characteristics:

  1. Public data covers all data that is visible to everyone but with limited or unclear rights
  2. Public data is not necessarily freely reusable
  3. You may need to approach the rights holder of public data to establish reuse restrictions

Examples of public data:

  • patented pesticide formulas
Graph showing public data section of data spectrum

Open data

You can recognise open data by the following characteristics:

  1. Open data is data that anyone can access, use and share
  2. Open data can be reused without restriction and shared forward with others (providing reuse is legal)

Examples of open data:

  • earth observations
Graph showing open data section of data spectrum
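
The four categories described above can be made concrete with a short Python sketch. It is illustrative only: the `Category` enum, the `classify` function and its two questions are assumptions made for this example and are not part of the Open Data Institute's Data Spectrum.

```python
from enum import Enum


class Category(Enum):
    CLOSED = "closed"
    SHARED = "shared"
    PUBLIC = "public"
    OPEN = "open"


def classify(who_can_access: str, free_to_reuse: bool = False) -> Category:
    """Hypothetical helper: place a dataset on the spectrum.

    who_can_access: "few" (a few individuals), "some" (specific groups,
    under restrictions) or "everyone".
    free_to_reuse: True only if anyone may use the data and share it onwards.
    """
    if who_can_access == "few":
        return Category.CLOSED
    if who_can_access == "some":
        return Category.SHARED
    if who_can_access == "everyone":
        # Visible to all, but only open if reuse rights are clear and unrestricted.
        return Category.OPEN if free_to_reuse else Category.PUBLIC
    raise ValueError(f"unknown access level: {who_can_access!r}")


# Examples taken from the lists above.
print(classify("few").value)                           # personal health records
print(classify("some").value)                          # reports of crop disease
print(classify("everyone").value)                      # patented pesticide formulas
print(classify("everyone", free_to_reuse=True).value)  # earth observations
```

The sketch deliberately separates visibility from reuse rights, because public data is visible to everyone yet not necessarily freely reusable.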

A data spectrum for agriculture

The Open Data Institute has produced a version of its popular data spectrum for agriculture, which you can find on the Open Data website, or see the image below.

You can find help to understand, use and interpret the data spectrum in the FAIR data toolbox.

Graph showing closed-shared-open data spectrum for Agriculture (Sub-Saharan and South Asia focus)


Obtaining permission

By default, anything that involves an ‘intellectual effort’ in its creation is copyright protected. The creator has exclusive rights to that creation. The requirements for reuse include:

1: You must be given permission from the rights holder (in this case the creator)

2: You will usually need a data sharing agreement or licence

3: You must ensure permissions to access, use and share data are clear. To achieve this you will need:

  • explicit licences and agreements
  • to understand relevant legal restrictions or permissions

What if the datasets are not open?

You may find negotiation is required with the rights holder to see what can be done to ensure third party data can be used.

The data must be used in a way that is compatible with the policies and processes set out in the conditions of the grant or those in country.

Your role could include helping data holders to address any concerns they have related to data sharing or to set up a data sharing agreement.

Open, shared and closed data graphic

What to do when things are unclear

You may find that some reuse permissions for a dataset are unclear and that no rights holder is listed.

In this situation, the data may still be usable, with the use covered by a fair use (or fair dealing) exception or other local law.

If this is not the case, then you could make a risk-based decision over the collection, use and sharing of the data. You would undertake this with the understanding that the rights holder could reemerge and it may be necessary to:

  • backtrack
  • negotiate continued use
  • potentially face a legal challenge
Graphic showing scales with fair use and copyright logos on either side

Compatibility with existing national, regional, local and funder policies

Reuse permissions can be more complex in certain situations, such as combining datasets.

You should take careful note of existing in-country or funder policies (such as the Bill & Melinda Gates Foundation open access policy) that set conditions related to FAIR and safeguarded data. These might:

  • set out incompatible terms and conditions, for example requiring data to be available for commercial reuse or explicitly blocking it
  • include requirements relating to embargo periods for data sharing
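
Once the terms attached to each dataset or policy are written down, conflicts like those above can be spotted mechanically. The Python sketch below is a minimal illustration: the `Terms` fields and the `find_conflicts` function are hypothetical, the example terms are invented, and real licences and policies contain many more conditions than these two.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Terms:
    """Hypothetical summary of one licence, agreement or policy."""
    source: str
    # True = commercial reuse must be allowed, False = it is blocked,
    # None = the terms do not mention it.
    commercial_reuse: Optional[bool] = None
    # Months the data must be withheld before sharing (0 = no embargo).
    embargo_months: int = 0
    # Months within which the data must be shared (None = no deadline).
    share_within_months: Optional[int] = None


def find_conflicts(all_terms: List[Terms]) -> List[str]:
    """Return a plain-language note for every pair of terms that clash."""
    conflicts = []
    for i, a in enumerate(all_terms):
        for b in all_terms[i + 1:]:
            if (a.commercial_reuse is not None
                    and b.commercial_reuse is not None
                    and a.commercial_reuse != b.commercial_reuse):
                conflicts.append(
                    f"{a.source} and {b.source} disagree on commercial reuse")
            # An embargo clashes with a sharing deadline that falls inside it.
            for holder, policy in ((a, b), (b, a)):
                if (policy.share_within_months is not None
                        and holder.embargo_months > policy.share_within_months):
                    conflicts.append(
                        f"{holder.source} embargo ({holder.embargo_months} months) "
                        f"exceeds {policy.source} deadline "
                        f"({policy.share_within_months} months)")
    return conflicts


# Invented terms, for illustration only.
funder = Terms("funder policy", commercial_reuse=True, share_within_months=6)
survey = Terms("third party survey", commercial_reuse=False, embargo_months=12)

for note in find_conflicts([funder, survey]):
    print(note)
```

Each note returned marks a point where a concession from one of the parties would be needed.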

You may have to seek concessions from one or more of the parties involved. 

The FAIR data toolbox contains a guide on considering data rights and permissions in investments to help you with this.


Reusability of data

As a user of third party data, you will need to check the practical reusability of the data.

You should:

  1. Ensure third party data is using standards compatible with your needs, including how data is collected, units of collection and data representation and exchange
  2. Check coverage. Does the data actually relate to what you require, e.g. the geography?
  3. Check timeliness. Does it cover the right period and is it up to date?
  4. Check granularity. Does it include enough detail to be used in the ways intended?
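
The four checks above can be sketched as a comparison between what a dataset offers and what you need. This Python example is an illustration under stated assumptions: the `Profile` fields, the `LEVELS` ordering and the `reusability_issues` function are hypothetical, and the sample values are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Set

# Assumed ordering of detail, coarsest first.
LEVELS = ["country", "region", "district", "village"]


@dataclass
class Profile:
    """Describes either the data on offer or the data you need."""
    units: str         # unit of measurement, e.g. "t/ha"
    regions: Set[str]  # geography covered
    start: date        # first date covered
    end: date          # last date covered
    level: str         # finest level of detail, one of LEVELS


def reusability_issues(offered: Profile, needed: Profile) -> List[str]:
    """Return one note per failed check; an empty list means no issue found."""
    issues = []
    if offered.units != needed.units:
        issues.append(f"standards: units are {offered.units}, need {needed.units}")
    missing = needed.regions - offered.regions
    if missing:
        issues.append(f"coverage: no data for {', '.join(sorted(missing))}")
    if offered.start > needed.start or offered.end < needed.end:
        issues.append("timeliness: the period needed is not fully covered")
    if LEVELS.index(offered.level) < LEVELS.index(needed.level):
        issues.append(f"granularity: {offered.level} level, need {needed.level}")
    return issues


needed = Profile("t/ha", {"North", "South"}, date(2020, 1, 1), date(2023, 12, 31), "district")
offered = Profile("t/ha", {"North"}, date(2018, 1, 1), date(2021, 12, 31), "region")

for issue in reusability_issues(offered, needed):
    print(issue)
```

A mismatch in units or format can often be fixed by further processing, whereas gaps in coverage, period or detail usually mean looking for another source.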

Following these checks, you may need to instigate further processing to:

  1. Check the data for, and deal with, any errors
  2. Remove unneeded data
  3. Transform the format or structure of the data
  4. Apply techniques such as anonymisation to minimise harmful impacts
  5. Combine or enrich the data with other data sources
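
The five processing steps above might look like this for a small table of records. This is a minimal Python sketch with invented data; the field names and the `prepare` function are hypothetical. Leaving out direct identifiers, as done here, is only a first step towards anonymisation and does not on its own guarantee that individuals cannot be re-identified.

```python
from typing import Dict, List, Optional

# Invented sample of third party records.
raw_rows = [
    {"farmer": "A. Example", "phone": "000", "district": "North", "yield_kg_ha": "2400"},
    {"farmer": "B. Example", "phone": "000", "district": "South", "yield_kg_ha": "-5"},
    {"farmer": "C. Example", "phone": "000", "district": "South", "yield_kg_ha": "3100"},
    {"farmer": "D. Example", "phone": "000", "district": "North", "yield_kg_ha": ""},
]
# Invented second source to combine with, keyed by district.
rainfall_mm = {"North": 640, "South": 810}


def prepare(rows: List[Dict[str, str]],
            rainfall: Dict[str, int]) -> List[Dict[str, Optional[object]]]:
    prepared = []
    for row in rows:
        # 1. Deal with errors: skip missing, non-numeric or negative yields.
        try:
            yield_kg = float(row["yield_kg_ha"])
        except ValueError:
            continue
        if yield_kg < 0:
            continue
        # 2 and 4. Copy only the fields needed, leaving out direct
        # identifiers (name, phone) to reduce the risk of harm.
        record = {"district": row["district"]}
        # 3. Transform: convert kg/ha to t/ha.
        record["yield_t_ha"] = yield_kg / 1000
        # 5. Combine: enrich with rainfall for the same district, if known.
        record["rainfall_mm"] = rainfall.get(row["district"])
        prepared.append(record)
    return prepared


for record in prepare(raw_rows, rainfall_mm):
    print(record)
```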

The FAIR data toolbox includes a case study on harmonising data from third party sources as well as a guide on anonymisation.


A decision making flowchart for using data from third party sources

Decision making flowchart for using data from third party sources
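
The flowchart itself is an image and is not reproduced here. As a rough companion, the guidance from the earlier sections on obtaining permission can be written as a small decision function. This Python sketch is an assumption-laden summary of that text: the `next_step` function and its parameters are hypothetical, and it does not claim to match the flowchart step for step.

```python
def next_step(is_open: bool,
              rights_holder_known: bool,
              permission_granted: bool = False,
              exception_applies: bool = False) -> str:
    """Suggest a next step for a third party dataset (hypothetical helper)."""
    if is_open:
        return "reuse: open data can be used and shared, provided the reuse is legal"
    if rights_holder_known:
        if permission_granted:
            return "reuse: follow the terms of the licence or data sharing agreement"
        return "negotiate: ask the rights holder for a licence or data sharing agreement"
    if exception_applies:
        # Fair use, fair dealing or another local law may cover the use.
        return "reuse: rely on the applicable legal exception"
    return ("risk-based decision: the rights holder could re-emerge, so be ready "
            "to backtrack, negotiate continued use or face a legal challenge")


print(next_step(is_open=True, rights_holder_known=True))
print(next_step(is_open=False, rights_holder_known=True))
print(next_step(is_open=False, rights_holder_known=False))
```

Whatever the outcome, the practical reusability checks and the compatibility with country and funder policies still apply before the data is used.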

Mapping data ecosystems

Key to ensuring data is FAIR and safeguarded are the people, policies and processes that support its use.

You can create a map - a visual representation - of a data ecosystem to help:

  • identify the key people, the relationships between them and the different roles they play
  • describe how data is being shared across the Data Spectrum
  • show how data is being used to deliver a service
  • show where data is not being shared

What tools are available?

You will find guides to help you understand personas in agricultural ecosystems in the FAIR data toolbox. You can use these in conjunction with this methodology for mapping data ecosystems.

Data ecosystem map showing four representative stakeholders connected by data sharing options
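
A data ecosystem map can also be recorded as a simple list of data flows between stakeholders, which makes it easy to see who shares what and where nothing is shared. The Python sketch below is illustrative only: the stakeholders, datasets and flows are invented, and the `by_category` and `not_sharing` helpers are hypothetical.

```python
from typing import Dict, List, Tuple

# One flow: (from stakeholder, to stakeholder, dataset, Data Spectrum category)
Flow = Tuple[str, str, str, str]

stakeholders = ["Researcher", "Innovator", "Policy maker", "Grantee"]
flows: List[Flow] = [
    ("Policy maker", "Researcher", "national soil survey", "open"),
    ("Researcher", "Innovator", "field trial results", "shared"),
    ("Researcher", "Grantee", "project progress data", "shared"),
]


def by_category(flows: List[Flow]) -> Dict[str, List[str]]:
    """Group the flows by where they sit on the Data Spectrum."""
    grouped: Dict[str, List[str]] = {}
    for source, target, dataset, category in flows:
        grouped.setdefault(category, []).append(f"{source} -> {target}: {dataset}")
    return grouped


def not_sharing(stakeholders: List[str], flows: List[Flow]) -> List[str]:
    """List the stakeholders that do not pass data on to anyone."""
    sharers = {source for source, _target, _dataset, _category in flows}
    return sorted(set(stakeholders) - sharers)


for category, lines in by_category(flows).items():
    print(category, lines)
print("not sharing:", not_sharing(stakeholders, flows))
```

Stakeholders listed as not sharing are a prompt for a conversation about why, not a judgement that they should share.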

Stakeholders and roles within a data ecosystem

You can identify the key stakeholders in a data ecosystem, the relationships between them and their roles with the following information:

Innovator

  • reuser of data to help develop and scale solutions based upon insights for consumers
  • ensures rights and permissions to use data within solutions

Grantee

  • responsible for ensuring everyone in the grant ecosystem has the appropriate rights and permissions to access, use and share data

Researcher

  • collects and analyses raw data
  • reuser of data
  • needs to ensure that rights and permissions are obtained and provided that allow the storage, use and sharing of the data with others

Policy maker

  • data steward, manages a number of key national datasets
  • needs to provide permissions for others to use the data
  • needs to protect the rights of those the data is about

Program officer

  • responsible for delivering FAIR and safeguarded data within investments
  • ensures investments deliver impact and are in line with objectives and policies
  • supports the grantee through capacity building and facilitating conversations with other stakeholders in the data ecosystem where necessary

Central government

  • responsible for national policies on data and information
  • contributes to the enabling environment in-country
  • sets expectations on access, use and sharing of data

Summary

You can find all the key points from this Module in the Cheat Sheet: Reusing data from third-party sources.

Don't forget to complete the Module 3 activity questions to review your knowledge of this topic.