SURF: sharing (confidential) data for the benefit of research
SURF is a cooperative association of 108 Dutch universities, universities of applied sciences, senior secondary vocational institutions (MBO), UMCs and research institutions. They work together to purchase or develop the best possible digital services and to stimulate knowledge sharing by continuing to innovate. On September 24th, in the upcoming community meeting, SURF will further elaborate on making data sharing possible for the benefit of research. In advance, we spoke with Erik Kentie (Community Manager) and Freek Dijkstra (Design Expert) from SURF.
How to facilitate the sharing of large amounts of data
“Sharing research data is done on a large scale in the research community,” says Freek. “After all, data is the source of knowledge. And since it is the rule rather than the exception that several researchers jointly work on a publication, it is necessary that they can share datasets.”
With Research Drive, SURF offers an online environment where researchers can store their datasets and share them with each other. Freek: “SURF specifically focuses on the storage and processing of large amounts of data. Examples include data from an institution such as CERN for physics research and more than 20 petabytes of data from LOFAR for astronomical research. Another example is Project MinE, in which data is shared for international research on ALS. All these different studies use their own data formats and their own solutions.” Erik: “To share data, it is therefore very important that FAIR principles are applied. FAIR stands for Findable, Accessible, Interoperable and Reusable. FAIR principles provide guidelines for preparing research data for reuse under clearly described conditions by both people and machines. Every researcher who starts working with the SURF infrastructure is obliged to write a data management plan that describes how the FAIR principles will be implemented. This ensures that other researchers also have easy access to the data.”
A tough nut to crack: sharing confidential data
Sharing confidential data is one of the biggest challenges for researchers. Freek: “The difficult thing about science is that a study must always be reproducible. The datasets that are used should therefore preferably be accessible to other researchers. However, especially when it comes to business-critical or privacy-sensitive data, organisations want to maintain control over who has access to that data and over what is done with it.” In the Data Exchange project, SURF is investigating how and under what conditions organisations can share confidential data. Freek: “For example, we are investigating how a trusted third party can play a role in this. We have built a prototype of a platform that a researcher can use to ask a data provider for permission to use a specific dataset for a specific analysis. After permission is granted, the analysis of that data is run by a third party in a secure container. An additional check is built in for the data provider, who may assess whether the result of the analysis can be used by the researcher. This is done to prevent certain data from being released unintentionally, which greatly reduces the risk of data leaks. By building in this extra control step, researchers can perform analyses on confidential datasets more easily, while data providers are more inclined to share their datasets for research purposes. This data provider can be another research institution, but also an organisation from a different sector or domain, such as finance, agriculture or healthcare.”
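The workflow Freek describes (request, permission, execution by a third party, output check by the data provider) can be illustrated with a small sketch. This is not SURF's prototype: every class, function and state name below is hypothetical, and the secure container is only represented by the fact that the dataset lives inside the third party's object and is never handed to the researcher.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Any, Callable, Dict, Iterable, List, Optional


class State(Enum):
    REQUESTED = auto()  # researcher has asked for permission
    APPROVED = auto()   # data provider allowed this analysis on this dataset
    EXECUTED = auto()   # third party ran the analysis; result is still withheld
    RELEASED = auto()   # data provider checked the result and released it
    REJECTED = auto()   # data provider refused the request or the result


@dataclass
class AnalysisRequest:
    """Hypothetical request: one specific analysis on one specific dataset."""
    researcher: str
    dataset_id: str
    analysis: Callable[[List[Any]], Any]
    state: State = State.REQUESTED
    result: Optional[Any] = None


class DataProvider:
    """Hypothetical data provider that keeps control at two points."""

    def __init__(self, owned_datasets: Iterable[str]) -> None:
        self.owned = set(owned_datasets)

    def decide(self, request: AnalysisRequest, approve: bool) -> None:
        # First control point: permission for this dataset and this analysis.
        if request.dataset_id not in self.owned:
            raise PermissionError("provider does not own this dataset")
        if request.state is not State.REQUESTED:
            raise RuntimeError("request was already decided")
        request.state = State.APPROVED if approve else State.REJECTED

    def review_output(self, request: AnalysisRequest, release: bool) -> None:
        # Second control point: check the result before the researcher sees it.
        if request.state is not State.EXECUTED:
            raise RuntimeError("there is no pending result to review")
        if release:
            request.state = State.RELEASED
        else:
            request.result = None
            request.state = State.REJECTED


class TrustedThirdParty:
    """Hypothetical third party; stands in for the secure container."""

    def __init__(self, datasets: Dict[str, List[Any]]) -> None:
        self._datasets = dict(datasets)

    def run(self, request: AnalysisRequest) -> None:
        if request.state is not State.APPROVED:
            raise PermissionError("analysis has not been approved")
        data = self._datasets[request.dataset_id]
        request.result = request.analysis(data)
        request.state = State.EXECUTED


def collect_result(request: AnalysisRequest) -> Any:
    """The researcher only obtains a result the provider has released."""
    if request.state is not State.RELEASED:
        raise PermissionError("result has not been released")
    return request.result
```

In this sketch a researcher would create an `AnalysisRequest`, the provider would call `decide`, the third party would call `run`, and only after `review_output(..., release=True)` does `collect_result` return anything. A real platform would additionally need authentication, audit logging and actual isolation of the analysis code, none of which is modelled here.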
Opportunities of cross-sectoral data sharing remain underexploited
However, the sharing of data between, for example, a hospital and a research institution is still not taking place sufficiently, despite its considerable social and economic potential. Freek: “Although hospitals are willing to share medical data, for example about cancer, with research institutions for the benefit of research, this does not happen at the desired scale. This is due to a lack of agreements on how this data could be shared across sectors. Even the question of which privacy-preserving technologies should be used to solve the privacy challenges of data sharing can lead to different approaches.” Erik: “SURF’s long-term goal is to make data from society and industry accessible to researchers in a suitable way, so that research can be carried out faster and even better. For example, a company that specialises in vegetable seeds already shares its data via SURF with the University of Amsterdam for the development of new techniques for seed breeders. This sort of research could potentially solve the global food shortage.”
Together exploring the conditions for data sharing
SURF joined the Data Sharing Coalition not only to share its own knowledge, but also to learn from other participants. Erik: “There is still much uncertainty about the conditions under which data can be shared. Just like SURF, many organisations are investigating how to do it right, and by exchanging knowledge, we can really unlock the possibilities that data sharing offers. The Data Sharing Coalition enables us to do so.”
On September 24th, SURF will give a presentation during the community meeting in which it will elaborate on the Data Exchange project. Do you want to attend this community meeting? Do not hesitate to contact us.