EC consultation on Open Data - my presentation.
Tue Jul 2, 2013
The following is the written representation that I made to the EC hearing on Open Data on behalf of Co-Action publishers, Copernicus Publications, eLife, F1000 Research, FigShare, Frontiers, Open Books Publishers, PeerJ, the Public Library of Science, Ubiquity Press and Bloomsbury Qatar Foundation Journals (QScience). I had a five-minute slot to present, and the key recommendations at the end of this written response formed the basis of that presentation. I added one slide at the end with a personal view on some of the challenges of getting researchers to share data.
- Written representation
- How can we define research data and what types of research data should be open?
- When and how does openness need to be limited?
- How should the issue of data re-use be addressed?
- Where should research data be stored and made accessible?
- How can we enhance data awareness and a culture of sharing?
- Key Recommendations:
Research data are primary outputs of a research process that are intended to be incorporated into research communications as support for the claims of that research. As with other research outputs, research data can be valuable to others for the purpose of validation, confirmation, and critique of research, as well as for entirely new applications that were not considered by the original researchers. Research data can be in any format.
Research data generated with the support of public funds is a public good and should be open by default. This means making it available in a format and with contextual information that makes it technically usable, with the legal rights that enable re-use in any field. We endorse the concept of “Intelligent Openness” described in the UK Royal Society report “Science as an open enterprise”: data needs to be accessible, intelligible, assessable and usable. Given a default position that data should be open, the more appropriate question to ask is where and how that openness should be limited.
Research data created with support from public funds should be open by default. Release of, and access to, research data should be limited in cases where making the data available would do more harm to the public good than restricting access would. Such cases might include clinical data that contains personally identifiable information, data that reveals the location of critically endangered species, data that could create a public health or security risk or endanger the researchers themselves, and data whose release would damage the conduct of the research itself.
The appropriate approach to limiting openness will depend on the case in hand and should be subject to a risk assessment by appropriate experts. Methods for restricting access with a proven track record include delaying publication for a defined period, providing access only under specific conditions or only to approved persons, and granting access after review of a specific access request. Such systems need to be carefully considered and designed so that they restrict access as much as is appropriate but no more. We emphasise again that the default should be open, and such systems should be used only where they can be justified.
The question of access to research data where there is a commercial contribution to their creation is a separate issue. Research data created by private interests is the property of the creator and can be shared in a way that advances their interests. Where public and private interests contribute to the creation, collection, or analysis of data it may be appropriate for the release of data (and other communications) to be delayed for some defined period. Such periods should be negotiated and defined in collaboration and grant agreements. We would recommend that the maximum such period should be one year after the conclusion of the project or two years after the creation of the data, whichever occurs first.
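As a minimal illustration of the recommended ceiling on such delays, the sketch below computes the latest release date as whichever comes first: one year after the project concludes or two years after the data was created. The function names are hypothetical, and reading a "year" as a calendar year (with 29 February falling back to 28 February) is an assumption made for the example, not part of the recommendation.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (assumption: 29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only raised when d is 29 February and the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def latest_release_date(project_end: date, data_created: date) -> date:
    """Latest date by which jointly funded data should be released under the proposed ceiling."""
    one_year_after_project = add_years(project_end, 1)
    two_years_after_creation = add_years(data_created, 2)
    # "Whichever occurs first" means the earlier of the two dates.
    return min(one_year_after_project, two_years_after_creation)


if __name__ == "__main__":
    # Hypothetical project: data created in March 2013, project ends in June 2015.
    print(latest_release_date(date(2015, 6, 30), date(2013, 3, 1)))
```

In this hypothetical case the two-year limit on the data is reached before the one-year limit on the project, so it determines the release date. A collaboration or grant agreement could of course set a shorter period than this maximum.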
If data is to be made available with the intention of maximising the economic impact and the public good created, it is critical that re-use be enabled both technically and legally. Researchers should adopt best practice in the formatting and description of data, and the Commission and other funders and community groups should support the creation, documentation, and communication of such community best practice. Data should be placed in the repositories and archives that best support the availability, discoverability and usability of the specific data. Data should be released under a license which maximises the potential for re-use and recombination of that data. The appropriate licenses are the Creative Commons CC0 waiver and the CC BY copyright license. This approach is similar to that of “Intelligent Openness” described in the Royal Society report.
To create incentives for publicly funded researchers to maximise the re-usability of their research outputs, it is important that the Commission and other funders adopt and develop approaches that measure re-use and require researchers to report on the re-use of their data. The Commission and other funders should engage with the emerging tools for tracking re-use of, and engagement with, web resources, including data. These include initiatives and tools to support Data Citation, measures of data downloads, and online conversations around this data.
Research data should be made available in the place which best supports its use in a sustainable and reliable fashion. The question needs to be addressed on a domain-by-domain basis, as well as for specific data types. Funders play a critical role in supporting both the creation and the long-term sustainability of the infrastructure that makes data available. Shared systems and community infrastructures need to be put in place to support these services in the long term.
It is generally not ideal for data to be stored as supplementary data to published research papers on a publisher's website. While we recognize that this is the current default for many domains of research, we recommend a shift towards the housing of data in dedicated repositories, ideally specialised for specific data types and domains, but in any case focussed on the preservation, discoverability, and re-use of data as opposed to research papers. To ensure the connection between data and research communications that may be spread between a number of different repositories, it is critical that effective and persistent citation systems are in place to link research to its supporting data.
As noted above, a critical aspect of enhancing data awareness and building a culture of data sharing is for funders to explicitly show that they value data sharing as a primary output of funded research. Specifically, funders should engage with the emerging field of usage measurement and require evidence of re-use from researchers. An infrastructure that supports data citation and usage tracking is required to provide the underlying data to support this, and we recommend and support the principles of the Amsterdam Manifesto on Data Citation.
Funders should act as exemplars of data sharing by sharing their own data effectively and efficiently. In addition, the creation of specific funding initiatives, such as Marie Curie Fellowships that support the creation of data sharing platforms, or those providing data resources that improve reproducibility and re-usability, sends a powerful message on the importance of this work to the Commission. In addition to acting as exemplars, funders will also need to place achievable and appropriately scoped requirements on grant conditions to ensure effective data sharing. Such conditions should be clear, agreed, and most importantly auditable. These need to be combined with advocacy for data sharing, collection and promotion of success stories, and clear rewards for those leading the development of best practice in specific communities.
- Publicly funded research data is a public good and should be shared effectively to maximise the benefits that arise from the public funding of research. To achieve this, the default position must be that data is open.
- We endorse the concept of “Intelligent Openness” from the Royal Society report. Data must be accessible, legally usable, and technically usable to maximise the benefits from sharing.
- In specific and limited cases access to, or release of, research data should be restricted. There is existing and appropriate best practice in this space that can be adopted.
- To support and maximise the re-use of publicly funded research data, funders should promote and require best practice in data sharing and explicitly monitor and reward those who can demonstrate the re-use of data generated.
- Research data should be made available from the place or places that best support its discovery and re-use, preferably in subject specific repositories. This will differ from domain to domain and between types of data.
- Support for the development of infrastructure that tracks the usage and discussion of data is crucial.
- Systems that support data citation and the tracking of usage are developing, require support, and should be retained in the public domain. We recommend and support the principles of the Amsterdam Manifesto on Data Citation.
- Funders need to act explicitly to demonstrate that they value data sharing. This can be achieved through a) acting as exemplars of best practice in sharing their own data, b) supporting those that demonstrate and embody best practice in data sharing and the development of new tools that support data sharing, and c) requiring data sharing as a condition of funding.