Open and reproducible Research

Why share?

Sharing research is good for everyone. It allows completed research to be verified, it reduces duplication of effort so that more productive research gets done, and it allows groups with fewer resources to take part in productive research.

Sharing can also be good for the group that publishes the work. It increases the impact of the work by allowing more people to access it, it enables collaborations with new researchers and groups from around the world, and it ensures that all researchers get credit for the work they do, not just PIs or grant holders.

As well as the inherently positive aspects of open research, we are starting to see changes in policy in areas like research funding and publishing that will further increase the importance of open research. In LightForm we want to be ahead of the curve in adopting these practices.

Barriers to sharing

Some people are concerned that if they share incomplete ideas or datasets then other people will steal them. While this is theoretically possible, it rarely happens. If we ensure that there is an easy way to cite released work, then people will likely cite it. As for people ‘stealing’ incomplete ideas and publishing them as their own, if the work is of any value then it would take others a long time to reproduce the expertise of the publishing group and bring that work to publication.

Other barriers to sharing data are lack of knowledge and lack of time. Both are shrinking: more resources are now available to educate people about open research, and funders and PIs are placing greater value and emphasis on sharing, which means that we can afford to spend time working on it.

Five star open data

Tim Berners-Lee, the inventor of the Web, set out a 5 star scale for Open Data. It sets out guidelines for sharing data, with each star adding a further criterion.

We are required by our funders to share the data used in our publications. The minimum requirement for open data is that the data is shared under an open licence; this would be one star data. Once you have published one star data, it takes little extra effort to increase the quality to 5 star data, and doing so greatly increases its value by making it more sharable and more interoperable.

★ : Make your data available on the Web under an open license

★★ : Make it available as structured data (e.g. Excel instead of an image scan of a table)

★★★ : Make it available in a non-proprietary open format (e.g. CSV instead of Excel)

★★★★ : Use URIs to denote things, so that people can point at your stuff (e.g. use the DOI feature of Zenodo)

★★★★★ : Link your data to other data to provide context (e.g. include metadata, link paper and data, link data in the Zenodo LightForm community)
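As a rough illustration of the three star and five star steps, the sketch below writes a small table to CSV (a non-proprietary format) and stores a metadata file next to it. It is only a minimal example using the Python standard library: the function name `export_open_dataset`, the file names, the column names and the metadata fields are all hypothetical, and the licence identifier and DOI shown are placeholders rather than LightForm requirements.

```python
import csv
import json
import tempfile
from pathlib import Path


def export_open_dataset(rows, fieldnames, out_dir, metadata):
    """Write rows to data.csv and metadata to metadata.json inside out_dir.

    The file names and metadata layout are illustrative assumptions,
    not a standard required by any repository.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Three stars: structured data in a non-proprietary open format (CSV).
    csv_path = out_dir / "data.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Towards five stars: metadata that gives context and links to related work.
    meta_path = out_dir / "metadata.json"
    with meta_path.open("w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2)

    return csv_path, meta_path


if __name__ == "__main__":
    # Made-up example values, purely for demonstration.
    example_rows = [
        {"sample_id": "S1", "strain": 0.01},
        {"sample_id": "S2", "strain": 0.02},
    ]
    example_metadata = {
        "title": "Example dataset",
        "licence": "CC-BY-4.0",  # placeholder: choose the licence that applies
        "related_paper_doi": "<DOI of the paper>",  # placeholder, fill in yourself
    }
    with tempfile.TemporaryDirectory() as tmp:
        paths = export_open_dataset(
            example_rows, ["sample_id", "strain"], tmp, example_metadata
        )
        print(paths)
```

The metadata is kept in a separate JSON file here so that the CSV stays a plain table that any tool can read; a repository record could carry the same information in its own metadata fields instead.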

We discuss each of the things required to reach 5 star quality datasets in separate pages on the wiki.

Further reading

A manifesto for reproducible science is an excellent piece that covers some of the current issues in scientific research and highlights some ways in which we can move towards a more open and reproducible research workflow.