While you have little choice these days about posting your data and code online when you publish, here are some things to consider when contemplating putting potentially sensitive data online (modified excerpt from The Effective Scientist).
One aspect of making your data publicly available is the prickly issue of whether your data contain sensitive information.
Of course, there are many different types of ‘sensitive’ information that might accompany the more basic quantitative measurements of your datasets, with perhaps the most common being personal details of any human subjects. For example, if you are a medical researcher and your data are derived primarily from living human beings undergoing some procedure, trial, or intervention, then clearly you are bound by your human ethics approvals not to publish information like names, addresses, or anything that could be used to identify the subjects in your sample. In fact, human ethics approvals generally prohibit any public access to medical data that include personal information; thus, the scientists concerned are pulled in two different directions: keeping their subjects’ personal information out of the hands of the public, while still making the data available to other scientists.
There are ways around this, such as publishing only generic information online (i.e., by excluding personal identifiers) that could then be linked to the more sensitive data via unique identifiers. In these cases, any other researcher requiring the additional information would have to seek specific permission from the primary researchers, pending additional human-ethics approvals.
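As a rough illustration of that linking approach, the Python sketch below splits each record into a public part and a restricted part that share a randomly generated identifier. The field names, the `IDENTIFYING_FIELDS` set, and the `split_records` helper are all hypothetical; which fields count as identifying is for your ethics approval to decide, not this code.

```python
import uuid

# Hypothetical field names: replace with whatever your ethics approval
# lists as personally identifying.
IDENTIFYING_FIELDS = {"name", "address", "date_of_birth"}


def split_records(records):
    """Split each record into a public and a private part sharing a random ID."""
    public, private = [], []
    for record in records:
        # A random ID reveals nothing about the subject, unlike initials
        # or a hash of the name.
        subject_id = uuid.uuid4().hex
        public_part = {"subject_id": subject_id}
        private_part = {"subject_id": subject_id}
        for field, value in record.items():
            if field in IDENTIFYING_FIELDS:
                private_part[field] = value
            else:
                public_part[field] = value
        public.append(public_part)
        private.append(private_part)
    return public, private


if __name__ == "__main__":
    example = [{"name": "A. Person", "address": "1 Example St",
                "date_of_birth": "1980-01-01", "dose_mg": 50,
                "outcome": "improved"}]
    public, private = split_records(example)
    print(public)   # safe to post online
    print(private)  # kept offline; released only with additional approval
```

Only the public list would be posted; the private list stays with the primary researchers. Bear in mind that stripping direct identifiers does not by itself guarantee anonymity, because combinations of the remaining fields can sometimes still single out a person.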
Survey data, where individual people are asked questions on anything from their personal habits to their voting preferences, can also come under this umbrella of data sensitivity. Even data mined from social-media platforms can be restricted on privacy grounds. Commercially sensitive data, such as those on mining leases, fishing grounds, and hunting sites, often fall into the same category.
It is therefore the responsibility of both the researcher and the committees granting ethics or permitting approvals to decide on an optimal trade-off between data dissemination and the protection of individual privacy or commercial privilege.
Many other types of data sensitivities abound, such as those that must be considered given the propensity of certain nefarious types to exploit scientific information for their own personal gain.
Photo: John Long, Flinders University
One rather galling, generic example of this comes from the fields of archeology and paleontology, whereby scientists who have published the location of deposit-rich sites have been horrified to discover that curio and fossil hunters have pilfered precious specimens after reading the relevant scientific articles.
As a result, most online archeological or paleontological datasets today either do not publish the site locations at all, or they deliberately add a location error so that only specialists will know where to look.
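One way to add such a location error, sketched here in Python, is to shift each published coordinate by a random distance in a random direction. This is only an illustration under stated assumptions: the `blur_location` function, the 10 km default, and the flat-Earth approximation of about 111 km per degree of latitude are mine, not a method prescribed by any particular database.

```python
import math
import random

KM_PER_DEGREE_LAT = 111.0  # rough approximation, adequate for small offsets


def blur_location(lat, lon, max_offset_km=10.0, rng=random):
    """Return (lat, lon) shifted by a random distance of up to max_offset_km."""
    # The square root spreads points evenly over the disc rather than
    # clustering them near the true site.
    distance_km = max_offset_km * math.sqrt(rng.random())
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    dlat = distance_km * math.cos(bearing) / KM_PER_DEGREE_LAT
    # A degree of longitude shrinks with latitude; this breaks down near the poles.
    dlon = distance_km * math.sin(bearing) / (
        KM_PER_DEGREE_LAT * math.cos(math.radians(lat))
    )
    return lat + dlat, lon + dlon


if __name__ == "__main__":
    true_site = (-34.93, 138.60)  # made-up coordinates for illustration
    print(blur_location(*true_site, max_offset_km=10.0))
```

If you take this route, generate the offset once and store the blurred coordinates; publishing freshly blurred versions of the same site several times would let someone average them back toward the true location.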
An even more disturbing behavior is becoming increasingly frequent as would-be poachers and pet-traders use the scientific literature in ecology and biodiversity conservation to discover the locations of rare and endangered wildlife and plants. By virtue of being rare, many species are considered highly valuable in the trade of parts or pets — just think of rhino horn, elephant ivory, rare orchids, and tropical aquarium fish.
If you happen to research any rare species, fossils, human remains, or other potentially valuable specimens, do think carefully about what data you make publicly available online. At the very least, do not tell people where you found them.