An edited version of a snippet from my upcoming book, The Effective Scientist (due out in March 2018).
—
I tend to assume tacitly that my collaborators are entirely fine with the idea of having their hard-won data spread across the internet, where anyone can access and use them. In reality, many are probably not comfortable with that concept at all, and for some, the very notion of ‘sharing’ data with anyone but their closest and most-trusted colleagues is the stuff of nightmares.
I too was once far too concerned about the privacy of the data for which I had literally sweated and bled, because I feared that some nefarious and amoral scientist would steal, analyse, and publish them before I had the chance, thus usurping my unique contributions to the body of human knowledge. Perhaps I was just paranoid, although I still encounter such attitudes today. While data theft can occur, in reality it is unlikely that anyone would bother trying to outdo you in this way, mainly because in most cases data availability is not the limiting factor for scientific advancement. Another reason this should not worry you is that far too few of us have the time to publish all of our own data, let alone someone else’s.
As it turns out, not sharing your data can reduce your publication opportunities, because your colleagues are less likely to know what you have been doing. It is certainly reasonable to keep your data under lock and key until you submit the resulting manuscript to a journal, but that is normally a matter of months rather than years or decades. Once you have started to publish from a particular dataset, my sincere advice is to open it up to everyone.
Why? Even if you intend to mine your dataset for more analyses and stacks of new manuscripts over the coming years, making it available to the wider research community is more likely to create new opportunities than to steal them away from you. The most basic reason is that if another research group ends up using your data, they will at least be obliged to cite you in some form.
Another advantage of this type of second-hand sharing is that people using your data usually want to understand how you collected them, what limitations they might have, and whether there are subtleties not fully covered in the accompanying metadata. To find out, they will generally contact you directly, ask some pertinent questions, and in many cases invite you to join their authorship team.
Yet another benefit of data sharing is that different people bring different perspectives, analytical techniques, and interpretations, so alternative applications of your data can identify new lines of inquiry and allow you to pose and test hypotheses that might otherwise have escaped your attention. These days there are even journals devoted entirely to database descriptions (e.g., Scientific Data, GigaScience, and Standards in Genomic Sciences), so the database itself represents a publication opportunity. In short, data sharing has so many more pros than cons that remaining a data hoarder is patently to your disadvantage.
Today, many, if not most, scientific journals no longer really give you the option of being a data hoarder. It is becoming rare indeed for a journal to agree to publish your manuscript if it is not accompanied by an easily accessible dataset, which increasingly means a link to a database in an online repository with an associated digital object identifier (DOI).
In other words, you have ever fewer opportunities to limit who can access your data, so ultimately the decision is no longer really up to you. I therefore strongly urge you to make your online datasets clear, complete, and well described (e.g., via intuitive metadata). Your colleagues will not only be grateful; they will also be more likely to do the same and share their data and publication opportunities with you in the future.