In a previous post on this blog we mentioned NWO's new data management protocol. NWO requires applicants to include a data management paragraph in their proposal and to write a full data management plan (DMP) once the grant is awarded. Leiden biochemist Remus Dame participated in the data management pilot that preceded the protocol and wrote an exemplary DMP. In an article in NWO Hypothese (“Data in de etalage”, in Dutch) he describes his experiences with data management planning. To lead by example, he also published his data management plan in our repository (Data management plan VICI, in Dutch). Read the English translation of the article in Hypothese here.
Keeping a grip on ever growing data sets
Original article by Martine Segers, translated by Rutger de Jong
“Thinking about how to store the results of measurements digitally is directly related to the core of my research. This is why the amount of extra work involved in open data is minimal for me,” says biochemist Dr. Remus Dame of Leiden University. He was one of the participants in the data management pilot that was carried out last year to prepare for the introduction of the new policy.
“Our research generates ever-growing data sets,” says the Vici laureate. “That’s why we were already working on storing our measurements more systematically and on adding metadata in case someone leaves our lab. It prevents having to repeat analyses, something that was sometimes necessary in the past.” To enable external re-use of his data as well, extra metadata need to be added, for example details on the analysis method used.
The costs that accompany the new requirements on data management appear to be lower than expected. “We will be pre-processing all of our data, otherwise it will be far too expensive,” explains Dame. There is another reason not to opt for storing raw data. “You should not pollute repositories with data no one wants to use. If we determine the DNA-binding properties of proteins with a standard method, for example, the calculated values are far more interesting than the raw data itself.”
He expects the amount of relevant data from his Vici project to be under 100 gigabytes. “The 4TU repository, which we are currently opting for because there is no archive for biological data yet, charges about 4.50 euros per gigabyte for a period of fifteen years. The costs will therefore be less than 500 euros, a feasible amount. We also want to keep the raw data ourselves in our lab, for a shorter period of time. This will be far more costly: think of ten to fifteen thousand euros.”
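The estimate above is simple arithmetic, and a short sketch makes the reasoning explicit. The price per gigabyte and the 100 gigabyte upper bound are the approximate figures quoted in the article; the function name `archive_cost_eur` and its structure are hypothetical, chosen only for illustration, and do not reflect any actual pricing tool of the repository.

```python
# Figures quoted in the article (both approximate).
PRICE_PER_GB_EUR = 4.50  # quoted price per gigabyte for a fifteen-year period
MAX_DATA_GB = 100        # expected upper bound on relevant Vici project data


def archive_cost_eur(size_gb: float, price_per_gb: float = PRICE_PER_GB_EUR) -> float:
    """Return the one-off archiving cost in euros for size_gb gigabytes.

    Hypothetical helper: assumes a flat price per gigabyte with no
    minimum fee or volume discount.
    """
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    return size_gb * price_per_gb


# Upper bound on the cost if the full 100 gigabytes were deposited.
upper_bound = archive_cost_eur(MAX_DATA_GB)
print(f"Upper bound on archiving cost: {upper_bound:.2f} euros")
```

Under these assumptions the upper bound stays below the 500 euros mentioned in the interview, and a smaller processed data set would cost proportionally less.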
By participating in the pilot, Dame made some useful connections with the IT specialists of his university and with the data librarian of the university library, who was most up to date on the international standards for metadata and the available repositories. The pilot also forced him to think in an even more systematic way about the storage of data. “This is what I now try to get across to the people working in my lab.”