Data deluge: volume, velocity, variety
Challenges: Privacy and security – digital data (surprisingly fragile) vs analog data. How to discover, access, and reuse data. Open data? (Public funds were used to create it – need to figure out how to give it back to the funders.)
Why should librarians get involved? Trust building – interdisciplinary/collaborative work, and some libraries already have an existing infrastructure to preserve & disseminate digital content.
6 areas of data services
- People are still playing it safe for now. Usually start with a consultation service and grow from there – workshops & one on ones
- Faculty are often skeptical – “why am I here to see you?” They were referred by the research office but don’t really get it. Heather tries to be practical and follows up if they get funded: “here’s what you said in your plan, how will you implement it?”
- Consult – encourages them to use their IR, and also any subject depositories that are relevant. Also make sure they are meeting the funding agency’s requirements. Often faculty don’t even know what they have access to in terms of support, so she tries to help connect.
- Know your institution. Talk to people one on one – key. Faculty are concerned with reputation – scientific integrity. And relevance to Promotion & Tenure.
- Build this into graduate research classes if possible – get them thinking about this early.
- Training strategies – make it relevant. Don’t get stuck in the why – focus on the what and the how. Look at some of the data sets on your campus. What are they? What do they look like? Can you help make them better?
- Metadata & Documentation
- Lots of overlapping interest with librarians – researchers look to things like compatibility for reuse (though they are not always interested in making their own data compatible); citeability – they want credit; discoverability a little bit. Libs are also looking for provenance – is this from where it says it’s from?
- Many forms of documentation: lab notebooks, codebooks, lab protocols, grant and research proposals, and finally the research articles.
- Metadata produced automatically – what can we easily transform to meet existing standards – what info is needed to open, understand and work with the file.
- Pitfalls – we reject researcher metadata because it doesn’t meet our standards, but at the same time we require too much of researchers and give them too little help in applying the standards we point them to. We don’t leverage existing documentation (“we didn’t produce that guide, so it can’t be relevant”), and we don’t think to talk to metadata or cataloging libs about it.
- Solutions: data curation profiles – use what’s out there already. Give researchers a limited set of metadata to choose from so they don’t go crazy trying to sort through hundreds of options.
- Requirements: name, titles, etc. Also asking for file format and interoperability info as well as methodology.
- How long? Standards vary from discipline to discipline and even grant to grant. Not the same as our conception of how long we preserve (forever).
- How much? Only a small portion perhaps.
- What formats? What’s it in now? Something custom to the researcher or a common/open format.
- How much will it cost? The more data you keep, the more it costs.
- Challenges – refresh & replicate data so it can still be used. Migration – new format. Emulation – environments (digital forensics labs) that allow folks to work with the data.
- Quality auditing frameworks
- Research Data Alliance – developing certifications to show that this is a trusted repository.
- Other options – partner with your archives, secure physical storage (especially helpful for grad students or the lab-less – cages, lockers, etc), replicate the digital data.
- First level – You know about the existing repositories; the researchers probably do not. This covers both where they can put stuff and where they can find stuff to use. DataBib (http://databib.org/) is a directory of repositories. Help direct them to appropriate sources.
- Second level – Refactor your IR to extend it to data sets
- Third level – have your own data repository
- Data Citation
- Instruction/info lit outreach – this is something you need to cite just like a published article.
- Outreach to writing labs, brochures, university press/publisher to include instructions for authors on depositing and citing data
- Practice what you preach! Cite your own data, suggest citations for data in your libs, show publications that require data to be cited, etc.
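
To make the data citation points concrete, here is a minimal sketch of what a "suggested citation" for a data set could look like. This was not shown in the session; the `DatasetCitation` class, its field names, and the sample values are all hypothetical, and the layout only loosely follows the common creator, year, title, publisher, identifier pattern. Check the style guide or repository guidance that applies at your institution.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    """Hypothetical minimal record holding what a data citation usually needs."""
    creators: list[str]
    year: int
    title: str
    publisher: str      # assumption: the repository that holds the data
    identifier: str     # assumption: a DOI or other persistent link
    version: str = ""   # optional; left out of the citation when empty

    def format(self) -> str:
        """Build a one-line suggested citation from the fields above."""
        authors = "; ".join(self.creators)
        parts = [f"{authors} ({self.year}). {self.title}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        parts.append(f"{self.publisher}.")
        parts.append(self.identifier)
        return " ".join(parts)


# Made-up sample values, for illustration only.
example = DatasetCitation(
    creators=["Doe, J.", "Roe, R."],
    year=2013,
    title="Example Survey Responses",
    publisher="Example University Repository",
    identifier="https://example.org/dataset/123",
)
print(example.format())
```

A generated line like this could go on a data set's landing page or into a brochure, so that researchers have something ready to copy when they cite the data.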