This post originally appeared on the Metadata 2020 blog on Monday, November 20, 2017. Read more about Interfolio’s involvement in the Metadata 2020 group here.
We recently represented Metadata 2020 in a panel presentation in Washington, DC, at the National Academies of Sciences, Engineering, and Medicine’s symposium, International Coordination for Science Data Infrastructure. The event consisted of short talks and panels that explored existing and emerging efforts in sharing scientific research data and discussed issues related to the design and use of such systems.
The one-day symposium was a great opportunity to tell others about Metadata 2020 and to share how they could participate in the cause. Our presentation covered the following items:
- Comments on the importance of improving the quality of metadata related to published research
- Metadata 2020’s formation and its goal of facilitating richer, connected, and reusable metadata
- Metadata 2020’s plans to identify challenges faced by stakeholder groups involved with the creation and management of research metadata, as well as its desire to establish a “metadata maturity model” for evaluating and improving the quality of metadata
- Details on the variety of organizations and individuals involved in Metadata 2020
- Information on how the attendees could support Metadata 2020
Attending the symposium gave us the opportunity to network with experts in collecting and managing metadata and to gather insights helpful to Metadata 2020’s mission. We would like to share some of the knowledge and advice we gained from the symposium, which includes the following items:
1. Don’t reinvent the wheel.
One attendee noted that Metadata 2020 should do all it can to collaborate with other organizations so that it does not duplicate what is already out there. As an example of a useful model, several participants discussed the FAIR Data Principles (FAIR is an acronym for Findable, Accessible, Interoperable, and Re-Usable) established to provide guidance for metadata and protocols related to research datasets. Because these principles have been heavily vetted for similar purposes, we should consider FAIR (and other tools created elsewhere) as we establish a data maturity model for collecting research metadata. That said, the broad representation of stakeholder communities from around the world that are active in Metadata 2020 bodes well for heeding this advice while making progress on the goals of the cause.
2. Incentivize contributors.
How can we motivate researchers to provide accurate and complete metadata? During the symposium, one NIH project was noted as an example of increasing buy-in from researchers. During the funding application process, researchers can provide NIH access to their data, and the agency will pull, clean, and submit data on behalf of the applicants. NIH will even return the cleaned data to the providers, thus giving value back to the researchers for their data contribution.
Metadata 2020’s various communities are considering ways to reduce the friction of providing metadata, while communicating the benefits researchers receive for providing accurate and complete metadata. Since contributors will be relied upon for the quality of metadata, their buy-in will be a key to success.
Note: Lili Zhang, from the Chinese Academy of Sciences, shared results from a survey that indicated why researchers did not provide metadata about their research data. The slide deck has been provided by the symposium sponsors–see below. This could provide information to help Metadata 2020 understand the motivations of researchers related to providing metadata.
3. Make interoperability a high priority.
A comment related to a common theme at the symposium was, “It all has to work, machine to machine, and we just help load it into the machines as data stewards.” Building a data ecosystem that supports ease of use, accessibility, and sharing (from one system to another and from one community to another) is a foundation of metadata management. Interoperability is a condition of connected metadata, and connected metadata supports richer and reusable data. As a result, technical advice from all communities involved in Metadata 2020’s efforts will be important.
4. “Data maturity model” is a hot term!
Several attendees indicated that other models are out there or are in development. Resources mentioned include:
a. NOAA NOS Metadata Web Training–The NOAA NOS NCCOS CCFHR (Center for Coastal Fisheries and Habitat Research) is receiving metadata training on basic metadata, introduction to policy and standards, navigation and use of the Metadata Enterprise Resource Management Aid (MERMAid), and on Writing Quality Metadata for NOAA NOS. CCFHR is building records from scratch and wishes to describe “fish transect data (excel spreadsheet style), habitat picture analysis, multi-beam data, hydro-acoustic data with fish biomass, GIS files, and the basic information that goes with an integrated assessment of an ecosystem.”
b. Earth Science Information Partners (ESIP) has a maturity model.
c. An NSF Data Maturity Model was mentioned, but we could not locate it by press time.
So that we do not reinvent the wheel, as noted above, these potential examples should be considered in Metadata 2020’s efforts to create a data maturity model.
The final agenda and presentation slide decks from the symposium have been posted on the National Academies website.
About the Authors
Scott Wymer is Vice President of Academic Technologies at Interfolio, and L. K. Williams is Vice President of Academic Engagement there. Scott and L. K. are members of both Metadata 2020’s Platforms and Tools group and its Researcher Community group.