Frequently Asked Questions
- What is metadata?
- What is a digital collection?
- What is "Core" metadata?
- How did you come up with these guidelines?
- How do these guidelines affect me/my unit?
- What is a controlled vocabulary? What is an authority?
- How can I make my content more findable in Google/Google Scholar?
- What is a schema?
- Which metadata schema/standard should I use?
- What punctuation/case conventions/style guide rules should I use for text entries?
- What is a URI? What's the difference between a URI, URL, and URN?
- How do I create metadata?
"Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information."
See our Metadata 101 page for more information.
Within the scope of the Metadata Working Group's guidelines, we define a digital collection as "unique, Emory-created collections of digitized or born-digital content, intended for delivery to an Emory Libraries-supported repository or discovery tool". Examples of digital collections at Emory include Emory Theses and Dissertations, the Pitts Theology Library Digital Image Archive, the Emory University: Michael C. Carlos Museum online collection, Rose Library: Langmuir African American Photographs, Women Writers Resource, the Emory Art History digital image collection, and Open Emory.
Emory Core Metadata is a set of descriptive metadata elements, independent of any particular standard or schema, that are required or recommended for all digital projects and collections. For more information, see the Core Metadata summary and individual element guidelines in our Descriptive Metadata Guidelines section.
The Metadata Working Group undertook a research-driven project in 2014 to identify core metadata based on an analysis of local and industry practices. We conducted a group task analysis, surveyed Emory metadata practitioners and stakeholders, compared 34 metadata standards in a benchmarking exercise, profiled major Emory discovery systems, and reviewed available web analytics for public-facing systems. We then applied a scoring model to all of the data collected to surface a set of core metadata elements, for which we developed detailed usage guidelines. The usage guidelines are based on local practices and system constraints, major content standards such as RDA and DACS, the MODS Aquifer Guidelines, and the Digital Library Federation's Best Practices for Shareable Metadata.
These guidelines do not propose changes to traditional cataloging practices (e.g. MARC/RDA; archival arrangement and description); they are intended specifically for digital collections metadata. For units with established digital collections metadata, we encourage you to adopt these guidelines and make sure your projects and collections include these core information points, which can be mapped to major standards and to local systems' schemas. For units without established metadata practices, these guidelines can be adopted for new work. The Emory Libraries' Digital Collection Development Policy (approved in 2014) requires that digital collections adhere to these guidelines as a baseline standard for descriptive metadata. These guidelines were developed with review and final approval by the Emory University Libraries Cabinet.
From Wikipedia, a controlled vocabulary is:
“a carefully selected list of words and phrases, which are used to tag units of information (document or work) so that they may be more easily retrieved by a search”.
These terms are “chosen and organized by trained professionals (including librarians and information scientists) who possess expertise in the subject area. Controlled vocabulary terms can accurately describe what a given document is actually about, even if the terms themselves do not occur within the document's text”.
Controlled vocabularies are used to "gather together variant terms and synonyms for concepts and to link concepts in a logical order or sort them into categories" (The Getty Research Institute). Terms from a controlled vocabulary promote consistency in both describing and locating information.
An authority is another name for a controlled vocabulary. Authority control is the process of maintaining and reconciling controlled terms within a system.
Examples of major authorities/controlled vocabularies include Library of Congress Subject Headings, Medical Subject Headings (MeSH), and the Getty Art and Architecture Thesaurus. Emory projects and systems may also use smaller-scale, locally created sets of controlled terms.
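As a rough illustration of how a controlled vocabulary gathers variant terms under one preferred term, the following Python sketch reconciles free-text entries against a small local list. The terms, the `AUTHORITY` table, and the function names are hypothetical and are not drawn from any of the vocabularies named above.

```python
# Hypothetical local authority list: each preferred term maps to known variants.
# These terms are illustrative only, not taken from a published vocabulary.
AUTHORITY = {
    "Photographs": ["photos", "photograph", "photographic prints"],
    "Correspondence": ["letters", "mail"],
}

def build_lookup(authority):
    """Map every lowercased variant (and preferred term) to its preferred term."""
    lookup = {}
    for preferred, variants in authority.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred
    return lookup

def reconcile(term, lookup):
    """Return the preferred form of a term, or None if it is not in the vocabulary."""
    return lookup.get(term.strip().lower())

LOOKUP = build_lookup(AUTHORITY)
print(reconcile("Photos", LOOKUP))      # Photographs
print(reconcile("  letters ", LOOKUP))  # Correspondence
print(reconcile("maps", LOOKUP))        # None
```

A term that returns `None` would need review by a person, who could either correct the entry or add a new term to the list.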
Search engine optimization techniques can help enhance your content's visibility in both Google and Google Scholar. Google Scholar, however, has very specific requirements regarding the types of content it will include, the metadata tag names it recognizes, and the markup it expects. Please see our Google, Google Scholar, and Search Engine Optimization pages for additional information.
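To give a sense of what "metadata tag names and markup" can look like in practice, here is a minimal Python sketch that renders HTML meta tags for one record. It assumes the `citation_*` tag names described in Google Scholar's inclusion guidelines; the function name and the sample record are hypothetical, and the current requirements should be confirmed on the pages referenced above.

```python
from html import escape

def scholar_meta_tags(title, authors, publication_date, pdf_url):
    """Render one record as HTML <meta> tags, one tag per line."""
    tags = [("citation_title", title)]
    # Each author gets a separate citation_author tag.
    tags += [("citation_author", author) for author in authors]
    tags.append(("citation_publication_date", publication_date))
    tags.append(("citation_pdf_url", pdf_url))
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in tags
    )

# Hypothetical sample record, for illustration only.
html_head = scholar_meta_tags(
    title="Maps & Memory",
    authors=["Jane Doe", "John Roe"],
    publication_date="2019/05/01",
    pdf_url="https://example.org/thesis.pdf",
)
print(html_head)
```

Escaping the values keeps characters such as `&` and `"` from breaking the markup.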
Among metadata practitioners, a schema (or scheme) can mean different things, but it most commonly refers to a set of metadata elements with defined rules for usage and/or encoding.
Digital collections metadata are often encoded in XML, which has its own schema language (XML Schema) for defining how elements are structured and encoded.
On this website we refer to metadata element sets as both "schemas" and "standards". Some metadata schemas are published as formal standards and may be encoded in a variety of ways. Metadata schemas/element sets are often used in conjunction with a content standard, which provides rules for how to populate and format entries for each element.
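The idea of a schema as "a set of elements with defined rules for usage" can be sketched in a few lines of Python. The element names and rules below are hypothetical; they are not the actual Emory Core Metadata definitions or those of any published standard.

```python
# Hypothetical element set: names and rules are illustrative only.
SCHEMA = {
    "title": {"required": True, "repeatable": False},
    "creator": {"required": False, "repeatable": True},
    "date": {"required": True, "repeatable": False},
}

def validate(record, schema):
    """Return a list of problems; an empty list means the record conforms.

    A record is a dict mapping element names to lists of values.
    """
    problems = []
    for name, rule in schema.items():
        values = record.get(name, [])
        if rule["required"] and not values:
            problems.append(f"missing required element: {name}")
        if not rule["repeatable"] and len(values) > 1:
            problems.append(f"element not repeatable: {name}")
    for name in record:
        if name not in schema:
            problems.append(f"unknown element: {name}")
    return problems

print(validate({"title": ["A sample title"], "date": ["2014"]}, SCHEMA))  # []
print(validate({"title": ["A", "B"], "subject": ["x"]}, SCHEMA))
```

Real schemas express the same kinds of rules (which elements exist, which are required, which may repeat) in a formal language such as XML Schema instead of a Python dictionary.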
We recommend that, whenever possible, you use an established metadata standard appropriate to the content you are describing rather than creating a new metadata schema from scratch. Your choice of metadata standard will also be affected by the system you use to author and store your metadata, since systems often enforce specific metadata schemas.
Emory Core Metadata provides a set of elements that are broadly applicable and are mapped to major metadata standards. Commonly used standards at Emory include:
Style guidelines regarding punctuation and capitalization vary across content standards and publishing environments.
The Resource Description and Access (RDA) cataloging standard includes the following guidance for capitalization/case conventions (see Appendix A):
- Capitalize words according to the guidelines for the language involved. Record in lower case any words not covered by the guidelines in this appendix.
- Names: In general, capitalize the first word of each name. Capitalize other words by applying the guidelines at A.10–A.55, as applicable to the language involved.
- Titles: Capitalize the first word or the abbreviation of the first word in a title, or in a title of a part, section, or supplement (see 2.3.1.7). Capitalize other words within titles by applying the guidelines at A.10–A.55, as applicable to the language involved.
- Guidelines for English-language capitalization basically follow The Chicago Manual of Style.
Describing Archives: A Content Standard (DACS) does not prescribe specific punctuation conventions, but it recommends spelling out acronyms and avoiding abbreviations.
For other digital content publishing scenarios such as scholarly publications or websites, follow any specific style guide conventions for titles, etc. that may be provided by your publication outlet.
When reading metadata standards documentation, you may see references to all three terms. “URI” stands for Uniform Resource Identifier, a formal system for uniquely identifying resources. The two types of URI you are most likely to encounter are URLs (Uniform Resource Locators) and URNs (Uniform Resource Names).
A URL is a type of URI that identifies a web resource as well as its network location and access protocol (e.g. http://www.emory.edu).
A URN is a type of URI that names a resource within a formal naming scheme, but does not indicate its location or how to access it (e.g. a URN based on an ISBN or ISSN).
Many metadata standards recommend the use of URIs in either format for providing values for certain types of elements, so it’s helpful to understand the differences. A URL is an actionable link that can retrieve a web-based resource; a URN is a unique identifier, but cannot be used to create a clickable link.
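The distinction above can be checked programmatically by looking at the scheme, the part of a URI before the first colon. The following Python sketch is a simplification: the function name and the short list of schemes treated as locators are assumptions made for illustration, and the sample identifiers other than the Emory address are hypothetical.

```python
from urllib.parse import urlparse

# Schemes treated as "locators" in this sketch; the list is illustrative, not complete.
LOCATOR_SCHEMES = {"http", "https", "ftp"}

def classify_uri(value):
    """Roughly classify a string as a URL, a URN, another kind of URI, or not a URI."""
    scheme = urlparse(value).scheme.lower()
    if scheme == "urn":
        return "URN"
    if scheme in LOCATOR_SCHEMES:
        return "URL"
    return "other URI" if scheme else "not a URI"

print(classify_uri("http://www.emory.edu"))        # URL
print(classify_uri("urn:isbn:0451450523"))         # URN
print(classify_uri("mailto:someone@example.org"))  # other URI
print(classify_uri("no scheme here"))              # not a URI
```

A check like this only inspects the form of the identifier; it does not confirm that a URL actually resolves or that a URN is registered.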
Metadata can be created in many different ways. At its simplest, it's often created in a spreadsheet, and then imported into a more robust system. Metadata records are often transformed to meet the needs of each system they are stored in.
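As a minimal sketch of the spreadsheet-to-system workflow described above, the following Python example reads rows exported as CSV and transforms them into simple XML records. The column names, the sample rows, and the `records`/`record` element names are hypothetical; a real import would target the schema required by the receiving system.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical spreadsheet export; column names and values are illustrative only.
CSV_TEXT = """title,creator,date
Sample photograph,Unknown,1920
Sample letter,Jane Doe,1935
"""

def rows_to_xml(csv_text):
    """Turn each CSV row into a <record> element with one child per non-empty column."""
    root = ET.Element("records")
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, "record")
        for column, value in row.items():
            # Column names are used as element names, so they must be valid XML names.
            if column and value:
                ET.SubElement(record, column).text = value
    return root

root = rows_to_xml(CSV_TEXT)
print(ET.tostring(root, encoding="unicode"))
```

Keeping the spreadsheet's column headings aligned with the target element names makes this kind of transformation much simpler.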
Other ways of creating metadata include:
- Databases (which may also have custom web forms for data entry)
- XML encoding using a text editor or a specialized XML editor such as Oxygen XML
- HTML encoding: adding metadata to web pages using an HTML editor or a content management system such as Cascade Server or WordPress
- Integrated Library Systems/cataloging tools like Ex Libris Aleph or Alma
- Digital Asset Management Systems such as LUNA or SharedShelf