This article provides a brief overview of some of the more relevant standards related to publishing activities and content technologies. The list is by no means exhaustive; it briefly covers the standards we frequently encounter that are of particular interest to publishers.
Also note that this article does not attempt to make any distinction between standards, guidelines, recommendations, or a number of other terms that can be used to convey a set of principles that some organized (or pseudo-organized) body is promoting for industry-wide adoption.
There are a number of standards that provide metadata vocabularies as well as those that provide syntax or markup related to metadata and vocabularies. What is the difference between the two? A vocabulary provides a list of standard definitions or values. A syntax defines the XML markup for capturing metadata values.
Dublin Core. The granddaddy of metadata standard vocabularies, Dublin Core owes its success to its simplicity. It started out with 15 basic fields (such as creator and date). It prescribed no syntax; it simply laid out what the 15 fields are named and what information each should capture. Dublin Core has since expanded on its original fields. The working group participants continue to refine the metadata conventions based on research and feedback among the various Dublin Core Metadata Initiative Working Groups.
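To make the vocabulary-versus-syntax distinction concrete, here is a minimal Python sketch that takes three Dublin Core field names (the vocabulary) and writes them out as namespaced XML elements (one possible syntax). The `record` wrapper element, the `dublin_core_record` function name, and the sample values are our own assumptions for illustration; Dublin Core itself does not prescribe them.

```python
import xml.etree.ElementTree as ET

# Namespace of the original 15 Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Wrap a dict of Dublin Core field names and values in a simple XML record."""
    # "record" is a hypothetical wrapper element, not part of Dublin Core.
    record = ET.Element("record")
    for name, value in fields.items():
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Placeholder values, for illustration only.
record = dublin_core_record({
    "title": "An Overview of Publishing Standards",
    "creator": "Example Author",
    "date": "2006-01-15",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The same three fields could just as easily be carried in RDF, in HTML meta tags, or in a database row; the vocabulary stays the same while the syntax changes.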
PRISM. Maintained by IDEAlliance, the Publishing Requirements for Industry Standard Metadata (PRISM) builds on the Dublin Core standard but expands into publishing-specific metadata (such as coverDisplayDate and edition). PRISM was initially designed for magazine publishing, but it can provide value to other types of publishing as well.
DOI. The Digital Object Identifier (DOI) standard provides a mechanism for assigning constant and unique identifiers to digital objects. Although the URL of the object or other information about the object may change, the DOI remains a constant pointer to that object and information about it.
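As a rough illustration of the "constant pointer" idea, the sketch below turns a DOI into a link through the public resolver at https://doi.org/, which redirects to wherever the object currently lives. The function name `doi_to_url` and the light validation are our own assumptions, and the sample DOI is used purely as an example.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Build a resolver URL for a DOI such as '10.1000/182'."""
    doi = doi.strip()
    # Accept the common "doi:" label in front of the identifier.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # A DOI is a prefix beginning with "10." followed by "/" and a suffix.
    prefix, slash, suffix = doi.partition("/")
    if not (prefix.startswith("10.") and slash and suffix):
        raise ValueError(f"not a valid DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("doi:10.1000/182"))
```

A publisher that stores the DOI rather than a page URL only has to update the resolver record when content moves; links built this way keep working.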
Medical vocabularies. There are a number of medical vocabularies out there. Medical Subject Headings (MeSH) is the National Library of Medicine's controlled vocabulary thesaurus. MeSH dates back to 1960 and was used to index articles from leading medical and biomedical journals. The descriptor terms are arranged in both an alphabetical and a hierarchical structure. RxNorm, also produced by the National Library of Medicine, is a standardized nomenclature for clinical drugs, detailing active ingredients, strength, and dose forms. The Systematized Nomenclature of Medicine--Clinical Terms (SNOMED-CT) is produced by the College of American Pathologists and provides a comprehensive vocabulary of clinical terms.
OWL. OWL (Web Ontology Language) is the W3C effort to provide a standard for the types of relationships that can be expressed in RDF. OWL provides for an XML framework to express hierarchies, relationships, and full ontologies. You will often see reference to OWL Lite, which is a subset of the full OWL standard.
RDF. The W3C's Resource Description Framework (RDF) is not a vocabulary like the others in this list, but an XML syntax for expressing metadata and relationships. The PRISM metadata standard can be expressed in RDF, many RSS feeds use RDF syntax, and Adobe's XMP (eXtensible Metadata Platform) for embedding metadata within media objects makes use of RDF.
XMP. XMP (eXtensible Metadata Platform) is for storing metadata in media objects, such as PDFs and images (although it can be used for text-based files, such as HTML). It was produced and is maintained by Adobe and uses RDF syntax.
ONIX. ONIX provides a framework for storing and transmitting information about a publication, with a focus on data related to the sales of the publication, such as pricing and availability. It was first developed for the book publishing industry but now also offers a version for serials.
Well, this is broad. There are probably hundreds of different XML models out there for structuring content. Below are some models frequently used by publishers.
Book Content. DocBook is the venerable DTD for basic book structures, though it was originally intended for technical books. Note that the Darwin Information Typing Architecture (DITA), originally an IBM creation and now an OASIS-supported standard, is also targeted at technical book content. Where DocBook focuses on the structure of a single book, DITA's goal is the re-use of content across a collection of books. Lastly, Open E-Book is a standard related to e-books. There are many others that fall under this category.
Medical Content. Many medical publishers have their own format, but some also use the National Library of Medicine's (NLM) Journal Publishing DTD. Additionally, the National Center for Biotechnology Information (NCBI), the NLM branch that created the DTD, is promoting its use for PubMed Central, the NLM's "digital archive of life sciences journal literature."
News Content. The two big standards are NewsML and the News Industry Text Format (NITF), both from the International Press Telecommunications Council (IPTC). NITF covers the structure of a textual news article, whereas NewsML is a broader standard that encompasses different types of news items and data exchanges.
Tables. There are three standards commonly used for encoding tables: the CALS Table Model, the XML Exchange Table Model, and HTML tables. CALS is an SGML standard developed by the U.S. military which included markup structures for tables. There is now an SGML and an XML version of the CALS table model. The XML Exchange Table Model is a subset of the CALS model and is maintained by OASIS. HTML tables follow standard HTML markup.
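Because CALS and HTML tables describe much the same grid with different element names, moving between them is often mostly a renaming exercise. The following Python sketch, under simplifying assumptions, maps a very small CALS table (`tgroup`, `thead`, `tbody`, `row`, `entry`) onto HTML `tr`, `th`, and `td` elements. It deliberately ignores column specifications, spans, and alignment attributes, which a production conversion would need to handle; the function name `cals_to_html` and the sample table are ours.

```python
import xml.etree.ElementTree as ET

# A tiny CALS-style table used only for illustration.
CALS_SAMPLE = (
    '<table><tgroup cols="2">'
    "<thead><row><entry>Standard</entry><entry>Maintainer</entry></row></thead>"
    "<tbody><row><entry>DocBook</entry><entry>OASIS</entry></row></tbody>"
    "</tgroup></table>"
)


def cals_to_html(cals_table):
    """Convert a simple CALS table element into an HTML table element."""
    html = ET.Element("table")
    tgroup = cals_table.find("tgroup")
    # Header rows become <th> cells, body rows become <td> cells.
    for section, cell_tag in (("thead", "th"), ("tbody", "td")):
        cals_section = tgroup.find(section)
        if cals_section is None:
            continue
        html_section = ET.SubElement(html, section)
        for row in cals_section.findall("row"):
            tr = ET.SubElement(html_section, "tr")
            for entry in row.findall("entry"):
                cell = ET.SubElement(tr, cell_tag)
                cell.text = entry.text
    return html


html_table = cals_to_html(ET.fromstring(CALS_SAMPLE))
html_text = ET.tostring(html_table, encoding="unicode")
print(html_text)
```

The hard part in practice is what this sketch leaves out: CALS expresses spans by naming columns, while HTML uses numeric colspan and rowspan counts.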
Graphics. Scalable Vector Graphics (SVG) allows for the dynamic creation of scalable graphics. It is a W3C specification.
MathML. A W3C specification for representing mathematical expressions in XML.
Standards work in digital rights management as it pertains to publishers is relatively new compared to some other standards listed here.
NISO. NISO is at the beginning stages of developing a standard to support digital rights expression and management for scholarly and educational information.
ODRL. The Open Digital Rights Language (ODRL) is an XML schema and vocabulary for expressing rights information for digital content, including permissions, constraints, requirements, and conditions for access.
PRISM Rights Language (PRL). An extension of elements in the basic PRISM specification, the PRISM Rights Language (PRL) is intended to describe how content can be used and by whom. It is a small and simple set meant to address the most common rights and permission concerns, but also to be extended to use with other languages and elements.
XACML. An OASIS standard, the eXtensible Access Control Markup Language (XACML) is used to represent and evaluate access control policies in an XML format. It provides fine grained control of authorized activities and mechanisms for creating rules and policies.
XrML. XrML (eXtensible rights Markup Language) provides a method for specifying and managing rights and conditions associated with all kinds of resources including digital content as well as services.
There are a number of standards to address improving access to electronic information for people with disabilities.
Section 508. This term refers to a 1998 amendment to the Rehabilitation Act that requires Federal agencies to make electronic information and technology accessible to people with disabilities. It is a law for federal agencies but publishers who deal with the government or who want to extend their offerings to people with disabilities will want to follow the guidelines.
W3C. The W3C issued its own recommendations for developing web sites for maximum accessibility levels. The recommendations not only help make information accessible to those with disabilities but also make for good common-sense web site design. The popular "Bobby" tool (now replaced by WebXACT) checks against the W3C recommendation.
Others. The two previous standards are the ones most frequently referenced and followed by those concerned with accessibility issues. There are numerous other organizations and consortiums dealing with accessibility, such as www.daisy.org, which offers guidelines for making audio books more accessible. For other information see http://www.a4access.org/accessibility.html and http://tribalcms.com.
Publishers selling to the library market typically need to meet certain requirements for providing usage data to customers.
ICOLC. The International Coalition of Library Consortia (ICOLC) has a set of guidelines for capturing online usage data and for addressing the related privacy and confidentiality concerns.
COUNTER. COUNTER (Counting Online Usage of Networked Electronic Resources) covers the capture and exchange of online usage data and focuses on journals and databases.
It's also worth mentioning a few other standards you'll come across if communicating with the library community, such as Z39.50 (a protocol for exchanging bibliographic data) and OpenURL (a protocol for linking to resources from bibliographic citations).
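To give a sense of what linking from a citation looks like, here is a hedged Python sketch that builds an OpenURL 1.0 query in key/encoded-value form for a journal article. The resolver address `https://resolver.example.org/openurl` is hypothetical (each library runs its own link resolver), the citation values are placeholders, and the function name `build_openurl` is ours.

```python
from urllib.parse import urlencode

# Hypothetical link resolver; a real one is specific to each library.
RESOLVER_BASE = "https://resolver.example.org/openurl"


def build_openurl(citation, base=RESOLVER_BASE):
    """Build an OpenURL 1.0 (Z39.88-2004) link for a journal article citation."""
    params = {
        "url_ver": "Z39.88-2004",
        # Declares that the referent is described with the journal format.
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    # Each citation field becomes an "rft." (referent) key.
    for key, value in citation.items():
        params[f"rft.{key}"] = value
    return base + "?" + urlencode(params)


# Placeholder citation, for illustration only.
link = build_openurl({
    "atitle": "Example Article Title",
    "jtitle": "Journal of Examples",
    "volume": "12",
    "spage": "34",
    "date": "2006",
})
print(link)
```

The resolver, not the publisher, decides where the link finally lands, which is how the same citation can lead different library patrons to the copy their institution has licensed.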
There are a number of standards bodies and organizations that coordinate the development and promotion of standards. We've left out mention of industry specific organizations but those listed below offer standards related to publishing technologies (and manage some of the ones listed above).
IDEAlliance. IDEAlliance provides specifications and best practices for publishing and content-driven organizations. Its set of standards covers a range of publishing related activities, including content management, production workflow, and postal distribution.
ISO. The International Organization for Standardization (ISO) publishes thousands of standards across industries (e.g., electrical engineering, ceramics, and food technology). ISO is a good place to check for industry-specific information (if, for example, you publish information for the transportation industry) and is the keeper of specific technical specifications, like character encodings and file formats, as well as standards for expressing geographic locations, languages, and date/time formats.
NISO. The National Information Standards Organization (NISO), accredited by the American National Standards Institute (ANSI), develops technical standards related to information management and exchange. OpenURL is a NISO standard (Z39.88).
OASIS. OASIS started with the DocBook DTD but has expanded into many other XML standards, focusing primarily on e-business standards.
W3C. Headed by Tim Berners-Lee, the W3C offers specifications and recommendations on a wide range of web-related technologies including XML, XQuery, XPath, XSLT, etc.