Many articles that discuss metadata start off with the proverbial "what is metadata" question and generally cite the "data about data" definition. Some articles also explore the metaphysical questions of the difference between metadata and the data itself. I've even seen reference to meta metadata.
This article doesn't go there, but rather looks at how a publishing organization should go about establishing a formal metadata plan, and why. Getting beyond the semantic and philosophical discussions, we all know metadata is important. It is the currency of content and the behind-the-scenes driver enabling all the cool buzzwords like findability, re-use, and monetization.
At the XML 2007 conference in Boston last December, a room full of publishers was asked how many have a metadata strategy. Only a few hands were raised. But when asked how many people are in the process of creating a metadata strategy, most of the hands shot up. Publishers are realizing the importance of getting the metadata process in order. This article provides some direction for doing so.
Even though many organizations appreciate the value of metadata, many still lack a clear, centralized, and complete understanding of how metadata is created, stored, and used across their content and other assets.
For many traditional print publishers, it is not uncommon for metadata (and sometimes identical or similar metadata) to be applied to the same content in different workflows, by different people, and in different systems. For example, the content may be associated with a specific section or column in print. But on its way to the web, it is categorized in a different taxonomy in the Web CMS. Somewhere else a corporate librarian enters the content into yet another system and applies index terms to it. Some of this metadata may be duplicative and much of it is not available outside its own system. There may be good reasons for this type of process, but it is important to take a look to understand why and determine whether the work could be streamlined and the metadata made accessible outside its individual silos.
Performing a metadata audit and then using it to put a metadata plan in place helps an organization better understand the metadata processes it currently has and what it needs to do to make improvements, such as standardizing metadata terms across content types, removing duplicative processes, and building better bridges between systems.
The following sections describe steps publishers can take to build a centralized metadata plan.
The first step in any metadata plan is to identify what you already have. An important aspect of this process is not simply to make a list of metadata names but also to understand the purpose of each. Each metadata item should have a clear definition of what it is trying to capture, as well as an understanding of which external systems and processes rely on or process this metadata. For example, take the ubiquitous "author" metadata (such as Dublin Core's "creator" element). This may be displayed on the web site with the content, used to create a dynamic index, delivered in feeds to aggregators, and used for searching on an intranet. It is important to understand what applications rely on the metadata so that the implications of any future changes are understood.
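One way to keep such an inventory is as a small structured record per field. The sketch below is only an illustration: the class name, attribute names, and the `impacted_consumers` helper are assumptions made for this example, and the sample entry simply restates the "author" example above.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataField:
    """One row of a metadata audit inventory (all names here are illustrative)."""
    name: str                    # field name as it appears in a system
    definition: str              # what the field is meant to capture
    standard_mapping: str = ""   # equivalent element in a standard, if any
    consumers: list[str] = field(default_factory=list)  # applications relying on it


def impacted_consumers(inventory: list[MetadataField], field_name: str) -> list[str]:
    """Return the applications that would be affected by changing a field."""
    for entry in inventory:
        if entry.name == field_name:
            return sorted(entry.consumers)
    return []


# Sample inventory entry based on the "author" example in the text.
inventory = [
    MetadataField(
        name="author",
        definition="Person or organization credited with creating the content",
        standard_mapping="dc:creator",
        consumers=["web site byline", "dynamic index", "aggregator feeds", "intranet search"],
    ),
]

print(impacted_consumers(inventory, "author"))
```

Recording the consumers alongside the definition means that anyone proposing a change to a field can see what else depends on it.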
Another important aspect of any metadata plan is to understand the people, processes, and systems that interact with metadata assignment. This includes understanding who or what is applying metadata to the content, where in the workflow this happens, and who "owns" the metadata.

As mentioned previously, in many publishing environments there is a divergence in the print and web delivery channels. And as content branches out to each medium, it enters different workflows, during which metadata is applied. It's true that some of the metadata is media-specific, such as print sections or online navigation categories, but an organization may be able to leverage this same metadata elsewhere (the web site navigation may be topical and those topics could be helpful to aggregators) or even to unify it (a single topical categorization might be able to automatically support both). You won't usually get around the fact that metadata is applied by different people at different points in the workflow, but understanding how it is done now and making decisions on how you would like it to work will help reduce redundancy and possibly enable reuse of metadata for additional applications.
To continue the example above, the metadata assigned for web site navigation may be too tightly tied to the current design of the site. Instead, consider capturing a more abstract "aboutness" of the content, something that could be mapped to the site map, but also reused for other purposes and something that is more flexible when the site map and organization changes.
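As a rough sketch of this idea, content can carry channel-neutral topics while each channel keeps its own mapping from topics to placements. The topic names, navigation paths, and print section names below are invented for illustration and do not come from any particular system.

```python
# Hypothetical mappings from channel-neutral topics to channel-specific placements.
TOPIC_TO_WEB_NAV = {
    "personal-finance": "Money > Saving",
    "retirement": "Money > Retirement",
}
TOPIC_TO_PRINT_SECTION = {
    "personal-finance": "Your Money",
    "retirement": "Your Money",
}


def placements(topics: list[str]) -> dict[str, list[str]]:
    """Derive channel-specific placements from the topics assigned to the content."""
    return {
        "web": sorted({TOPIC_TO_WEB_NAV[t] for t in topics if t in TOPIC_TO_WEB_NAV}),
        "print": sorted({TOPIC_TO_PRINT_SECTION[t] for t in topics if t in TOPIC_TO_PRINT_SECTION}),
    }


# The article itself only records what it is about.
article_topics = ["personal-finance", "retirement"]
print(placements(article_topics))
```

When the site is reorganized, only the web mapping changes; the topics stored with the content stay the same.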
This is also a good time to identify neglected or abused metadata. Too often, the metadata fields deployed with any CMS or similar system are neglected or used for the wrong purposes over time in day-to-day use of the system. Although care may have been taken in the initial deployment of metadata processes, months later you often find that fields are not being populated or that users are inconsistent in how they use them. If this is happening, the value of the metadata is significantly reduced.
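A simple way to spot neglected fields is to measure how often each one is actually populated in an export of your records. The following is a minimal sketch under stated assumptions: the records are plain dictionaries, the field names and sample data are invented, and the 50% threshold is arbitrary.

```python
def fill_rates(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Fraction of records in which each field holds a non-blank value."""
    if not records:
        return {name: 0.0 for name in fields}
    rates = {}
    for name in fields:
        # Missing keys, None, and whitespace-only strings all count as unpopulated.
        filled = sum(1 for r in records if str(r.get(name) or "").strip())
        rates[name] = filled / len(records)
    return rates


# Invented sample export from a content system.
records = [
    {"title": "A", "author": "Smith", "keywords": ""},
    {"title": "B", "author": "", "keywords": None},
    {"title": "C", "author": "Jones"},
    {"title": "D", "author": "Lee", "keywords": "tax"},
]

rates = fill_rates(records, ["title", "author", "keywords"])
neglected = [name for name, rate in rates.items() if rate < 0.5]  # arbitrary threshold
print(rates)
print(neglected)
```

A check like this only finds empty fields. Fields that are populated but used for the wrong purpose still need a human review of sample values.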
Some other questions to ask:
Often metadata is stored in many locations. Just about every system or product your content touches has some capacity to store metadata. Content creation tools, such as Microsoft Word, allow you to apply some simple metadata to your documents. Production management systems such as Softcare's K4 provide the capacity to capture and store metadata about your documents. So do digital asset management systems. And content management systems. And web content management systems. You get the point, and you already realize that sometimes one piece of content can live in a number of these environments.
Additionally, metadata can be found in the body of the file itself (title and author are often in the text), a header section of the file (in HTML and XML), or in a binary file (XMP, EXIF).
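To illustrate the header case, the sketch below reads `<meta>` elements from an HTML document using Python's standard `html.parser` module. The class name, function name, and sample document are assumptions made for this example; a real audit would also need to cover XML headers and embedded formats such as XMP and EXIF, which this sketch does not attempt.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta(html_text: str) -> dict[str, str]:
    """Return the metadata found in the document's <meta> elements."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    return collector.meta


# Invented sample document.
sample = """<html><head>
<title>Sample article</title>
<meta name="author" content="Jane Example">
<meta name="keywords" content="metadata, publishing">
</head><body><p>Body text.</p></body></html>"""

print(extract_meta(sample))
```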
Storing metadata within the content itself, as opposed to in the systems that handle the content, offers potential benefits in terms of preserving and transferring that metadata as the content travels through different systems. If the metadata is attached to or embedded within the file, it can be easily transferred when the file touches another system (assuming the other system can read it). However, there are still questions about how metadata such as controlled vocabularies can be applied and edited in this scenario, given that there still needs to be a central authority list of terms outside the file.
The issue of where controlled vocabularies of metadata are stored and updated is a big one. Perhaps you have images in a DAM and textual content in a CMS. You might also use the same taxonomy to categorize both types of assets, but because the assets are in two different systems, the taxonomy may be maintained in two locations. Can a photo editor update the taxonomy in the DAM with no one synching up the changes in the CMS? Not being aware of this risk, or not having a plan in place to make sure the taxonomies remain synchronized in different systems, could limit your opportunities to tie together different content types based on metadata rules.
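If both systems can export their term lists, even a basic comparison will reveal drift between them. This is a minimal sketch that assumes each taxonomy is available as a flat list of term labels; the function name and sample terms are invented, and real taxonomies with hierarchy or synonyms would need a richer comparison.

```python
def taxonomy_drift(dam_terms: list[str], cms_terms: list[str]) -> dict[str, list[str]]:
    """Report terms that exist in one system's taxonomy but not the other."""
    # Normalize case and surrounding whitespace so trivial differences are ignored.
    dam = {t.strip().lower() for t in dam_terms}
    cms = {t.strip().lower() for t in cms_terms}
    return {
        "only_in_dam": sorted(dam - cms),
        "only_in_cms": sorted(cms - dam),
    }


# Invented term lists exported from each system.
dam_terms = ["Politics", "Sports", "Elections 2008"]
cms_terms = ["politics", "sports ", "Business"]

print(taxonomy_drift(dam_terms, cms_terms))
```

Running a comparison like this on a schedule gives whoever owns the taxonomy an early warning that the two copies have diverged.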
Multiple systems, varying formats, and different storage mechanisms can make for a messy metadata picture. The limitations of the tool sets and systems we need to use don't help, but your organization should understand the paths your metadata currently takes.
A metadata plan itself cannot tackle a desire to unify these systems, but rather needs to accept that metadata can be dispersed through the organization and at least identify where it is, how it is applied, and its potential uses.
Once the audit has been done, you have a better sense of your current metadata environment and can move forward in determining what gaps exist and where processes can be streamlined, and in developing plans for the future maintenance and evolution of your metadata activities.
Some issues to consider include the following:
You will most likely find that ownership and responsibility for metadata are dispersed through the organization, but each owner has a narrow focus on his or her immediate needs. It is unrealistic to expect every person in the metadata process to fully understand or appreciate the needs beyond their application. They have a job to do. But it would be in the best interest of an organization to assign centralized responsibility and ownership of metadata to some individual (or group, depending on how big this is). There should be someone within the organization who can serve the role of the metadata czar and has an understanding of the metadata processes, a direction with metadata strategy, and authority over the maintenance and changes in metadata applications.
This person would have the global interests of the organization in mind. Changes in the metadata set should be filtered through this person, who would need to work with IT, editorial, and business to make changes to the metadata set and processes.
Metadata is most often driven by business needs. Publishers and content creators want to do more with their content, and these activities are often driven by underlying metadata. Whoever is in charge of your ongoing metadata plan needs to be in tune with these business drivers and take those needs into consideration. With an audit in place, you can also perform a gap analysis between what metadata you have now and what you need to support future business requirements. And if you're lucky, the audit may also help expose siloed metadata (such as in that Web CMS) that could be leveraged for new business opportunities.
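The gap analysis can be as simple as comparing the fields found in the audit with the fields each planned initiative needs. In the sketch below, the initiative names and field names are purely hypothetical and stand in for whatever your own audit and business plans produce.

```python
def gap_analysis(current_fields: set[str], requirements: dict[str, list[str]]) -> dict[str, list[str]]:
    """For each business initiative, list the required fields that do not exist yet."""
    gaps = {}
    for initiative, needed in requirements.items():
        missing = sorted(set(needed) - current_fields)
        if missing:
            gaps[initiative] = missing
    return gaps


# Fields found during the audit (hypothetical).
current_fields = {"title", "author", "section", "publication_date"}

# Fields each planned initiative would rely on (hypothetical).
requirements = {
    "topic pages": ["title", "topic"],
    "syndication feed": ["title", "author", "rights"],
    "archive search": ["title", "publication_date"],
}

print(gap_analysis(current_fields, requirements))
```

Initiatives that are already fully supported by existing metadata drop out of the result, which leaves a short list of fields to prioritize.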
Turning to industry standards is a smart move in establishing metadata. Using an industry standard leverages the work and thought processes done by other people on what is important in a specific metadata area. For example, Dublin Core is widely used as the base of many metadata plans. The PRISM metadata standard, which builds on Dublin Core, provides a set of metadata fields with a primary focus on magazine and serial publishers. The NLM DTD includes metadata fields appropriate for use by journal publishers.
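As a small illustration of building on a standard, the sketch below expresses a few fields as Dublin Core elements using Python's standard `xml.etree.ElementTree` module. The namespace URI is the one defined for the Dublin Core element set; the `metadata` wrapper element, the function name, and the sample values are assumptions made for this example and are not prescribed by Dublin Core or PRISM.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(values: dict[str, str]) -> ET.Element:
    """Build a simple record whose children are Dublin Core elements."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for element, text in values.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = text
    return root


# Invented sample values.
record = dublin_core_record({
    "title": "Building a Metadata Plan",
    "creator": "Jane Example",
    "date": "2008-01-15",
    "subject": "metadata",
})

print(ET.tostring(record, encoding="unicode"))
```

Mapping internal field names such as "author" to standard elements such as `dc:creator` makes it easier to exchange metadata with partners and aggregators that already understand the standard.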
To be successful there needs to be agreement and buy-in for the metadata plan, which includes agreement on the metadata fields and terms used throughout the organization. You'll want to have consistency across groups on how metadata terms are used and applied.
The metadata picture is not static. Metadata will need to change over time, whether it is adding new metadata, making new uses of existing metadata, or changing workflows involving metadata. But putting a centralized plan in place with clear ownership and authority for changes can help smooth the process.
A strategy brings uniformity across the organization and consistency and quality to the metadata, so it not only serves its originally intended purpose but has the potential to be reused in other contexts. A formalized metadata plan can bring the following benefits: