So, you've made a business case to enable content reuse by converting all your publications to XML. You realize that to achieve this you need more than just well-formed XML; you need a DTD or schema to define the content structures. Great! Those are big steps and good decisions for many publishers.
You start writing that DTD and you begin trying to figure out how to accommodate all of the various content structures that exist in your publications.
For example
No problem. These variations (and more) are easily expressed in a DTD with a simple element declaration like the following:
<!ELEMENT Chapter (Title, (Author|Summary|Body)+) >
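This declaration means that a Chapter must contain a Title, followed by one or more Author, Summary, or Body elements in any order and any combination. Nothing in the model requires all three to be present, and nothing prevents any of them from repeating.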
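To see just how permissive that declaration is, here is a minimal sketch in Python that rewrites the content model as a regular expression over child element names. It is not a DTD validator; the helper name `matches_loose_model` is made up for this illustration.

```python
import re

# Hypothetical helper: the DTD content model (Title, (Author|Summary|Body)+)
# rewritten as a regular expression over space-separated child element names.
LOOSE_CHAPTER = re.compile(r"Title( (Author|Summary|Body))+")


def matches_loose_model(children):
    """Return True if the sequence of child element names fits the model."""
    return LOOSE_CHAPTER.fullmatch(" ".join(children)) is not None


print(matches_loose_model(["Title", "Author", "Summary", "Body"]))  # True
print(matches_loose_model(["Title", "Body", "Summary", "Author"]))  # True
print(matches_loose_model(["Title", "Body"]))                       # True
print(matches_loose_model(["Title", "Summary", "Summary"]))         # True
print(matches_loose_model(["Title"]))                               # False
print(matches_loose_model(["Author", "Title", "Body"]))             # False
```

Note that a chapter with no Author and no Summary, or one with two Summary elements and no Body, is perfectly valid under this model.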
All right. You finish your DTD and you have all your books converted to XML and maybe all that XML data is stored in a content management system that makes it easy to find.
Now you're ready to start reusing that content. You see an opportunity to create a new publication by pulling together a half-dozen chapters from book A, a couple from books B and C, and a few newly authored chapters. You have the new content developed and you pull it together with all the existing chapters that you'll reuse, and you flow the content into a desktop publishing tool to format it.
But wait. The placement of the author names and chapter summaries in your new publication is inconsistent. In the chapters from book A the summary comes before the body of the chapter; those from book B have the summary after the body; and those from book C have both the summary and the authors after the body. You want all the chapters in the new book to look like book B.
Once again, there's a solution. This time in the form of XSLT (Extensible Stylesheet Language Transformations -- a language for transforming XML documents into different XML documents or other formats). XSLT can easily manipulate these content structures to reorder things so that all of the chapters in the new book look the same. The transformations are pretty simple stuff and can be invoked just before the content is flowed into your desktop publishing tool.
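To give a feel for what such a reordering involves, here is a rough sketch of the same idea in Python rather than XSLT, using only the standard library. It assumes the book B layout is Title, Author(s), Body, then Summary; the function name `reorder_chapter` and the sample chapter are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Assumed target layout (book B): Title, Author(s), Body, then Summary.
TARGET_ORDER = ["Title", "Author", "Body", "Summary"]


def reorder_chapter(chapter):
    """Reorder a Chapter's children in place to follow TARGET_ORDER."""
    rank = {name: i for i, name in enumerate(TARGET_ORDER)}
    # sorted() is stable, so repeated elements (for example two Authors)
    # keep their original relative order. Unknown elements sort to the end.
    chapter[:] = sorted(chapter, key=lambda el: rank.get(el.tag, len(rank)))
    return chapter


# A book C style chapter: Summary and Author both follow the Body.
book_c = ET.fromstring(
    "<Chapter><Title>T</Title><Body>B</Body>"
    "<Summary>S</Summary><Author>A</Author></Chapter>"
)
reorder_chapter(book_c)
print([child.tag for child in book_c])
# ['Title', 'Author', 'Body', 'Summary']
```

An XSLT stylesheet would express the same thing declaratively, by copying the children of each Chapter in the desired order.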
OK. Another problem bites the dust. Whenever you want to reuse content in a new pub you just write some XSLT scripts. Sounds pretty easy. But even in the simple example given here you'll have to write one transformation to convert the book A content to the new style and another for the book C content. Multiply that by x number of other potential transformations and consider that a new product might reuse content from 5, 10, or more different sources, and this can quickly become pretty unmanageable.
In fact, the problems resulting from your inconsistent content can often be much worse than illustrated in the example above. Maybe you plan to publish all your book chapters to the web. On this web site, you want to allow users to search for chapters by author name and to browse chapters by scrolling through their summaries. The problem is that the model you used to tag your content not only allowed Author, Summary, and Body in any order, it also allowed you to skip any two of the three elements. So, when you go to publish your chapters to the web, some of them might be completely missing Author or Summary elements.
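A quick audit can tell you how bad the damage is before you publish. The sketch below, in Python with the standard library only, lists which chapters lack an Author or a Summary. The `Book` wrapper element and the sample content are assumptions made for this example, not part of the DTD discussed here.

```python
import xml.etree.ElementTree as ET

# Elements the web site depends on for search and browsing.
REQUIRED = ("Author", "Summary")


def missing_elements(chapter):
    """Return the names of required child elements the chapter lacks."""
    return [name for name in REQUIRED if chapter.find(name) is None]


# Hypothetical sample data; <Book> is an assumed wrapper element.
book = ET.fromstring("""
<Book>
  <Chapter><Title>One</Title><Author>A</Author><Summary>S</Summary><Body>B</Body></Chapter>
  <Chapter><Title>Two</Title><Body>B</Body></Chapter>
  <Chapter><Title>Three</Title><Body>B</Body><Author>A</Author></Chapter>
</Book>
""")

# Map each chapter title to whatever it is missing.
report = {ch.findtext("Title"): missing_elements(ch) for ch in book.iter("Chapter")}
for title, missing in report.items():
    print(title, "->", missing or "complete")
```

All three sample chapters are valid under the loose model, yet only the first can support both author search and summary browsing.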
A better approach would be to settle on one, consistent, more carefully controlled data structure for the XML for all your books. When a variant structure is desired in a particular print product, an XSLT script can be used to create it. This makes it much easier to know what to expect when sharing content across publications and also places the costs of the variations with the product, making it easier to do a cost/benefit analysis of continuing to support these alternate output formats.
For example, a better model for our example chapters might be:
<!ELEMENT Chapter (Title, Author+, Summary, Body) >
In this model, at least one Author must be included, exactly one Summary and one Body are required, and the elements must appear in the order shown.
So, how do you settle on one, consistent data structure for the XML for all your publications? In relational database design the first step is to normalize the data: to analyze it and identify and eliminate the inconsistencies. These same principles can be applied to modeling XML content structures. Analyze the patterns of data structures used in your publications. See what the most common patterns are and make decisions about what will be supported and what should be standardized.
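If your content is already in XML, part of that analysis can be automated. The following is a minimal sketch, in Python with the standard library only, that tallies the child-element patterns found across a set of chapters. The `Book` wrapper, the sample data, and the function names are all assumptions for illustration.

```python
from collections import Counter
from itertools import groupby
import xml.etree.ElementTree as ET


def structure_signature(chapter):
    """Describe a chapter's child sequence, collapsing repeats to Name+."""
    parts = []
    for tag, run in groupby(child.tag for child in chapter):
        count = len(list(run))
        parts.append((tag + "+") if count > 1 else tag)
    return ", ".join(parts)


def pattern_counts(chapters):
    """Count how many chapters share each structural pattern."""
    return Counter(structure_signature(ch) for ch in chapters)


# Hypothetical sample data; <Book> is an assumed wrapper element.
book = ET.fromstring("""
<Book>
  <Chapter><Title/><Author/><Summary/><Body/></Chapter>
  <Chapter><Title/><Author/><Author/><Summary/><Body/></Chapter>
  <Chapter><Title/><Body/><Summary/><Author/></Chapter>
  <Chapter><Title/><Author/><Summary/><Body/></Chapter>
</Book>
""")

counts = pattern_counts(book.iter("Chapter"))
print(counts.most_common(1))
# [('Title, Author, Summary, Body', 2)]
```

The most frequent patterns are natural candidates for the standard structure; the rare ones are the variations you need to make a decision about.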
Some of the key steps in this process are:
Some of the benefits of standardized XML data structures are:
Modeling your XML data to accommodate all of the variations found in your published content, although relatively easy to do, can be a costly mistake - especially if one of your goals is to reuse or repurpose content. Some of these content anomalies are intended; some may not be. But if you take a good hard look, you might find that even the intended variations in content structures, designed to differentiate print products and make them more interesting, have now become obstacles to reusing and activating your content in the digital world.
The bottom line: XML as an enabling technology can transform publishing processes. But not if you use the new technology to re-create the status quo.