Part 1: Translating XPress documents to XML
XML and Quark XPress are integral parts of many publishers' workflows, but they just don't play nicely together yet.
Let's dream a little
In an ideal world, users could choose a "Save As XML" function from within XPress and have the desired XML file(s) show up wherever they wanted them. They could also do the equivalent in batch mode (working with many files at a time). If the world were really perfect, this would be possible even with old and inconsistently styled XPress documents.
Wake up!
There are methods for achieving these dreams in some contexts and with some content, but it's typically more work than you'd expect before having studied the issues. The problem can't fairly be blamed on Quark XPress, although there's certainly room for improvements in XPress itself. At its heart, the problem is that the purpose of using XPress (or any other page design tool) is different from the purpose of using XML. There are many nuances to this difference, but it boils down to the fact that XPress documents are designed to be interpreted by human beings who are capable of inferring meaning and relationships from the content, placement, and design of text and graphics on a page. Compare that to XML, which is designed to make meanings and relationships explicit so they can be processed by XML-aware software.
When translating XPress documents to XML, publishers quickly realize that XML tagging structures are tree-like and linear, while designed documents are certainly not tree-like and are only somewhat linear. It is often difficult and not necessarily meaningful to define an exact order for the elements on a page, but it's required for XML. On a formatted page, the fact that a sidebar is next to a main article and has similar content is sufficient for a person to identify their relationship to each other. This is not true for most software, and so conversion becomes painful. The problem is even more extreme when graphical content is involved.
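To make the contrast concrete, here is a minimal Python sketch. The page objects, their coordinates, and the element names (`page`, `article`, `body`, `sidebar`) are all invented for illustration and do not come from XPress or from any particular DTD. On the page, the sidebar and the article are related only by position; in the XML, someone has to pick an order and state which article owns the sidebar.

```python
import xml.etree.ElementTree as ET

# Hypothetical page objects: on the page, only position relates them.
page_boxes = [
    {"id": "box2", "kind": "sidebar", "x": 400, "y": 120, "text": "Related facts"},
    {"id": "box1", "kind": "article", "x": 40, "y": 100, "text": "Main story text"},
]

def build_xml(boxes, sidebar_owner):
    """Serialize boxes as a tree; sidebar_owner maps sidebar id -> article id."""
    root = ET.Element("page")
    articles = {}
    # An explicit order is required: here, articles from top to bottom.
    ordered = sorted(
        (b for b in boxes if b["kind"] == "article"),
        key=lambda b: (b["y"], b["x"]),
    )
    for box in ordered:
        art = ET.SubElement(root, "article", id=box["id"])
        ET.SubElement(art, "body").text = box["text"]
        articles[box["id"]] = art
    # The sidebar's relationship must be stated; it cannot be inferred here.
    for box in boxes:
        if box["kind"] == "sidebar":
            parent = articles[sidebar_owner[box["id"]]]
            ET.SubElement(parent, "sidebar", id=box["id"]).text = box["text"]
    return root

root = build_xml(page_boxes, {"box2": "box1"})
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The `sidebar_owner` argument is the part a person (or a carefully written rule) has to supply. Nothing in the box coordinates alone says which article the sidebar belongs to.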
A second major problem, especially with older XPress files, is that XPress users have been more worried about how their pages look than about consistent usage of style sheets, which reduces a programmer's ability to write reliable conversion routines based on XPress style sheet names.
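As a rough illustration of why style sheet consistency matters, the following Python sketch converts tagged paragraphs to XML by style name. It makes two loud assumptions: the input is a simplified text export in which every paragraph begins with `@StyleName:`, and the `STYLE_TO_TAG` mapping and element names are hypothetical. A real export carries many more codes that this sketch ignores.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical mapping from style sheet names to XML element names.
STYLE_TO_TAG = {"Headline": "title", "Body Text": "para", "Byline": "author"}

# Simplified assumption: each paragraph starts with "@StyleName:".
PARA = re.compile(r"^@(?P<style>[^:=]+):(?P<text>.*)$")

def convert(lines):
    """Return (xml_fragments, unmapped_styles) for tagged paragraphs."""
    out, unmapped = [], set()
    for line in lines:
        m = PARA.match(line)
        if not m:
            continue  # untagged or definition lines are skipped in this sketch
        style, text = m.group("style").strip(), m.group("text")
        tag = STYLE_TO_TAG.get(style)
        if tag is None:
            unmapped.add(style)  # inconsistent styling surfaces here
            tag = "unknown"
        out.append(f"<{tag}>{escape(text)}</{tag}>")
    return out, unmapped

sample = [
    "@Headline:XML & XPress",
    "@Body Text:They do not play nicely together yet.",
    "@body text 2:A paragraph styled by hand.",
]
fragments, unmapped = convert(sample)
print(fragments)
print(unmapped)
```

Every style name that lands in `unmapped` is a paragraph the routine could not classify. In files where designers created one-off styles to get a page to look right, that set grows quickly and each entry needs a human decision.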
It would certainly be possible to mitigate these issues through changes to XPress itself, and the next version promises to do at least some of that. Meanwhile, here are some more details to help define and tackle this problem as it exists for you today. As you'll see, there is no one right approach—it depends on your content and workflow. Adjustments in each of these areas can sometimes help to achieve your overall goal.
Some approaches
These are the most widely used XPress-to-XML translation methods:
The first two options can be augmented by the use of a separate product or custom script that enables files to be manipulated in batch (so entire directories of XPress documents can be converted, for example).
And, of course, some publishers realize that they can avoid the problem altogether by using a different desktop publishing tool, such as FrameMaker, that is more XML-aware and also acceptable as a layout tool for their products.
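The batch wrapper mentioned above can be quite small. The Python sketch below is one possible shape, not a reference to any existing product: `convert_one` is a hypothetical callable standing in for whatever tool or script converts a single document, and the `*.qxd` pattern is an assumption about how the source files are named.

```python
from pathlib import Path

def batch_convert(src_dir, out_dir, convert_one, pattern="*.qxd"):
    """Run convert_one(source_path) -> XML string on every matching file.

    Returns (written_paths, failures) so problem files can be reviewed by hand.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written, failures = [], []
    # Non-recursive on purpose: one directory in, one directory out.
    for src in sorted(Path(src_dir).glob(pattern)):
        try:
            xml_text = convert_one(src)
        except Exception as exc:
            # Keep going; one bad file should not stop the whole batch.
            failures.append((src, str(exc)))
            continue
        target = out_dir / (src.stem + ".xml")
        target.write_text(xml_text, encoding="utf-8")
        written.append(target)
    return written, failures
```

Collecting failures instead of stopping at the first one matters in practice, because old and inconsistently styled files are the ones most likely to fail and they tend to be scattered through an archive.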
Content factors
| How much content | If you have a small amount of content, consider manual re-creation. If you have very large volumes of content, consider talking to vendors with proven processes for conversion via OCR or similar methods that don't depend on the XPress markup. Even the methods using relatively unsophisticated software can be affordable if performed by an experienced vendor. |
| How much variability in the content and its layout | High-variability content is a problem for all the approaches. |
| Layout, especially the use of sidebars, figures, tables and other content that is not inline with the primary text flow | The scripted approach cannot typically account for layouts that include unpredictable non-linear elements, and so requires manual post-processing. (If, for example, you always have your sidebars in the same location, this might not matter.) The other approaches typically require people to identify the relationships among non-linear objects, either before the content is converted to XML (products like Atomik) or afterward. |
| Page jumps | If content flows from page to page through linked text boxes, then most approaches will be able to maintain the continuity of those text flows without human involvement. This may not be the case for approaches that involve scanning. |
| Use of complex tables | XPress (today) has weak table handling, and there is little that can be done programmatically to address this issue for complex tables (tables with spanned rows and columns, for example). |
| Use of style sheets | Inconsistent use of style sheets makes programmatic conversion, whether based on XPress Tags or on a product, much more difficult than it would otherwise be. |
| Complexity and specificity of target DTD | A complex target DTD (e.g., where different sections require different tag sets even though the content formatting is identical) is difficult for any automated approach. Human intervention might be needed. |
| Presence in the XPress files of all the needed content | Sometimes XML content (e.g., attribute values, an image's dimensions) needs to be derived from the XPress content. This typically requires programmatic or human effort. |
Workflow factors
The effect of workflow factors tends to be more complicated to evaluate. Here are some questions to ask yourself:
Choosing
When weighing the options, be sure to get help from people who have experience with your preferred approach, or who can help you choose an approach in the first place. XPress-to-XML translation is full of tiny pitfalls that are easy to avoid once you've been through the process.
Stay tuned for upcoming issues, when we will look at importing XML documents into XPress.