Although the concept of the Semantic Web began to surface in the early part of this century (see this early Scientific American article from Tim Berners-Lee, James Hendler, and Ora Lassila), it is now reaching buzzword status. Some publishers have been using technologies considered to be "semantic" for a few years, but it is a new topic for many others.
What do the concepts behind the Semantic Web mean for publishers? First, let's ask: what is the Semantic Web? Its goal is richer interconnectedness among all objects (or content), which lets us pull data from various sources to discover new meaning and present it in different formats. A simpler view is that the Semantic Web makes better use of metadata: every object on the web is described by rich metadata (in a universal, standardized format), and tools are better able to make use of that metadata.
Almost all publishers use metadata in some capacity. Most also use taxonomies (a hierarchy of terms used to categorize content), although they might not call them by that name. The next step beyond that is the use of ontologies. Just as taxonomies make metadata or controlled vocabularies look "flat," ontologies do the same to taxonomies. Ontologies describe more detailed relationships among concepts and provide a higher level of richness in the metadata.
Many taxonomies work like the biological classification of the animal and plant kingdoms, in which every species sits on exactly one branch. However, other, more conceptual objects don't always have such a clear lineage. If we created a taxonomy of colors with the three primary colors (red, yellow, and blue) as the top nodes, orange would need to be related to both red and yellow. In a simple taxonomy, we would probably repeat the term "orange" under both, but technically these would be two distinct nodes that happen to share a name.
In an ontology, orange can be represented as a single concept that is related to more than one parent. In fact, an ontology is not a tree at all: it is a complex mapping of concepts with defined relationships between those concepts (such as "part of" or "subclass of").
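To make the contrast concrete, here is a minimal Python sketch under simple assumptions: the taxonomy is a nested dictionary (a tree), and the ontology is a set of (subject, predicate, object) tuples. The predicate "derivedFrom" is hypothetical, invented only for this illustration.

```python
# Taxonomy as a tree (nested dicts): each node has exactly one parent,
# so "orange" has to be repeated under both "red" and "yellow".
taxonomy = {
    "red": {"orange": {}},
    "yellow": {"orange": {}},
    "blue": {},
}

# Ontology as a set of (subject, predicate, object) tuples: a single
# "orange" concept with two relationships. "derivedFrom" is a hypothetical
# predicate invented for this example.
ontology = {
    ("orange", "derivedFrom", "red"),
    ("orange", "derivedFrom", "yellow"),
}


def tree_paths(tree, path=()):
    """Yield the path from the root to every node in a nested-dict tree."""
    for name, children in tree.items():
        node_path = path + (name,)
        yield node_path
        yield from tree_paths(children, node_path)


def objects_of(triples, concept, predicate):
    """Return every object linked to `concept` through `predicate`."""
    return {o for s, p, o in triples if s == concept and p == predicate}


# Two separate tree nodes that merely share the name "orange".
orange_nodes = [p for p in tree_paths(taxonomy) if p[-1] == "orange"]
print(orange_nodes)  # [('red', 'orange'), ('yellow', 'orange')]

# One concept that is related to both parents.
print(sorted(objects_of(ontology, "orange", "derivedFrom")))  # ['red', 'yellow']
```

The tree can only express "orange" twice; the triple set keeps one identifier and simply adds a second relationship.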
In their most expanded use, ontologies can be valuable collections of information in their own right, almost database-like in nature. Imagine an ontology that captures court "metadata" for a legal publisher. That publisher may currently have a taxonomy with branches for federal courts, district courts, state courts, and so on. But in this "flat" taxonomy, there is probably no explicit relationship among local, district, and state courts, or between those courts and geographical boundaries such as state or congressional district lines. In an ontology, those relationships can be established. Documents are still tagged to nodes in the ontology, but even without the documents, the ontology itself becomes a very valuable piece of content.
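As a rough illustration of an ontology answering questions without any tagged documents, the following Python sketch stores court facts as triples and combines two relationships in one query. The identifiers ("NDCal", "SDNY", and so on) and the predicates "type", "locatedIn", and "appealsTo" are hypothetical; a real legal ontology would define its own vocabulary.

```python
# Each fact is a (subject, predicate, object) triple. The court identifiers
# and the predicates "type", "locatedIn", and "appealsTo" are hypothetical.
court_ontology = {
    ("NDCal", "type", "FederalDistrictCourt"),
    ("NDCal", "locatedIn", "California"),
    ("NDCal", "appealsTo", "NinthCircuit"),
    ("NinthCircuit", "type", "FederalAppellateCourt"),
    ("CalSupremeCourt", "type", "StateCourt"),
    ("CalSupremeCourt", "locatedIn", "California"),
    ("SDNY", "type", "FederalDistrictCourt"),
    ("SDNY", "locatedIn", "NewYork"),
    ("SDNY", "appealsTo", "SecondCircuit"),
}


def subjects_with(triples, predicate, obj):
    """Return every subject that has `predicate` pointing at `obj`."""
    return {s for s, p, o in triples if p == predicate and o == obj}


# Questions a flat taxonomy cannot answer, asked of the ontology alone.
courts_in_california = subjects_with(court_ontology, "locatedIn", "California")
federal_district_courts = subjects_with(court_ontology, "type", "FederalDistrictCourt")

# Combine two relationships: federal district courts located in California.
print(sorted(courts_in_california & federal_district_courts))  # ['NDCal']
```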
The W3C standard framework for expressing metadata (including taxonomies and ontologies) is RDF (Resource Description Framework).
RDF provides a standard framework for expressing information about resources (metadata) that allows for complex definitions of relationships, polyhierarchical taxonomies (giving a node multiple parents), and the ability to combine taxonomies (by connecting a detailed taxonomy to a broader taxonomy through a common node). The purpose of RDF is to provide a syntax that captures rich metadata and relationships and allows applications to process that data.
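Here is a minimal Python sketch of the last two capabilities, assuming both taxonomies use the same identifier for their shared node. Terms are linked by a "broader" predicate whose name echoes SKOS's skos:broader; the term names themselves are hypothetical examples, and plain tuples stand in for real RDF.

```python
# A broad taxonomy and a detailed one, each as (narrower, "broader", wider)
# triples. The predicate name echoes SKOS's skos:broader; the terms are
# hypothetical examples.
broad = {
    ("IntellectualProperty", "broader", "Law"),
    ("Contracts", "broader", "Law"),
}
detailed = {
    ("Patents", "broader", "IntellectualProperty"),
    ("SoftwarePatents", "broader", "Patents"),
    ("SoftwarePatents", "broader", "Technology"),  # second parent: polyhierarchy
}

# Because both use the same identifier for the common node
# ("IntellectualProperty"), combining them is just a set union.
combined = broad | detailed


def ancestors(triples, term):
    """Collect every term reachable by following "broader" links."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "broader" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


print(sorted(ancestors(detailed, "SoftwarePatents")))
# ['IntellectualProperty', 'Patents', 'Technology']
print(sorted(ancestors(combined, "SoftwarePatents")))
# ['IntellectualProperty', 'Law', 'Patents', 'Technology']
```

After the union, "Law" becomes reachable from the detailed terms even though neither taxonomy alone links them.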
The RDF data model expresses relationships among resources in what are called "triples." Each triple defines two things and the relationship between them. A triple consists of a subject, a predicate, and an object (sometimes called the resource, property, and value). The subject (or resource) is the "thing" the statement is about, the predicate (or property) specifies a characteristic or property of the subject, and the object (or value) is the value of that characteristic or property.
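The triple structure maps naturally onto a small record type. This Python sketch is only a simplified model of the idea (real RDF identifies resources with IRIs and distinguishes literals); the sample data is hypothetical. It also shows simple pattern matching, where any position left as None acts as a wildcard.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str    # the resource the statement is about
    predicate: str  # the property of the subject being described
    object: str     # the value of that property


def match(triples, subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None):
    """Return the triples that fit a pattern; None matches anything."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


# Hypothetical sample data.
graph = [
    Triple("book1", "title", "A Guide to Metadata"),
    Triple("book1", "creator", "Jane Doe"),
    Triple("book2", "creator", "Jane Doe"),
]

# "Which resources did Jane Doe create?"
print([t.subject for t in match(graph, predicate="creator", obj="Jane Doe")])
# ['book1', 'book2']
```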
The following illustration is an RDF graph of a triple that records a simple metadata value: the author of this newsletter article.
[Figure: RDF graph of a single triple linking this article to its author]
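Written out as text, a triple like this can be serialized in the W3C N-Triples format, one statement per line. In this minimal Python sketch the article URI is hypothetical (example.org is reserved for examples), and using the Dublin Core "creator" property as the predicate is an assumption, not necessarily what the original figure showed.

```python
# Hypothetical URI for this article (example.org is reserved for examples).
ARTICLE = "http://example.org/newsletter/semantic-web-article"
# Dublin Core "creator" property, assumed here as the author predicate.
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"


def escape_literal(text):
    """Escape a string for use as an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def ntriple(subject_iri, predicate_iri, literal):
    """Serialize one triple whose object is a plain string literal."""
    return f'<{subject_iri}> <{predicate_iri}> "{escape_literal(literal)}" .'


line = ntriple(ARTICLE, DC_CREATOR, "Ed Stevenson")
print(line)
# <http://example.org/newsletter/semantic-web-article> <http://purl.org/dc/elements/1.1/creator> "Ed Stevenson" .
```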
Where RDF gets interesting is when you start to combine triples, such as making the author the subject of another triple describing his email address or company affiliation.
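A minimal Python sketch of that chaining, under stated assumptions: "dc:creator", "foaf:name", and "foaf:mbox" are prefixed names from the Dublin Core and FOAF vocabularies, while the identifiers, the email address, and the "ex:worksFor" predicate are hypothetical. The author is the object of one triple and the subject of the others, so a query can walk from the article to the author's email address.

```python
# Hypothetical identifiers; "dc:creator", "foaf:name", and "foaf:mbox" are
# prefixed names from the Dublin Core and FOAF vocabularies, while
# "ex:worksFor" is invented for this example.
ARTICLE = "ex:semantic-web-article"
AUTHOR = "ex:ed-stevenson"

graph = {
    (ARTICLE, "dc:creator", AUTHOR),        # the author is the object here...
    (AUTHOR, "foaf:name", "Ed Stevenson"),  # ...and the subject here
    (AUTHOR, "foaf:mbox", "mailto:author@example.com"),
    (AUTHOR, "ex:worksFor", "ex:example-company"),
}


def follow(triples, start, *predicates):
    """Walk a chain of predicates from `start`; return the nodes reached."""
    nodes = {start}
    for predicate in predicates:
        nodes = {o for s, p, o in triples if s in nodes and p == predicate}
    return nodes


# "What is the email address of this article's author?"
print(follow(graph, ARTICLE, "dc:creator", "foaf:mbox"))
# {'mailto:author@example.com'}
```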
The PRISM metadata standard can often be expressed in RDF, RSS 1.0 feeds use RDF syntax, and Adobe's XMP (eXtensible Metadata Platform) for embedding metadata within media objects makes use of RDF.
But because it is a structural framework, RDF is more about syntax (structure) than semantics (meaning). OWL (Web Ontology Language) is the W3C effort to standardize the types of relationships that can be expressed in RDF. OWL provides an XML vocabulary for expressing hierarchies and relationships, introducing specific terms such as "sameAs" and "intersectionOf." In this way, OWL gives RDF statements a shared meaning.
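To show why a shared meaning matters, here is a deliberately simplified Python sketch of one consequence of owl:sameAs: identifiers declared to be the same are collapsed into one, so facts recorded by different sources combine. This is not a full OWL reasoner; the union-find approach and the publisher identifiers are assumptions made for illustration.

```python
def merge_same_as(triples):
    """Rewrite triples so that identifiers linked by owl:sameAs collapse
    into one representative (a simplified sketch, not a full OWL reasoner)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            if rb < ra:             # keep the smaller id as representative
                ra, rb = rb, ra     # so the result is deterministic
            parent[rb] = ra

    for s, p, o in triples:
        if p == "owl:sameAs":
            union(s, o)
    return {(find(s), p, find(o)) for s, p, o in triples if p != "owl:sameAs"}


# Two hypothetical publishers describe the same person under different ids.
sources = {
    ("pubA:author/42", "foaf:name", "Ed Stevenson"),
    ("pubB:people/stevenson", "foaf:mbox", "mailto:author@example.com"),
    ("pubA:author/42", "owl:sameAs", "pubB:people/stevenson"),
}

for triple in sorted(merge_same_as(sources)):
    print(triple)
# ('pubA:author/42', 'foaf:mbox', 'mailto:author@example.com')
# ('pubA:author/42', 'foaf:name', 'Ed Stevenson')
```

Without the sameAs statement the two descriptions stay unconnected; with it, one identifier carries both the name and the email address.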
In semantic circles, there is often discussion about RDF versus topic maps. Conceptually, topic maps are very similar to RDF, with some slight and subtle distinctions. The two have different origins: whereas RDF came through the W3C, topic maps are an ISO standard and arose from the need to create indexes (like back-of-the-book indexes). The prime focus of topic maps is on the topics (or subjects); RDF focuses on the resources. Although they were created for somewhat different purposes, both do very similar things.
Topic maps describe topic structures and associate them with resources. Like RDF, topic maps break from the traditional hierarchical taxonomy and offer much more robust classification, indexing, and relationship descriptions. Topic maps allow for the creation of complex topical descriptions, which then point to resources. There is a separation between the topical information (the index) and the content that is associated with specific topics within it.
The topic maps "language" uses topics, occurrences, and associations in its model: the topic is the resource (the thing or the subject), the occurrence is the resource that has some association with the topic, and the association is a type of relationship. Even at this high level, you can see the similarities with the RDF model. Note that in topic maps the association is two-way. If my topic is this article, the association is "is authored by," and the occurrence is "Ed Stevenson," then the inverse is also true: Ed Stevenson (topic) authored (association) this article (occurrence).
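One way to see why an association reads in both directions is to record it once, with each participant playing a named role, rather than as a one-way arrow from subject to object. The Python sketch below assumes that simplified model, loosely based on the role-based associations of the Topic Maps data model; the association type "authorship" and the role names "work" and "author" are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    kind: str     # the type of relationship, e.g. "authorship"
    roles: tuple  # (role, topic) pairs; no player is privileged as "subject"


def associations_of(associations, topic):
    """Return (kind, role, other topic) for every association the topic
    plays a part in, regardless of which role it plays."""
    results = []
    for assoc in associations:
        if any(player == topic for _, player in assoc.roles):
            for role, player in assoc.roles:
                if player != topic:
                    results.append((assoc.kind, role, player))
    return results


# Hypothetical names for the article's own example.
store = [
    Association("authorship", (("work", "this article"), ("author", "Ed Stevenson"))),
]

print(associations_of(store, "this article"))  # [('authorship', 'author', 'Ed Stevenson')]
print(associations_of(store, "Ed Stevenson"))  # [('authorship', 'work', 'this article')]
```

Because the association is stored once, neither direction needs a separate inverse statement.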
It is beyond the scope of this introductory article to fully explore the differences between the two, and much work has already been done in that area. Additionally, the W3C has started an RDF/Topic Maps Interoperability Task Force to explore interoperability between the two. See http://www.w3.org/2001/sw/BestPractices/RDFTM/ for more information.
So if you never knew about the Semantic Web but now have the overview, what should you do next? It can be difficult to take the intellectual concepts behind the Semantic Web and apply them to practical day-to-day use in a publishing process. But it is important to be aware of the issues and the potential they have. The following are a few suggestions on preparing your publishing organization for the Semantic Web.
In addition to the links found within the document, the following offer good information on the Semantic Web (and were consulted for this article):