In each issue, this section presents an interview with someone in the field of content management. In this issue we interview David Wood from the Semantic Web Research Group inside the MIND LAB at the University of Maryland Institute for Advanced Computer Studies.
| Name: | David Wood |
| Affiliation: | The Semantic Web Research Group inside the MIND LAB at the University of Maryland Institute for Advanced Computer Studies |
| Title: | Entrepreneur in Residence |
| Brief Bio: | David co-chairs the World Wide Web Consortium's Semantic Web Best Practices & Deployment Working Group. David was also co-founder and CTO of Tucana Technologies, an innovative producer of Semantic Web technologies and primary sponsor of the Kowari Open Source project. |
[1] What makes a technology "semantic"?
David: A technology is semantic if it facilitates machine-to-machine communication, that is, if it helps create machine-readable data. Most
data is not machine-readable; it's free text. That's why there are things like Google that do full-text searching. With semantic
technologies, we tag the information about the data to a fuller extent so that machines can be programmed to manipulate the
data as though they understand it. Of course, they don't really understand the data (they are machines), but the point is that
there is enough metadata that machines can be programmed to derive meaning from the data.
It's important to consider the difference between syntax and semantics. Syntax is structure; semantics is meaning. Semantic
technologies allow us to add meaning to our structure.
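As a rough illustration of the difference between free text and machine-readable data, the sketch below restates facts from David's bio as subject-predicate-object statements and lets a program follow them. The `ex:` identifiers, the predicate names, and the helper functions are invented for this example; they are not part of any standard vocabulary or of any tool mentioned in the interview.

```python
from typing import List, Optional, Tuple

# A statement is a (subject, predicate, object) triple.
# All "ex:" names are hypothetical identifiers made up for this sketch.
Triple = Tuple[str, str, str]

# The same facts as free text: easy for people, opaque to a program.
FREE_TEXT = (
    "David co-chairs a W3C working group and co-founded Tucana "
    "Technologies, the primary sponsor of the Kowari project."
)

# The same facts as tagged statements a program can manipulate.
TRIPLES: List[Triple] = [
    ("ex:DavidWood", "ex:coChairs", "ex:BestPracticesWorkingGroup"),
    ("ex:DavidWood", "ex:coFounded", "ex:TucanaTechnologies"),
    ("ex:TucanaTechnologies", "ex:sponsors", "ex:Kowari"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that agree with every part that is not None."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def projects_sponsored_by_companies_of(person: str) -> List[str]:
    """Chain two statements: person -> company founded -> project sponsored."""
    companies = [o for (_, _, o) in match(TRIPLES, s=person, p="ex:coFounded")]
    return [
        project
        for company in companies
        for (_, _, project) in match(TRIPLES, s=company, p="ex:sponsors")
    ]


if __name__ == "__main__":
    print(projects_sponsored_by_companies_of("ex:DavidWood"))
```

The program does not understand anything; it only follows the tags. That is enough to answer a question that full-text search over `FREE_TEXT` could not answer reliably.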
[2] Some people say the Semantic Web is already here and others say we have a long way to go. What do you feel is the current status of semantic technologies and where do you see them going?
David: The Semantic Web is not likely to be a mainstream fashion statement that passes some tipping point and is suddenly used
by your grandmother. Although the Web did that, most technologies don't. The Semantic Web is a quiet revolution that is
starting to invade the existing Web. In a few years we will look back and say "of course we use the Semantic Web. It's part
of the infrastructure."
The two most important movements are in standards and uptake (or adoption).
It's one thing to say "how to tag" but it's another to say "what do the tags mean" and that's where standards come into play.
Currently, researchers are moving on from basic metadata descriptions to the development of Web ontology languages, rules languages,
shared ontologies and policy descriptions. Additional standards on top of RDF, XML, and Dublin Core, like OWL and rules languages,
facilitate machine manipulation of data.
On the uptake side, the next steps are to shake out the best practices and the use of common tag sets. People will develop (and
are developing) their own ontologies, but should make use of standards where they exist. At first many usually create too
much of their own description, but the hope is best practices will drive efforts so that everyone will use existing ontologies
wherever possible and that we can develop a library of ontologies. This way we can share amongst other systems and use data
in a non-trivial way.
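To make the point about shared vocabularies concrete, here is a minimal sketch of two publishers whose records use different local field names. Once each maps its fields onto common Dublin Core element names (`title` and `creator` are real Dublin Core elements, written here with a `dc:` prefix), one query works across both. The publishers, records, local field names, and function names are all hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical records from two imaginary publishers, each with its own tags.
PUBLISHER_A: List[Record] = [
    {"headline": "A Quiet Revolution", "byline": "A. Author"},
]
PUBLISHER_B: List[Record] = [
    {"name": "Building Ontology Libraries", "written_by": "A. Author"},
    {"name": "Rules on the Web", "written_by": "B. Writer"},
]

# Each publisher maps its local tags onto the shared Dublin Core elements.
A_TO_DC: Dict[str, str] = {"headline": "dc:title", "byline": "dc:creator"}
B_TO_DC: Dict[str, str] = {"name": "dc:title", "written_by": "dc:creator"}


def to_shared(records: List[Record], mapping: Dict[str, str]) -> List[Record]:
    """Rename local fields to shared vocabulary terms, dropping unmapped ones."""
    return [
        {mapping[key]: value for key, value in record.items() if key in mapping}
        for record in records
    ]


def titles_by(creator: str, records: List[Record]) -> List[str]:
    """Query that only needs to know the shared vocabulary."""
    return [r["dc:title"] for r in records if r.get("dc:creator") == creator]


if __name__ == "__main__":
    merged = to_shared(PUBLISHER_A, A_TO_DC) + to_shared(PUBLISHER_B, B_TO_DC)
    print(titles_by("A. Author", merged))
```

If each publisher had kept only its own description, `titles_by` would need one version per publisher. Reusing an existing vocabulary is what lets the data be shared between systems.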
[3] What is the benefit of adopting semantic technologies for publishers?
David: There are some high-visibility publishers who have already adopted and implemented semantic technologies. For example, Nature
Publishing has done a great deal of work in news syndication. Elsevier has created the Drug Ontology Project. Major technology
vendors, like Cisco, Oracle, IBM, and HP, have all made major Semantic Web announcements in the last few months. It's getting
more mainstream and the benefits are being realized. With semantic technologies, publishers can begin to leverage content
in ways that were not possible before.
Semantic technologies that are widely adopted now, like RSS, are really quite light in their use of semantics.
But the key is that a little semantics can go a long way, and if you can understand the value in something as simple as RSS,
it's easy to see how even more advanced use of semantic technologies can provide value.
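As a small illustration of how far "a little semantics" goes, the sketch below reads a tiny RSS 2.0 feed with Python's standard library. Because the feed marks which text is a title and which is a link, a program can list the stories without any guesswork. The feed content, the example.org URLs, and the function name are invented for this example.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# A made-up RSS 2.0 feed; the element names follow the RSS 2.0 format.
RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.org/</link>
    <description>A sample feed for illustration</description>
    <item>
      <title>First story</title>
      <link>http://example.org/first</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.org/second</link>
    </item>
  </channel>
</rss>"""


def list_items(rss_text: str) -> List[Tuple[str, str]]:
    """Return (title, link) pairs for every item in the feed."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    for title, link in list_items(RSS_SAMPLE):
        print(title, link)
```

The tags carry very little meaning, only "this is a title" and "this is a link", yet that is what makes syndication across sites possible.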
[4] How would you compare RDF and Topic Maps? What are the key questions a publisher would need to ask when choosing between RDF or Topic Maps?
David: Both are clearly semantic technologies. They have subtly different philosophical drivers, and both provide solutions for very similar problems. There is ongoing work at the W3C to define and foster interoperability between the two. If a publisher needs to implement now, the choice depends on what is needed now and in what direction the publisher is going, but the implementation should keep the door open until it is clear how the interoperability efforts turn out. Publishers should pay close attention to the ongoing work to marry the two, which will hopefully leverage the best aspects of both. In particular, have a look at the W3C's RDF/Topic Maps Interoperability Task Force documents.
[5] How can I have fun with the Semantic Web?
David: To get your hands dirty with some real Semantic Web technologies, download and install the Longwell browser and Piggybank from MIT's Semantic Interoperability of Metadata and Information in unLike Environments (SIMILE) project. You may also
try the Haystack "universal information client" from the MIT Computer Science and Artificial Intelligence Laboratory.
The Longwell browser is an RDF browser. Piggybank is a Mozilla Firefox plug-in that allows you to apply your own RDF markup
to any Web site.