Originally published August 2008
SPARQL: RDF's query language
Many resources now make use of the Resource Description Framework (RDF), a standard developed by the W3C to give more meaning to published information. It is one of the foundations of what many call the "Semantic Web." The purpose behind RDF is simple: let people describe the essence of resources using a syntax more powerful than standalone keywords, for the benefit of machine analysis. In doing so, RDF creates a need of a different kind: how to search and access such statements. Next, you will learn about another W3C standard that aids in this process: SPARQL, the query language for RDF.
In order to understand what SPARQL solves, it's necessary to take a sidestep into RDF's syntax. RDF operates on the premise of triples: statements composed of a subject, a predicate and an object, which in turn provide more context to a resource. This is the reason why RDF is often considered a metadata markup language. To support this metadata, RDF promotes the use of vocabularies that provide pre-defined structures and can cover any topic imaginable. One such vocabulary is VCard RDF, used to represent personal contacts. Listing 1.1 shows a VCard RDF structure with a supplemental, custom vocabulary named peopleInfo.
Listing 1.1 - VCard RDF sample

```xml
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE rdf:RDF [
  <!ENTITY xsd 'http://www.w3.org/2001/XMLSchema#'>
]>
<rdf:RDF
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  xmlns:vCard='http://www.w3.org/2001/vcard-rdf/3.0#'
  xmlns:info='http://www.example.com/peopleInfo#'
  xmlns:xsd='&xsd;'
>
  <rdf:Description rdf:about="http://acmecorp/JohnDoe/">
    <vCard:FN>John Doe</vCard:FN>
    <info:seniority rdf:datatype="&xsd;integer">12</info:seniority>
    <vCard:N rdf:parseType="Resource">
      <vCard:Family>Doe</vCard:Family>
      <vCard:Given>John</vCard:Given>
    </vCard:N>
    <vCard:TITLE>Chief Executive Officer</vCard:TITLE>
    <vCard:ROLE>Senior Executive</vCard:ROLE>
  </rdf:Description>
  <rdf:Description rdf:about="http://acmecorp/SallyJohnson/">
    <vCard:FN>Sally Johnson</vCard:FN>
    <info:seniority rdf:datatype="&xsd;integer">10</info:seniority>
    <vCard:N rdf:parseType="Resource">
      <vCard:Family>Johnson</vCard:Family>
      <vCard:Given>Sally</vCard:Given>
    </vCard:N>
    <vCard:TITLE>Marketing VP</vCard:TITLE>
    <vCard:ROLE>Marketing</vCard:ROLE>
  </rdf:Description>
  <rdf:Description rdf:about="http://acmecorp/PeterErlang/">
    <vCard:FN>Peter Erlang</vCard:FN>
    <info:seniority rdf:datatype="&xsd;integer">5</info:seniority>
    <vCard:N rdf:parseType="Resource">
      <vCard:Family>Erlang</vCard:Family>
      <vCard:Given>Peter</vCard:Given>
    </vCard:N>
    <vCard:TITLE>Programmer</vCard:TITLE>
    <vCard:ROLE>Technology</vCard:ROLE>
  </rdf:Description>
  <rdf:Description rdf:about="http://acmecorp/FrankTarrington/">
    <vCard:FN>Frank Tarrington</vCard:FN>
    <info:seniority rdf:datatype="&xsd;integer">8</info:seniority>
    <vCard:N rdf:parseType="Resource">
      <vCard:Family>Tarrington</vCard:Family>
      <vCard:Given>Frank</vCard:Given>
    </vCard:N>
    <vCard:TITLE>Sales Rep</vCard:TITLE>
    <vCard:ROLE>Sales</vCard:ROLE>
  </rdf:Description>
</rdf:RDF>
```
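To make the triple model concrete, the John Doe description in listing 1.1 can be flattened into plain subject/predicate/object statements. The Java sketch below is illustrative only: the `Triple` record, the `_:n1` blank-node label, and the choice to store every term (URI, literal or blank node) as a plain string are assumptions made for this example, not how an RDF toolkit actually represents data.

```java
import java.util.List;

class JohnDoeTriples {
    // Hypothetical representation: every term is a plain String, so URIs,
    // literals and blank nodes are not distinguished by type.
    record Triple(String subject, String predicate, String object) {}

    static final String VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#";
    static final String INFO = "http://www.example.com/peopleInfo#";
    static final String JOHN = "http://acmecorp/JohnDoe/";

    // The first <rdf:Description> of Listing 1.1, one statement per property.
    static List<Triple> triples() {
        return List.of(
            new Triple(JOHN, VCARD + "FN", "John Doe"),
            new Triple(JOHN, INFO + "seniority", "12"), // typed as xsd:integer in the listing
            // rdf:parseType="Resource" creates an unnamed (blank) node for vCard:N;
            // "_:n1" is an arbitrary label chosen for this sketch.
            new Triple(JOHN, VCARD + "N", "_:n1"),
            new Triple("_:n1", VCARD + "Family", "Doe"),
            new Triple("_:n1", VCARD + "Given", "John"),
            new Triple(JOHN, VCARD + "TITLE", "Chief Executive Officer"),
            new Triple(JOHN, VCARD + "ROLE", "Senior Executive"));
    }

    public static void main(String[] args) {
        for (Triple t : triples()) {
            System.out.println(t.subject() + "  " + t.predicate() + "  " + t.object());
        }
    }
}
```

One description yields seven statements. The vCard:N element alone accounts for three of them, because the family and given names hang off a blank node rather than directly off John Doe's URI.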
What an RDF structure like the one in listing 1.1 achieves is the capability to publish machine-consumable, or Semantic Web, resources. The same approach extends to other domains, from vocabularies covering Web documents, which expose metadata like author, publication date and copyright, to more exotic vocabularies describing things like flights, movies or music.
RDF documents are interesting by the mere fact that they expose the essence of a resource without the need for sophisticated data mining algorithms. However, the more interesting question is how to search for information in an RDF structure. With RDF being expressed as markup, the simple answer would be to use the same tools used to search XML or HTML structures, like DOM or SAX, but the reality is that relationships in an RDF ontology can quickly become too complex for such tools. Stepping in to fill this void is SPARQL.
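The DOM route works for a single, known layout, but it is tied to how the XML happens to be written. The Java sketch below uses only the JDK's built-in DOM parser; `findByFullName` is a hypothetical helper written for this illustration. It finds John Doe in the element form used by listing 1.1, yet returns nothing for equivalent RDF/XML in which vCard:FN is written as a property attribute, which the RDF/XML syntax also permits. A DOM query has to anticipate every such serialization one by one.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DomLookup {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#";

    // Naive DOM search: returns the rdf:about of every rdf:Description that has a
    // <vCard:FN> descendant element whose text equals fullName.
    static List<String> findByFullName(String rdfXml, String fullName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)));
            List<String> subjects = new ArrayList<>();
            NodeList descriptions = doc.getElementsByTagNameNS(RDF, "Description");
            for (int i = 0; i < descriptions.getLength(); i++) {
                Element description = (Element) descriptions.item(i);
                NodeList names = description.getElementsByTagNameNS(VCARD, "FN");
                for (int j = 0; j < names.getLength(); j++) {
                    if (fullName.equals(names.item(j).getTextContent())) {
                        subjects.add(description.getAttributeNS(RDF, "about"));
                    }
                }
            }
            return subjects;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse RDF/XML", e);
        }
    }

    public static void main(String[] args) {
        String open = "<rdf:RDF xmlns:rdf='" + RDF + "' xmlns:vCard='" + VCARD + "'>";
        // Element form, as used in Listing 1.1
        String elementForm = open
            + "<rdf:Description rdf:about='http://acmecorp/JohnDoe/'>"
            + "<vCard:FN>John Doe</vCard:FN></rdf:Description></rdf:RDF>";
        // The same triple written with a property attribute instead of a child element
        String attributeForm = open
            + "<rdf:Description rdf:about='http://acmecorp/JohnDoe/' vCard:FN='John Doe'/>"
            + "</rdf:RDF>";
        System.out.println(findByFullName(elementForm, "John Doe"));   // [http://acmecorp/JohnDoe/]
        System.out.println(findByFullName(attributeForm, "John Doe")); // []
    }
}
```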
SPARQL has a syntax similar to SQL, the language used to query relational databases, and supports modifiers like ORDER BY, DISTINCT and LIMIT, granting comparable query power over RDF data. With that in mind, let's analyze a few SPARQL queries that search an RDF structure like the one in listing 1.1. Listing 1.2 contains these queries and their corresponding results.
Listing 1.2 - SPARQL queries and results

```
# SPARQL Sample 1
SELECT ?Homepage
WHERE { ?Homepage <http://www.w3.org/2001/vcard-rdf/3.0#FN> "John Doe" }

------------------------------
| Homepage                   |
==============================
| <http://acmecorp/JohnDoe/> |
------------------------------

# SPARQL Sample 2
PREFIX info: <http://www.example.com/peopleInfo#>
SELECT ?Employee
WHERE {
  ?Employee info:seniority ?seniority .
  FILTER (?seniority >= 10)
}

-----------------------------------
| Employee                        |
===================================
| <http://acmecorp/SallyJohnson/> |
| <http://acmecorp/JohnDoe/>      |
-----------------------------------

# SPARQL Sample 3
PREFIX info: <http://www.example.com/peopleInfo#>
PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
SELECT ?Name ?Seniority ?Title
WHERE {
  ?person vcard:FN ?Name .
  ?person info:seniority ?Seniority .
  ?person vcard:TITLE ?Title .
}

--------------------------------------------------------------
| Name               | Seniority | Title                     |
==============================================================
| "Frank Tarrington" | 8         | "Sales Rep"               |
| "Peter Erlang"     | 5         | "Programmer"              |
| "Sally Johnson"    | 10        | "Marketing VP"            |
| "John Doe"         | 12        | "Chief Executive Officer" |
--------------------------------------------------------------
```
The first SPARQL query is as simple as it gets. It matches every triple whose predicate is the VCard RDF FN (formatted name) property and whose object is the literal "John Doe", with ?Homepage acting as a variable to which the subject of each match is assigned. The output for this query is therefore the matching contact's rdf:about value, which is the contact's main page.
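The core mechanism behind the first query, matching one triple pattern against a set of triples, can be sketched in a few lines. This is a hypothetical toy matcher, not Jena's implementation: it assumes string-only triples, treats any term starting with `?` as a variable, and only loads the vCard:FN statements from listing 1.1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TriplePatternMatch {
    record Triple(String s, String p, String o) {}

    static final String VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#";

    // The vCard:FN statements from Listing 1.1 (other properties omitted for brevity).
    static final List<Triple> DATA = List.of(
        new Triple("http://acmecorp/JohnDoe/", VCARD + "FN", "John Doe"),
        new Triple("http://acmecorp/SallyJohnson/", VCARD + "FN", "Sally Johnson"),
        new Triple("http://acmecorp/PeterErlang/", VCARD + "FN", "Peter Erlang"),
        new Triple("http://acmecorp/FrankTarrington/", VCARD + "FN", "Frank Tarrington"));

    // A pattern term starting with '?' is a variable; any other term must match exactly.
    // Returns false if the value conflicts with one already bound to that variable.
    static boolean bind(Map<String, String> binding, String term, String value) {
        if (!term.startsWith("?")) return term.equals(value);
        String previous = binding.putIfAbsent(term, value);
        return previous == null || previous.equals(value);
    }

    // Tests the pattern against every triple and collects one binding map per match.
    static List<Map<String, String>> match(List<Triple> data, Triple pattern) {
        List<Map<String, String>> solutions = new ArrayList<>();
        for (Triple t : data) {
            Map<String, String> binding = new HashMap<>();
            if (bind(binding, pattern.s(), t.s())
                    && bind(binding, pattern.p(), t.p())
                    && bind(binding, pattern.o(), t.o())) {
                solutions.add(binding);
            }
        }
        return solutions;
    }

    public static void main(String[] args) {
        // Sample 1: ?Homepage <http://www.w3.org/2001/vcard-rdf/3.0#FN> "John Doe"
        Triple pattern = new Triple("?Homepage", VCARD + "FN", "John Doe");
        System.out.println(match(DATA, pattern)); // [{?Homepage=http://acmecorp/JohnDoe/}]
    }
}
```

Replacing the literal "John Doe" with a variable such as ?name turns the same pattern into "list every contact's formatted name", which would produce four solutions.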
The second SPARQL query uses PREFIX to bind the short label info: to the namespace http://www.example.com/peopleInfo#, so that info:seniority can stand in for the full property URI inside the query. This query combines the supplemental seniority property with SPARQL's FILTER keyword, returning every VCard RDF contact with a seniority of 10 or greater; as before, each result is the matching contact's rdf:about value. The last SPARQL query matches three triple patterns that share the ?person variable, drawing on both namespaces in the VCard RDF structure, and outputs every contact's name, seniority and title.
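Samples 2 and 3 add two ideas to the single pattern of Sample 1: a FILTER that discards solutions, and several patterns that must agree on a shared variable. The toy evaluator below is a hypothetical sketch, not how Jena or any other engine is implemented. It keeps the string-only triples, treats terms starting with `?` as variables, and compares seniority by parsing strings into ints rather than handling typed literals. The method names (`evaluate`, `employeesWithSeniorityAtLeast`, `topNamesBySeniority`) are invented for illustration; the last one approximates what the ORDER BY and LIMIT modifiers would do if appended to Sample 3.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BasicGraphPattern {
    record Triple(String s, String p, String o) {}

    static final String VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#";
    static final String INFO = "http://www.example.com/peopleInfo#";
    static final List<Triple> DATA = new ArrayList<>();

    static {
        // Name, seniority and title of each contact in Listing 1.1
        contact("JohnDoe", "John Doe", "12", "Chief Executive Officer");
        contact("SallyJohnson", "Sally Johnson", "10", "Marketing VP");
        contact("PeterErlang", "Peter Erlang", "5", "Programmer");
        contact("FrankTarrington", "Frank Tarrington", "8", "Sales Rep");
    }

    static void contact(String id, String name, String seniority, String title) {
        String subject = "http://acmecorp/" + id + "/";
        DATA.add(new Triple(subject, VCARD + "FN", name));
        DATA.add(new Triple(subject, INFO + "seniority", seniority));
        DATA.add(new Triple(subject, VCARD + "TITLE", title));
    }

    static boolean bind(Map<String, String> b, String term, String value) {
        if (!term.startsWith("?")) return term.equals(value);
        String previous = b.putIfAbsent(term, value);
        return previous == null || previous.equals(value);
    }

    // Evaluates the patterns left to right: each partial solution is extended only by
    // triples that agree with variables it has already bound, which is what joins
    // the three patterns of Sample 3 on ?person.
    static List<Map<String, String>> evaluate(List<Triple> patterns) {
        List<Map<String, String>> solutions = List.of(Map.of());
        for (Triple pattern : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : solutions) {
                for (Triple t : DATA) {
                    Map<String, String> b = new HashMap<>(partial);
                    if (bind(b, pattern.s(), t.s()) && bind(b, pattern.p(), t.p())
                            && bind(b, pattern.o(), t.o())) {
                        next.add(b);
                    }
                }
            }
            solutions = next;
        }
        return solutions;
    }

    // Sample 2: ?Employee info:seniority ?seniority . FILTER (?seniority >= 10)
    static List<String> employeesWithSeniorityAtLeast(int min) {
        List<String> out = new ArrayList<>();
        for (Map<String, String> b : evaluate(List.of(new Triple("?Employee", INFO + "seniority", "?seniority")))) {
            if (Integer.parseInt(b.get("?seniority")) >= min) out.add(b.get("?Employee"));
        }
        return out;
    }

    // Sample 3: three patterns sharing ?person
    static List<Map<String, String>> nameSeniorityTitle() {
        return evaluate(List.of(
            new Triple("?person", VCARD + "FN", "?Name"),
            new Triple("?person", INFO + "seniority", "?Seniority"),
            new Triple("?person", VCARD + "TITLE", "?Title")));
    }

    // Roughly what appending ORDER BY DESC(?Seniority) LIMIT n to Sample 3 would return
    static List<String> topNamesBySeniority(int n) {
        return nameSeniorityTitle().stream()
            .sorted(Comparator.comparingInt((Map<String, String> b) -> Integer.parseInt(b.get("?Seniority"))).reversed())
            .limit(n)
            .map(b -> b.get("?Name"))
            .toList();
    }

    public static void main(String[] args) {
        System.out.println(employeesWithSeniorityAtLeast(10));
        nameSeniorityTitle().forEach(b ->
            System.out.println(b.get("?Name") + " | " + b.get("?Seniority") + " | " + b.get("?Title")));
        System.out.println(topNamesBySeniority(2));
    }
}
```

Because solutions are extended one pattern at a time, the order of the patterns changes how many intermediate solutions are built, but not the final set of rows.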
To execute SPARQL queries, you can rely on a tool like Jena, a Semantic Web framework written in Java that includes a SPARQL query engine. Jena's query engine accepts both an RDF structure and a SPARQL statement, and can either print the query results to a console or hand them to a larger Java application.
On a related note, SPARQL also defines its own SPARQL Protocol for conveying SPARQL queries via Web services, and a corresponding SPARQL Query Results XML Format for representing the results of queries performed against RDF data.
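In the SPARQL Query Results XML Format, each solution is a `<result>` element holding one `<binding>` per variable, and each binding wraps a `<uri>`, `<literal>` or `<bnode>` element. The sketch below reads such a document with the JDK's DOM parser. The `SAMPLE` document is hand-written here to mirror the results of Sample 2 rather than captured from a real engine, and `parse` is a hypothetical helper that keeps only each binding's text and drops its term type.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class SparqlXmlResults {
    static final String NS = "http://www.w3.org/2005/sparql-results#";

    // Results of Sample 2, written by hand in the SPARQL Query Results XML Format.
    static final String SAMPLE = """
        <sparql xmlns="http://www.w3.org/2005/sparql-results#">
          <head><variable name="Employee"/></head>
          <results>
            <result>
              <binding name="Employee"><uri>http://acmecorp/SallyJohnson/</uri></binding>
            </result>
            <result>
              <binding name="Employee"><uri>http://acmecorp/JohnDoe/</uri></binding>
            </result>
          </results>
        </sparql>
        """;

    // Returns one map per <result>, from variable name to the binding's text value.
    static List<Map<String, String>> parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<Map<String, String>> rows = new ArrayList<>();
            NodeList results = doc.getElementsByTagNameNS(NS, "result");
            for (int i = 0; i < results.getLength(); i++) {
                Map<String, String> row = new LinkedHashMap<>();
                NodeList bindings = ((Element) results.item(i)).getElementsByTagNameNS(NS, "binding");
                for (int j = 0; j < bindings.getLength(); j++) {
                    Element binding = (Element) bindings.item(j);
                    // Text of the <uri>/<literal>/<bnode> child; the term type is discarded
                    row.put(binding.getAttribute("name"), binding.getTextContent().trim());
                }
                rows.add(row);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed results document", e);
        }
    }

    public static void main(String[] args) {
        parse(SAMPLE).forEach(System.out::println);
    }
}
```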
It is also worth mentioning Turtle (Terse RDF Triple Language), a compact, plain-text syntax for writing RDF that is widely used as an alternative to RDF/XML. SPARQL's triple patterns are written in a Turtle-like syntax, and Turtle is generally both more concise and easier for humans to read than the RDF/XML shown in listing 1.1.
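For comparison with listing 1.1, the sketch below prints John Doe's description in Turtle. It is a hand-rolled string builder written for this illustration, not a real serializer: `toTurtle` and `johnDoe` are hypothetical names, objects are passed in already formatted as Turtle terms with no escaping, and keying properties by predicate allows only one value per property.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class TurtleSketch {
    // Writes @prefix lines plus one subject block whose predicate/object pairs are
    // separated by ';' and terminated by '.'. Objects must already be valid Turtle
    // terms; nothing is escaped or validated.
    static String toTurtle(Map<String, String> prefixes, String subject, Map<String, String> properties) {
        StringBuilder out = new StringBuilder();
        prefixes.forEach((prefix, ns) ->
            out.append("@prefix ").append(prefix).append(": <").append(ns).append("> .\n"));
        out.append("\n<").append(subject).append(">\n");
        StringJoiner body = new StringJoiner(" ;\n", "", " .\n");
        properties.forEach((predicate, object) -> body.add("    " + predicate + " " + object));
        return out.append(body).toString();
    }

    // John Doe's description from Listing 1.1
    static String johnDoe() {
        Map<String, String> prefixes = new LinkedHashMap<>();
        prefixes.put("vCard", "http://www.w3.org/2001/vcard-rdf/3.0#");
        prefixes.put("info", "http://www.example.com/peopleInfo#");
        Map<String, String> props = new LinkedHashMap<>();
        props.put("vCard:FN", "\"John Doe\"");
        props.put("info:seniority", "12"); // a bare integer is read as an xsd:integer literal
        props.put("vCard:N", "[ vCard:Family \"Doe\" ; vCard:Given \"John\" ]"); // blank node
        props.put("vCard:TITLE", "\"Chief Executive Officer\"");
        props.put("vCard:ROLE", "\"Senior Executive\"");
        return toTurtle(prefixes, "http://acmecorp/JohnDoe/", props);
    }

    public static void main(String[] args) {
        System.out.print(johnDoe());
    }
}
```

In the printed output, each indented line is a predicate/object pair, `;` separates pairs that share the same subject, and the square brackets stand in for the blank node that rdf:parseType="Resource" produced in the RDF/XML version.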
Though RDF and SPARQL aren't mainstream compared to other markup and query languages, they both set an important precedent for organizations wanting to make their content Semantic Web friendly and, with it, facilitate the creation of custom data mining applications. Keyword data mining via search engines and their supplemental Web services will never go out of style, because they are simply too easy to use, but it's important to realize that major search engines can take months or years to extract relevant meaning from certain content, and often never do. By using RDF, a resource's meaning becomes explicit, aiding any data mining effort, and with SPARQL providing the foundation for searching such structures, any organization can build more intelligent data mining applications by leveraging both technologies.