This demo shows a set of queries against mostly social Semantic Web data from the Billion Triples Challenge data set.

The queries in the online demo run relatively fast, but long-running analytics queries have also been tested on the data set.

The blog post here shows some more possible queries.

The technical point is to show SPARQL extensions for text search, aggregation, and subqueries, as well as for handling transitive properties and traversing trees and graphs.
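To make these features concrete, here is a minimal sketch in Python that builds one query combining them and wraps it in a request URL. Several things are assumptions and not taken from the demo: the endpoint URL, the helper names `build_query` and `request_url`, and the choice of FOAF, SIOC, and Dublin Core properties. The text match uses Virtuoso's `bif:contains` predicate; the transitive step is written as a SPARQL 1.1 property path (`foaf:knows+`), which may differ from the transitivity syntax the demo queries actually use.

```python
from string import Template
from urllib.parse import urlencode

# Hypothetical endpoint; the address of the real demo server is not given here.
ENDPOINT = "http://localhost:8890/sparql"

# The inner SELECT is a subquery that aggregates matching posts per author.
# The outer query follows foaf:knows transitively from each author.
QUERY_TEMPLATE = Template("""\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
SELECT ?author ?posts (COUNT(DISTINCT ?contact) AS ?reach)
WHERE {
  {
    SELECT ?author (COUNT(?post) AS ?posts)
    WHERE {
      ?post sioc:has_creator ?author ;
            dc:title ?title .
      ?title bif:contains "'$term'" .
    }
    GROUP BY ?author
  }
  ?author foaf:knows+ ?contact .
}
GROUP BY ?author ?posts
ORDER BY DESC(?posts)
LIMIT $limit
""")


def build_query(term: str, limit: int = 10) -> str:
    """Return the SPARQL text for a single alphanumeric search term."""
    # Reject anything but plain words so the term cannot break out of the quotes.
    if not term.isalnum():
        raise ValueError("search term must be alphanumeric")
    return QUERY_TEMPLATE.substitute(term=term, limit=int(limit))


def request_url(term: str, limit: int = 10) -> str:
    """Return a GET URL carrying the query in the standard 'query' parameter."""
    return ENDPOINT + "?" + urlencode({"query": build_query(term, limit)})


if __name__ == "__main__":
    print(build_query("semantic"))
```

The sketch only constructs the query text and URL; sending it to a server is left out so the example runs without network access.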

The implication is that many tasks that previously, using relational databases, required custom application logic and task-specific database design can now be undertaken within a general-purpose database system, without an application-specific schema or procedural logic. The data is loaded as it comes, as RDF, and is ready for querying the moment it is loaded; no special extract, transform, load (ETL) logic is involved. If special materialization of intermediate query results is desired, this too can be done, with SQL as well as SPARQL. An example of this is statistics on tag or interest co-occurrence.
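As an illustration of what a materialized co-occurrence table would contain, the following Python sketch counts how often two tags appear on the same item. It is not how the demo computes the statistics (that happens inside the database, with SQL or SPARQL); the function name `tag_cooccurrence` and the sample data are made up for the example.

```python
from collections import Counter
from itertools import combinations


def tag_cooccurrence(tagged_items):
    """Count, for each unordered pair of tags, the items carrying both."""
    counts = Counter()
    for tags in tagged_items.values():
        # Deduplicate and sort so each pair is counted once per item,
        # always in the same (alphabetical) order.
        for pair in combinations(sorted(set(tags)), 2):
            counts[pair] += 1
    return counts


# Made-up sample data: item identifier -> tags attached to it.
sample = {
    "post1": ["rdf", "sparql", "web"],
    "post2": ["rdf", "sparql"],
    "post3": ["web", "rdf"],
}

stats = tag_cooccurrence(sample)
for (a, b), n in sorted(stats.items()):
    print(a, b, n)
```

Each resulting row (tag, tag, count) corresponds to one row of the intermediate result that could be stored and queried later instead of being recomputed.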

The scalability implication is that it is possible to deal flexibly with large volumes of RDF data on low-cost hardware. The system used in the demo, two commodity servers, has a list price of only about $8000. More servers can be added if a higher ratio of RAM to data is desired. All the demo queries benefit from in-query parallelism and partitioning. With more data, the number of partitions can be increased without loss of performance.

The web user interface is a lightweight wrapper written around the queries. It is an example of a traditional web front end running against a new type of database. The interface is written in Virtuoso's VSP dynamic web page language but could just as well be written in PHP or any other web language.
