Implementing a SPARQL Query Processor on Top of a Big SQL Engine

Abstract

RDF (Resource Description Framework) is the core data representation format of Linked Data and the Semantic Web. It provides a generic graph-based data model for describing things, including their relationships with other things. SPARQL is the query language recommended by the W3C as the standard for querying RDF data. RDF databases are growing fast, so RDF query processing engines must be able to deal with increasing amounts of data. The aim of this project is to build a scalable SPARQL query processor for massive RDF databases on top of modern Big SQL systems (e.g., Spark SQL, Cloudera Impala).
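To make the idea concrete, here is a minimal sketch of one common way to run SPARQL on a SQL engine: store all RDF data in a single three-column triples table and translate each basic graph pattern into a self-join over that table. Everything specific in it is an assumption for illustration, not part of the project description: the table name `triples`, the helper `bgp_to_sql`, the toy LUBM-style data, and the use of Python's built-in `sqlite3` as a stand-in for a Big SQL engine such as Spark SQL or Impala.

```python
import sqlite3


def is_var(term):
    """SPARQL variables are written with a leading '?'."""
    return term.startswith("?")


def bgp_to_sql(patterns, table="triples"):
    """Translate a list of (s, p, o) triple patterns into (sql, params).

    Hypothetical helper: each pattern becomes one alias of the triples
    table, constants become parameterized equality filters, and a variable
    that occurs more than once becomes a join condition.
    """
    first_seen = {}  # variable name -> "tN.col" of its first occurrence
    where, params = [], []
    for i, pattern in enumerate(patterns):
        for col, term in zip(("s", "p", "o"), pattern):
            ref = f"t{i}.{col}"
            if is_var(term):
                if term in first_seen:
                    # Repeated variable: join with its first occurrence.
                    where.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                # Constant term: filter with a bound parameter.
                where.append(f"{ref} = ?")
                params.append(term)
    select = ", ".join(f"{ref} AS {var[1:]}" for var, ref in first_seen.items())
    select = select or "1 AS matched"  # pattern without variables
    from_clause = ", ".join(f"{table} t{i}" for i in range(len(patterns)))
    sql = f"SELECT {select} FROM {from_clause}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


# Toy data set (assumed, loosely modeled on LUBM vocabulary).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:alice", "rdf:type", "ub:GraduateStudent"),
        ("ex:alice", "ub:takesCourse", "ex:course1"),
        ("ex:bob", "rdf:type", "ub:UndergraduateStudent"),
        ("ex:bob", "ub:takesCourse", "ex:course1"),
    ],
)

# SPARQL: SELECT ?x ?c WHERE { ?x rdf:type ub:GraduateStudent .
#                              ?x ub:takesCourse ?c }
patterns = [
    ("?x", "rdf:type", "ub:GraduateStudent"),
    ("?x", "ub:takesCourse", "?c"),
]
sql, params = bgp_to_sql(patterns)
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

The sketch covers only conjunctive triple patterns. A full processor would also need a SPARQL parser, support for operators such as OPTIONAL, FILTER, and UNION, and a storage layout and join ordering suited to the target engine.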

Related Resources

  • SPARQL Query Language - https://www.w3.org/TR/rdf-sparql-query/
  • LUBM Benchmark - http://swat.cse.lehigh.edu/projects/lubm/
  • Spark SQL - https://spark.apache.org/sql/
  • Impala - https://impala.apache.org/
  • Apache Hive - https://hive.apache.org/