Declarative Querying of Distributed Graphs

Abstract

A graph is a fundamental data structure that captures relationships between data entities. Graphs are widely used to model complex data in application domains such as social networks, protein networks, transportation networks, bibliographic networks, and knowledge bases. Graphs with millions or billions of nodes and edges are now common, and graph analytics has become an important big data discovery technique. With the increasing abundance of large graphs, designing scalable systems for processing and analyzing them (e.g., Giraph, GraphX, GraphLab) has become one of the most pressing problems facing the big data research community. Scalable processing of big graphs is challenging because of their size and the inherently irregular structure of graph computations.

Current distributed graph processing systems rely on low-level APIs, and there is a lack of declarative languages for expressing large-scale graph processing tasks. G-CORE was recently proposed by the LDBC Graph Query Language Task Force, whose members come from industry and academia, with the intent of bringing the best of both worlds to graph practitioners. The aim of this project is to implement an efficient execution engine for the G-CORE language on top of distributed graph processing platforms.
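To illustrate what "low-level APIs" means in practice, the following is a minimal Python sketch of the vertex-centric ("think like a vertex") model popularized by Pregel, which Giraph implements and which GraphX exposes through its Pregel operator. It is a single-machine simulation, not the API of either system; the function name `pregel_connected_components` and its message-passing loop are hypothetical. Even a simple task such as finding connected components forces the programmer to manage per-vertex state, message sending, and superstep termination by hand.

```python
from collections import defaultdict


def pregel_connected_components(edges):
    """Hypothetical single-machine simulation of a Pregel-style computation.

    Each vertex keeps the smallest vertex id it has seen so far and
    forwards any improvement to its neighbours; the computation halts
    when no messages are in flight.
    """
    # Build an undirected adjacency list from the edge list.
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    # Per-vertex state: initially every vertex labels itself.
    label = {v: v for v in neighbours}

    # Superstep 0: every vertex sends its label to all neighbours.
    inbox = defaultdict(list)
    for v in neighbours:
        for n in neighbours[v]:
            inbox[n].append(label[v])

    # Later supersteps: run while any vertex received messages.
    while inbox:
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            smallest = min(messages)
            # Only vertices whose label improves update and send messages.
            if smallest < label[v]:
                label[v] = smallest
                for n in neighbours[v]:
                    outbox[n].append(smallest)
        inbox = outbox
    return label


edges = [(1, 2), (2, 3), (4, 5)]
print(pregel_connected_components(edges))
# {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

Each component ends up labelled with its smallest vertex id. A declarative language such as G-CORE would instead let the user state what result is wanted and leave the execution strategy, including how work is partitioned and how many supersteps run, to the engine.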

Related Resources:

G-CORE: A Core for Future Graph Query Languages. https://arxiv.org/pdf/1712.01550.pdf
Apache Spark GraphX. https://spark.apache.org/graphx/
Apache Giraph. http://giraph.apache.org/
Large scale graph processing systems: survey and an experimental evaluation. Cluster Computing 18(3): 1189-1213 (2015). https://doi.org/10.1007/s10586-015-0472-6