Blog

Posts about technology and arts.

Cypher, Gremlin and SPARQL: Graph dialects

When I was younger and my older brother was living in Germany, I asked him if he had learned German. He said that he had, and explained that there are several dialects; he was quite proud because some people had told him that he was using the Bavarian dialect correctly.

Although Cypher, Gremlin and SPARQL are different query languages, I think we can consider them dialects of a common graph language. Cypher is the query language used in neo4j, a graph database. Gremlin is part of TinkerPop, an open source project that contains a graph server, graph algorithms and a graph language, among other sub-projects. And last but not least, SPARQL is used to query RDF data.

Let’s use the Matrix movies example provided by neo4j to take a look at the three languages.

Cypher

First we create the graph.

create (matrix1:Movie {id : '603', title : 'The Matrix', year : '1999-03-31'}),
 (matrix2:Movie {id : '604', title : 'The Matrix Reloaded', year : '2003-05-07'}),
 (matrix3:Movie {id : '605', title : 'The Matrix Revolutions', year : '2003-10-27'}),

 (neo:Actor {name:'Keanu Reeves'}),
 (morpheus:Actor {name:'Laurence Fishburne'}),
 (trinity:Actor {name:'Carrie-Anne Moss'}),

 (matrix1)<-[:ACTS_IN {role : 'Neo'}]-(neo),
 (matrix2)<-[:ACTS_IN {role : 'Neo'}]-(neo),
 (matrix3)<-[:ACTS_IN {role : 'Neo'}]-(neo),
 (matrix1)<-[:ACTS_IN {role : 'Morpheus'}]-(morpheus),
 (matrix2)<-[:ACTS_IN {role : 'Morpheus'}]-(morpheus),
 (matrix3)<-[:ACTS_IN {role : 'Morpheus'}]-(morpheus),
 (matrix1)<-[:ACTS_IN {role : 'Trinity'}]-(trinity),
 (matrix2)<-[:ACTS_IN {role : 'Trinity'}]-(trinity),
 (matrix3)<-[:ACTS_IN {role : 'Trinity'}]-(trinity)

Added 6 labels, created 6 nodes, set 21 properties, created 9 relationships, returned 0 rows in 2791 ms

And execute a simple query.

MATCH (a:Actor { name:"Keanu Reeves" })
RETURN a

(9:Actor {name:"Keanu Reeves"})
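To make the shared property-graph model concrete outside any database, here is a minimal plain-Java sketch (Java 16+ for records) of the same data and query. The class and method names (MatrixGraph, findActors, propertyCount) are hypothetical illustrations and not part of any neo4j API. The sketch also checks the arithmetic behind the summary line above: 3 movies × 3 properties + 3 actors × 1 property + 9 relationships × 1 role = 21 properties.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory property graph: nodes and relationships both carry
// a label/type and a map of properties, mirroring the Cypher CREATE above.
class MatrixGraph {
    record Node(String label, Map<String, String> props) {}
    record Rel(Node from, String type, Node to, Map<String, String> props) {}

    final List<Node> nodes = new ArrayList<>();
    final List<Rel> rels = new ArrayList<>();

    Node addNode(String label, Map<String, String> props) {
        Node n = new Node(label, props);
        nodes.add(n);
        return n;
    }

    // Builds the same 6 nodes and 9 ACTS_IN relationships as the Cypher example.
    static MatrixGraph build() {
        MatrixGraph g = new MatrixGraph();
        List<Node> movies = List.of(
            g.addNode("Movie", Map.of("id", "603", "title", "The Matrix", "year", "1999-03-31")),
            g.addNode("Movie", Map.of("id", "604", "title", "The Matrix Reloaded", "year", "2003-05-07")),
            g.addNode("Movie", Map.of("id", "605", "title", "The Matrix Revolutions", "year", "2003-10-27")));
        // Actor name -> role played in every movie.
        Map<String, String> cast = new LinkedHashMap<>();
        cast.put("Keanu Reeves", "Neo");
        cast.put("Laurence Fishburne", "Morpheus");
        cast.put("Carrie-Anne Moss", "Trinity");
        for (Map.Entry<String, String> e : cast.entrySet()) {
            Node actor = g.addNode("Actor", Map.of("name", e.getKey()));
            for (Node movie : movies) {
                g.rels.add(new Rel(actor, "ACTS_IN", movie, Map.of("role", e.getValue())));
            }
        }
        return g;
    }

    // Total number of properties set on nodes and relationships.
    int propertyCount() {
        int count = 0;
        for (Node n : nodes) count += n.props().size();
        for (Rel r : rels) count += r.props().size();
        return count;
    }

    // Plain-Java equivalent of MATCH (a:Actor {name: ...}) RETURN a.
    List<Node> findActors(String name) {
        List<Node> result = new ArrayList<>();
        for (Node n : nodes) {
            if (n.label().equals("Actor") && name.equals(n.props().get("name"))) {
                result.add(n);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        MatrixGraph g = build();
        // prints: 6 nodes, 9 relationships, 21 properties
        System.out.println(g.nodes.size() + " nodes, " + g.rels.size()
            + " relationships, " + g.propertyCount() + " properties");
        System.out.println(g.findActors("Keanu Reeves"));
    }
}
```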

Gremlin

Again, let’s start by creating our graph.

g = new TinkerGraph();
matrix1 = g.addVertex(["_id":603,"title":"The Matrix", "year": "1999-03-31"]);
matrix2 = g.addVertex(["_id":604,"title":"The Matrix Reloaded", "year": "2003-05-07"]);
matrix3 = g.addVertex(["_id":605,"title":"The Matrix Revolutions", "year": "2003-10-27"]);

neo = g.addVertex(["name": "Keanu Reeves"]);
morpheus = g.addVertex(["name": "Laurence Fishburne"]);
trinity = g.addVertex(["name": "Carrie-Anne Moss"]);

neo.addEdge("actsIn", matrix1); 
neo.addEdge("actsIn", matrix2); 
neo.addEdge("actsIn", matrix3); 
morpheus.addEdge("actsIn", matrix1); 
morpheus.addEdge("actsIn", matrix2); 
morpheus.addEdge("actsIn", matrix3); 
trinity.addEdge("actsIn", matrix1); 
trinity.addEdge("actsIn", matrix2); 
trinity.addEdge("actsIn", matrix3); 

And execute a simple query.

g.V.has('name', 'Keanu Reeves').map

gremlin> g.V.has('name', 'Keanu Reeves').map
==>{name=Keanu Reeves}
gremlin>

Quite similar to the Cypher version.

SPARQL

Let’s load our example data, written in Turtle (thanks to Kendall G. Clark). I used Fuseki to run these queries.

@prefix :          <http://example.org/matrix/> .

 :m1 a :Movie; :title "The Matrix"; :year "1999-03-31".
 :m2 a :Movie; :title "The Matrix Reloaded"; :year "2003-05-07".
 :m3 a :Movie; :title "The Matrix Revolutions"; :year "2003-10-27".
 
 :neo a :Actor; :name "Keanu Reeves".
 :morpheus a :Actor; :name "Laurence Fishburne".
 :trinity a :Actor; :name "Carrie-Anne Moss".
 
 :neo :hasRole [:as "Neo"; :in :m1].
 :neo :hasRole [:as "Neo"; :in :m2].
 :neo :hasRole [:as "Neo"; :in :m3].
 :morpheus :hasRole [:as "Morpheus"; :in :m1].
 :morpheus :hasRole [:as "Morpheus"; :in :m2].
 :morpheus :hasRole [:as "Morpheus"; :in :m3].
 :trinity :hasRole [:as "Trinity"; :in :m1].
 :trinity :hasRole [:as "Trinity"; :in :m2].
 :trinity :hasRole [:as "Trinity"; :in :m3].

And finally the SPARQL query.

SELECT ?a WHERE {
   ?a a <http://example.org/matrix/Actor> .
   ?a <http://example.org/matrix/name> ?name .
   FILTER(?name  = "Keanu Reeves")
}

This returns the Keanu Reeves actor instance.

-----------------------------------
| a                               |
===================================
| <http://example.org/matrix/neo> |
-----------------------------------
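Under the hood, the WHERE clause is a basic graph pattern. Each triple pattern binds variables, patterns that share a variable are joined, and FILTER discards solutions. The Java sketch below (Java 16+) evaluates that query by hand over a small, hypothetical subset of the triples; the a keyword in SPARQL and Turtle stands for rdf:type. It is a toy nested-loop join for illustration, not how Fuseki evaluates queries, and it represents IRIs and literals alike as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Toy evaluation of the SPARQL query above over an in-memory list of triples.
// Only a hypothetical subset of the example data is included.
class ActorQuery {
    record Triple(String s, String p, String o) {}

    static final String EX = "http://example.org/matrix/";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    static final List<Triple> DATA = List.of(
        new Triple(EX + "neo", RDF_TYPE, EX + "Actor"),
        new Triple(EX + "neo", EX + "name", "Keanu Reeves"),
        new Triple(EX + "morpheus", RDF_TYPE, EX + "Actor"),
        new Triple(EX + "morpheus", EX + "name", "Laurence Fishburne"),
        new Triple(EX + "trinity", RDF_TYPE, EX + "Actor"),
        new Triple(EX + "trinity", EX + "name", "Carrie-Anne Moss"),
        new Triple(EX + "m1", RDF_TYPE, EX + "Movie"),
        new Triple(EX + "m1", EX + "title", "The Matrix"));

    // SELECT ?a WHERE { ?a a :Actor . ?a :name ?name . FILTER(?name = wanted) }
    static List<String> actorsNamed(String wanted) {
        List<String> results = new ArrayList<>();
        for (Triple t1 : DATA) {
            // First pattern: ?a rdf:type :Actor binds ?a.
            if (!t1.p().equals(RDF_TYPE) || !t1.o().equals(EX + "Actor")) continue;
            String a = t1.s();
            for (Triple t2 : DATA) {
                // Second pattern: ?a :name ?name, joined on the same ?a.
                if (t2.s().equals(a) && t2.p().equals(EX + "name")) {
                    String name = t2.o();
                    // FILTER keeps only the solutions whose ?name matches.
                    if (name.equals(wanted)) results.add(a);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // prints: <http://example.org/matrix/neo>
        for (String a : actorsNamed("Keanu Reeves")) System.out.println("<" + a + ">");
    }
}
```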

SPARQL supports inference (or rather, OWL, RDFS and the reasoners do), but it is easier to define the depth of a graph search with neo4j. Gremlin, for its part, has native support for Groovy and Java. The three languages share a common denominator, but what makes each of them really powerful is its unique features.
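In Cypher, the depth of a search is written as a variable-length pattern such as -[:ACTS_IN*1..2]-. The Java sketch below illustrates the same idea as a depth-limited breadth-first search over the Matrix graph. It is not how neo4j evaluates such patterns, and the class and method names are hypothetical. One difference to keep in mind: Cypher returns one row per matching path, while this version reports each reachable node once, together with its shortest distance.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Breadth-first search with a maximum depth, treating ACTS_IN edges as undirected.
class BoundedSearch {
    // Adjacency list built from the nine actor-movie edges of the example.
    static Map<String, List<String>> graph() {
        Map<String, List<String>> adj = new HashMap<>();
        String[] actors = {"Keanu Reeves", "Laurence Fishburne", "Carrie-Anne Moss"};
        String[] movies = {"The Matrix", "The Matrix Reloaded", "The Matrix Revolutions"};
        for (String actor : actors) {
            for (String movie : movies) {
                adj.computeIfAbsent(actor, k -> new ArrayList<>()).add(movie);
                adj.computeIfAbsent(movie, k -> new ArrayList<>()).add(actor);
            }
        }
        return adj;
    }

    // Returns every node reachable from start in 1..maxDepth hops, mapped to its distance.
    static Map<String, Integer> within(Map<String, List<String>> adj, String start, int maxDepth) {
        Map<String, Integer> depth = new LinkedHashMap<>();
        depth.put(start, 0);
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            int d = depth.get(current);
            if (d == maxDepth) continue; // do not expand beyond the depth limit
            for (String next : adj.getOrDefault(current, List.of())) {
                if (!depth.containsKey(next)) {
                    depth.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        depth.remove(start); // exclude the start node from the result
        return depth;
    }

    public static void main(String[] args) {
        // Depth 1 reaches the three movies; depth 2 adds Keanu Reeves' co-stars.
        System.out.println(within(graph(), "Keanu Reeves", 2));
    }
}
```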

I hope you enjoyed it, and that this post gave you a quick overview of some of the existing graph languages. Make sure you weigh the pros and cons of each server and language, and make the best decision for your project. Take a look at other graph query languages too.

Happy hacking!


This post has been updated as suggested by @kendall (thank you!). You can check the diff on GitHub.

Strings transliteration in Java with Apache Commons Lang

Rosalind is a website with a curated, hierarchically organized set of bioinformatics exercises. Some of them require you to replace characters (nucleotides) with other characters. This is a rather common task for developers too, for example when you need to replace special characters in users’ names.

There are different names for it, such as translate, replace, or transliterate; the latter is my favorite.

In Python I know that there are several different ways of transliterating strings [1][2]. But in Java I always ended up using a Map or an Enum and writing my own method in some Util class for that.
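For comparison, that hand-written approach might look like the sketch below. TranslitUtil.transliterate is a hypothetical helper of my own for illustration, not an existing library method. It looks each character up in a Map and keeps any character that has no entry.

```java
import java.util.Map;

// Hypothetical hand-written helper of the kind that ends up in a Util class:
// characters found in the table are replaced, all others are kept as-is.
class TranslitUtil {
    static String transliterate(String input, Map<Character, Character> table) {
        StringBuilder sb = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            sb.append(table.getOrDefault(c, c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Character, Character> complement = Map.of('A', 'T', 'T', 'A', 'C', 'G', 'G', 'C');
        System.out.println(transliterate("ATGCATGC", complement)); // prints TACGTACG
        System.out.println(transliterate("John Doe", Map.of(' ', '_'))); // prints John_Doe
    }
}
```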

It turns out that Apache Commons Lang, which I use in most of my projects, provides this feature. That means I can reduce the length of my code, which also means less code to test (and one less place to look for bugs).

import org.apache.commons.lang3.StringUtils; // Commons Lang 3

String s = StringUtils.replaceChars("ATGCATGC", "GTCA", "CAGT"); // "TACGTACG"
System.out.println(s);

The code above replaces G with C, T with A, C with G and A with T. This complementing step is part of finding the DNA reverse complement (the complemented string is then reversed). You can also use it for replacing special characters, spaces with _, and so on.
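The reverse complement is just the complement read backwards, so a JDK-only sketch needs only one extra step after the character replacement. The ReverseComplement class below is a hypothetical example. With Commons Lang you could get the same result by passing the replaceChars output to StringUtils.reverse. The sketch rejects anything other than A, C, G and T, which is an assumption about the input, and it needs Java 14+ for the switch syntax.

```java
// Hypothetical JDK-only reverse complement: complement each nucleotide,
// then reverse the resulting string.
class ReverseComplement {
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (char c : dna.toCharArray()) {
            switch (c) {
                case 'A' -> sb.append('T');
                case 'T' -> sb.append('A');
                case 'C' -> sb.append('G');
                case 'G' -> sb.append('C');
                default -> throw new IllegalArgumentException("Not a nucleotide: " + c);
            }
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseComplement("ATGCATGC")); // prints GCATGCAT
    }
}
```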

Happy hacking!

Treemapping Jenkins Extension Points with R

I have been playing with R and its packages for some time, and decided to study it a bit harder. Last week I started reading Advanced R Programming by Hadley Wickham.

One of the first chapters talks about the basic data structures in R. In order to get my feet wet I thought about a simple example: treemapping Jenkins extension points.

Writing a custom SchemaSpy command for Laravel 4

This week I had to write my first custom command for Laravel 4. In Nestor-QA, Peter and I thought it would be useful to have the database schema documentation generated automatically with SchemaSpy on our Jenkins box.

Thanks to Artisan, this task was much simpler than I thought. The following command generates the skeleton of the schemaspy command.

php artisan command:make SchemaSpyCommand --command=schemaspy

This creates the file app/commands/SchemaSpyCommand.php. All I had to do then was fill in the options and write the exec call in the command’s fire() method, as the Laravel 4 docs explain.

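$this->info('Creating SchemaSpy');

// Values passed as --jar, --dbtype and --output
$jar = $this->option("jar");
$dbtype = $this->option("dbtype");
$output = $this->option("output");

// Quote each value so that paths with spaces or shell characters are safe
$commandLine = sprintf("java -jar %s -u none -t %s -o %s",
    escapeshellarg($jar), escapeshellarg($dbtype), escapeshellarg($output));

$this->info(sprintf("Command line: [%s]", $commandLine));

// Run SchemaSpy and report a non-zero exit status
exec($commandLine, $lines, $status);
if ($status !== 0) {
    $this->error(sprintf("SchemaSpy exited with status %d", $status));
}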

That’s what the body of my final command looks like. The last step is to register it with the application by adding the line below to app/start/artisan.php.

Artisan::add(new SchemaSpyCommand);

And that’s it: running php artisan schemaspy --jar=/opt/schemaspy/schemaSpy_5.0.0.jar --dbtype=app/database/sqlite.properties --output=database-schema creates the database schema docs in the database-schema directory.

Check this gist for the final code.

Happy coding!

Missing menus in new installation of TestLink 1.9.8

I recently installed TestLink 1.9.8 and noticed that the menus and some other parts of the UI were missing. Looking at /var/log/testlink/userlog1.log (the location may change depending on your settings), I realized that something was wrong with my PHP installation. There were log messages like the ones below.

include_once(ADORecordSet_ext_empty.class.php): failed to open stream: No such file or directory - in /home/kinow/php/workspace/testlink-1.9.8/lib/functions/common.php - Line 92
[13/Sep/18 12:51:09][WARNING][2o0h173pdgg5fjqh1pukr83og2][GUI]
E_WARNING
include_once(): Failed opening 'ADORecordSet_ext_empty.class.php' for inclusion (include_path='.:/usr/share/php:/usr/share/pear:.:/home/kinow/php/workspace/testlink-1.9.8/lib/functions/:/home/kinow/php/workspace/testlink-1.9.8/lib/issuet
...

I found a post in a forum (unfortunately I forgot to save the link) suggesting that these ADORecordSet errors were caused by the php5-adodb module. Removing the module and cleaning the templates cache directory ($TESTLINK_HOME/gui/templates_c/*) fixed the issue for me.

Hope that helps. Happy testing!
