Home

Groovy Hooks in Jenkins for increasing logging level

kinow @ Apr 12, 2015 11:30:03 ()

Yesterday, while debugging a problem we had in the BioUno update center, I realized that after increasing the logging level in the WEB interface, the messages that I needed weren’t being displayed in the logs.

It happened because some of the logging happened during Jenkins initialization, and before I could adjust the log level.

The solution was to use a Groovy Hook Script. If you are familiar with Linux init scripts, the idea is quite similar.

A Groovy script in the $JENKINS_ROOT_DIR/init.groovy.d/ directory is executed during Jenkins initialization. This way you can increase the global logger level with a script as the following below.

import java.util.logging.ConsoleHandler
import java.util.logging.LogManager
import java.util.logging.Logger
import java.util.logging.Level

def logger = Logger.getLogger("")
logger.setLevel(Level.FINEST)
logger.addHandler (new ConsoleHandler())

Happy logging!

Contributing to Apache Jena

kinow @ Jan 01, 2015 18:49:03 ()

As I mentioned in my previous post, I am using Apache Jena for a project of a customer. I had never used any triple store, nor a SPARQL Endpoint server before. But for being involved with the Apache Software Foundation, and since the company itself is using several Apache components, it was only natural Jena to be our first choice.

It has served us very well so far. At the moment we have less than 100 queries per day, but the project is still under development and we expect 1000 queries per day by the first quarter of 2015 and 1000000 near the end of 2015. We also have few entries in TDB, but expect to grow this number to a few million before 2016.

When I work for companies and we use Open Source Software (OSS) in a project, I always prepare assessment reports to include in the deliveries. In this report I justify the choice of Open Source Software (as well as commercial software). Sometimes I am lucky to work for a company that asks me to include hours to work on OSS :-)

I use Trello to triage issues in OSS projects (and for several other things). I have a board with several cards for Open Source. About a month ago I set up one for Jena and listed the issues that I thought I could contribute to.

Jena Trello card
Jena Trello card

I annotate easy issues with a “lhf” suffix for Low Hanging Fruit issues, and delete issues from the card once I submit a patch or update it (and include it in another card for the TupiLabs reports).

Most of the issues I included in the card for Jena had been created over two years ago, and hadn’t been updated in a while. When you test these issues against the current code, usually you find that some of them have already been fixed. Other issues included documentation problems, and minor features. I didn’t find any blocker issue that would impede us to use Jena in production.

Jena JIRA activity summary
Jena JIRA activity summary

The picture above shows the past 30 days activity summary in JIRA for Jena. The red line shows issues created, and the green line issues resolved. Andy Seaborne was very active in the past days and fixed several issues that were too old and had already been fixed in the trunk, and kindly merged patches and pull requests.

Some issues like JENA-632 will take a longer time to fix, but I’m getting used to Jena’s source code, and at the same getting more confident to use it in production - especially with a supportive OSS community. We are using Jena for RDF with Hadoop, and I learned that I can replace some custom Writables by others in the Jena Hadoop submodule.

By the way, even though this project ends in April, I intend to continue contributing to Jena. There is a lot of parts of the code that I would love to be able to understand and contribute, in special the Graph database, optimization techniques for SPARQL queries, the grammars used in the project, Fuseki v2 and enhance its testing harness (as well as the test coverage).

If you are looking for a interesting project to get you started with semantics, linked data, RDF, and even graphs and database querying, try contributing to Jena. I bet you’ll have a lot of fun!

Happy hacking and happy 2015!

Basic workflow of a SPARQL query in Fuseki

kinow @ Oct 11, 2014 20:36:33 ()

Before using any library or tool in a customer project, specially when it is an Open Source one, there are many things that I like to look at before deploying it. Basically, I look at the features, documentation, community, open issues (in special blockers or criticals), the time to release fixes and new features and, obviously, the license.

At the moment I’m using Apache Jena to work with ontologies, SPARQL and data matching and enrichment for a customer.

Jena is fantastic, and similar tools include Virtuoso, StarDog, GraphDB, 4Store and others. From looking at the code and its community and documentation, Jena seems like a great choice.

I’m still investigating if/how we gonna need to use inference and reasoners, looking at the issues, and learning my way through its code base. The following is my initial mapping of what happens when you submit a SPARQL query to Fuseki.

Fuseki SPARQL query work flow
Jena JDBC

My understanding is that Fuseki is just a web layer, handling a bunch of validations, logging, error handling, and relying on the ARQ module, that is who actually handles the requests. I also think a new Fuseki server is baking in the project git repo, so stay tuned for an updated version of this graph soon.

Happy hacking!

Cypher, Gremlin and SPARQL: Graph dialects

kinow @ Sep 09, 2014 10:14:33 ()

When I was younger and my older brother was living in Germany, I asked him if he had learned German. He said that he did, and explained that there are several dialects, and he was quite proud for some people told him that he was using the Bavarian dialect correctly.

Even though Cypher, Gremlin and SPARQL are all query languages, I think we can consider them dialects of a common graph language. Cypher is the query language used in neo4j, a graph database. Gremlin is part of the Tinkerpop, an open source project that contains graph server, graph algorithms, graph language, among other sub-projects. And last but not least, SPARQL is used to query RDF documents.

Let’s use the example of the Matrix movie provided by neo4j to take a look at the three languages.

Cypher

First we create the graph.

create (matrix1:Movie {id : '603', title : 'The Matrix', year : '1999-03-31'}),
 (matrix2:Movie {id : '604', title : 'The Matrix Reloaded', year : '2003-05-07'}),
 (matrix3:Movie {id : '605', title : 'The Matrix Revolutions', year : '2003-10-27'}),

 (neo:Actor {name:'Keanu Reeves'}),
 (morpheus:Actor {name:'Laurence Fishburne'}),
 (trinity:Actor {name:'Carrie-Anne Moss'}),

 (matrix1)<-[:ACTS_IN {role : 'Neo'}]-(neo),
 (matrix2)<-[:ACTS_IN {role : 'Neo'}]-(neo),
 (matrix3)<-[:ACTS_IN {role : 'Neo'}]-(neo),
 (matrix1)<-[:ACTS_IN {role : 'Morpheus'}]-(morpheus),
 (matrix2)<-[:ACTS_IN {role : 'Morpheus'}]-(morpheus),
 (matrix3)<-[:ACTS_IN {role : 'Morpheus'}]-(morpheus),
 (matrix1)<-[:ACTS_IN {role : 'Trinity'}]-(trinity),
 (matrix2)<-[:ACTS_IN {role : 'Trinity'}]-(trinity),
 (matrix3)<-[:ACTS_IN {role : 'Trinity'}]-(trinity)

Added 6 labels, created 6 nodes, set 21 properties, created 9 relationships, returned 0 rows in 2791 ms

And execute a simple query.

MATCH (a:Actor { name:"Keanu Reeves" })
RETURN a

(9:Actor {name:"Keanu Reeves"})

Gremlin

Again, let’s start by creating our graph.

g = new TinkerGraph();
matrix1 = g.addVertex(["_id":603,"title":"The Matrix", "year": "1999-03-31"]);
matrix2 = g.addVertex(["_id":604,"title":"The Matrix Reloaded", "year": "2003-05-07"]);
matrix3 = g.addVertex(["_id":605,"title":"The Matrix Revolutions", "year": "2003-10-27"]);

neo = g.addVertex(["name": "Keanu Reeves"]);
morpheus = g.addVertex(["name": "Laurence Fishburne"]);
trinity = g.addVertex(["name": "Carrie-Anne Moss"]);

neo.addEdge("actsIn", matrix1); 
neo.addEdge("actsIn", matrix2); 
neo.addEdge("actsIn", matrix3); 
morpheus.addEdge("actsIn", matrix1); 
morpheus.addEdge("actsIn", matrix2); 
morpheus.addEdge("actsIn", matrix3); 
trinity.addEdge("actsIn", matrix1); 
trinity.addEdge("actsIn", matrix2); 
trinity.addEdge("actsIn", matrix3);

And execute a simple query.

g.V.has('name', 'Keanu Reeves').map

gremlin> g.V.has('name', 'Keanu Reeves').map ==>{name=Keanu Reeves} gremlin>

Quite similar to neo4j.

SPARQL

Let’s load our example (thanks to Kendall G. Clark). I used Fuseki to run these queries.

@prefix :          <http://example.org/matrix/> .

 :m1 a :Movie; :title "The Matrix"; :year "1999-03-31".
 :m2 a :Movie; :title "The Matrix Reloaded"; :year "2003-05-07".
 :m3 a :Movie; :title "The Matrix Revolutions"; :year "2003-10-27".

 :neo a :Actor; :name "Keanu Reeves".
 :morpheus a :Actor; :name "Laurence Fishburne".
 :trinity a :Actor; :name "Carrie-Anne Moss".

 :neo :hasRole [:as "Neo"; :in :m1].
 :neo :hasRole [:as "Neo"; :in :m2].
 :neo :hasRole [:as "Neo"; :in :m2].
 :morpheus :hasRole [:as "Morpheus"; :in :m1].
 :morpheus :hasRole [:as "Morpheus"; :in :m2].
 :morpheus :hasRole [:as "Morpheus"; :in :m2].
 :trinity :hasRole [:as "Trinity"; :in :m1].
 :trinity :hasRole [:as "Trinity"; :in :m2].
 :trinity :hasRole [:as "Trinity"; :in :m2].

And finally the SPARQL query.

SELECT ?a WHERE {
   ?a a <http://example.org/matrix/Actor> .
   ?a <http://example.org/matrix/name> ?name .
   FILTER(?name  = "Keanu Reeves")
}

Returning the Keanu Reeves actor instance.

-----------------------------------
| a                               |
===================================
| <http://example.org/matrix/neo> |
-----------------------------------

SPARQL supports inference (or I must say that OWL, RDFS and the reasoners do), but it is easier to define the depth of a search in the graph using neo4j. As for Gremlin, it has native support to Groovy and Java. There is a common denominator for these three languages, but what makes them really powerful are their unique features.

I hope you enjoyed, and that this post gave you a quick overview of some of the existing graph languages. Make sure you ponder the pros and cons of each server/language, and make the best decision for your project. Take a look at other graph query languages too.

Happy hacking!


This post has been updated as suggested by @kendall (Thank you!). You can check the diff at GitHub

Strings transliteration in Java with Apache Commons Lang

kinow @ Aug 09, 2014 12:49:33 ()

Rosalind is a website with a curated set of exercices for bioinformatics, organized hierarchily. In some of these examples you are required to replace characters (nucleotides) by other characters. It is a rather common task for developers, like when you need to replace special characters in user’s names.

There are different ways of describing it, such as translate, replace, or transliterate. The latter being my favorite definition.

In Python I know that there are several different ways of transliterating strings [1][2]. But in Java I always ended up using a Map or a Enum and writing my own method in some Util class for that.

Turns out that Apache Commons Lang, which I use in most of my projects, provided this feature. What means that I will be able to reduce the length of my code, what also means less code to be tested (and one less place to look for bugs).

String s = StringUtils.replaceChars("ATGCATGC", "GTCA", "CAGT"); // "TACGTACG"
System.out.println(s);

What the code above does, is replace G by C, T by A, C by G and A by T. This process is part of finding the DNA reverse complement. But you can also use this for replacing special characters, spaces by _, and so it goes.

Happy hacking!