Posts about technology and arts.
This is the first post in a series of three (or maybe four later) based on diagrams
I collected while debugging the Cylc scheduler. The scheduler is called by the
NB: this is a post to remember things, not really expecting to give someone enough information to be able to hack the Cylc Scheduler (though you can and would have fun!).
Instead of going on at length about what happens (and there is quite a bit happening when you run
cylc start my.suite), I will use the following diagram, followed by a few paragraphs
to highlight certain parts. The code used was based on Cylc 7.7.1.
Since I started reading Cylc’s source code in Eclipse to create some sequence diagrams, I have not been able to debug it properly without hitting errors in some part of the program execution.
The error message was “ImportError: cannot import name _remove_dead_weakref”, which was a bit enigmatic as I had never heard of that function, but it seemed to be something internal, or at least not from the project code base. Searching the Internet did not help much.
The last post was about what happens when you upload a Turtle file to Apache Jena Fuseki. Today’s post is about what happens when you create a new dataset in Apache Jena Fuseki.
In theory, that happens before you upload a Turtle file, but this post series won’t follow a logical order. It will be more based on what I find interesting.
Oh, the dataset created is an in-memory dataset. Here’s a simplified sequence diagram. Again, these articles are more brain-dumps, used by myself for later reference.
The issue is now pending feedback, which gives me a moment to have fun with something else. So I decided to dig down the rabbit hole and start learning more about certain parts of the Apache Jena code base.
This post will be useful to myself in the future, as one set of notes in a series, so that I remember how things work - you never know, right? But hopefully it will also be useful to others wanting to understand more about the code of Apache Jena.
Knowing a bit of the code base, I went straight to the
from the Fuseki Core module. Set up a couple of breakpoints, uploaded my file,
but nothing. Then tried on its package-neighbour class,
Actually, it is easier to understand seeing the class hierarchy, and knowing that when I run the application in Eclipse, it is running with Jetty, serving servlets (there is no framework like Wicket, Struts, etc, involved).
Several filters are applied to the HTTP request too, like Cross Origin, Shiro, and
FusekiFilter. The latter looks at the request to see if it includes a dataset.
If a dataset is found - it is in our case - then it hands the request over to
the right class to handle it.
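The hand-over described above can be pictured with a minimal sketch in plain Java. It assumes the dataset name is simply the first path segment and that handlers live in a map; `DatasetRouter`, `datasetName` and `route` are hypothetical names for illustration, not Fuseki's actual classes or logic.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a filter that routes requests by dataset name.
// None of these names come from Fuseki itself.
final class DatasetRouter {
    private final Map<String, String> handlersByDataset;

    DatasetRouter(Map<String, String> handlersByDataset) {
        this.handlersByDataset = handlersByDataset;
    }

    // Assumption: the first path segment is the dataset name ("/ds/data" -> "ds").
    static String datasetName(String requestPath) {
        String trimmed = requestPath.startsWith("/") ? requestPath.substring(1) : requestPath;
        int slash = trimmed.indexOf('/');
        return slash < 0 ? trimmed : trimmed.substring(0, slash);
    }

    // Return the handler registered for the dataset, or empty so the
    // request can continue down the rest of the filter chain.
    Optional<String> route(String requestPath) {
        return Optional.ofNullable(handlersByDataset.get(datasetName(requestPath)));
    }
}
```

With a router built from `Map.of("ds", "REST_Quads_RW")`, a request for `/ds/data` resolves to the registered handler, while a request for an unknown dataset yields an empty result.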
REST_Quads_RW will take care of the upload action, using the
Upload class where my
Upload#incomingData() starts by checking the Content Type from the request. In my case
it is a
multipart/form-data. Then it calls its other method
#fileUploadWorker() creates a
ServletFileUpload, from Apache Commons FileUpload.
With that, it opens a stream for the file, retrieves its name and other information,
such as the content type.
Ah, the content type is interesting too. It defaults to
RDFXML, but what’s interesting
is the comment.
    if ( lang == null )
        // Desperate.
        lang = RDFLanguages.RDFXML ;
Well, in this case we are getting a
Lang:Turtle. So it now knows that it has a Turtle
file, but it still needs to parse it.
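To make the content-type step concrete, here is a small sketch in plain Java of the same idea: check for a multipart request, map the part's content type to a language, and fall back to RDF/XML when nothing matches. `LangGuess`, its `Lang` enum and the lookup table are assumptions made for illustration; Jena has its own `Lang` and `RDFLanguages` classes, which are not used here.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; it only mimics the shape of the logic described above.
final class LangGuess {
    enum Lang { TURTLE, NTRIPLES, RDFXML }

    // Registered media types for a few RDF syntaxes.
    private static final Map<String, Lang> BY_CONTENT_TYPE = Map.of(
        "text/turtle", Lang.TURTLE,
        "application/n-triples", Lang.NTRIPLES,
        "application/rdf+xml", Lang.RDFXML);

    // True when the request's Content-Type header announces a form upload.
    static boolean isMultipart(String contentTypeHeader) {
        return contentTypeHeader != null
            && contentTypeHeader.toLowerCase(Locale.ROOT).startsWith("multipart/form-data");
    }

    static Lang langFor(String partContentType) {
        Lang lang = null;
        if (partContentType != null) {
            // Drop parameters such as "; charset=utf-8" before the lookup.
            String mediaType = partContentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            lang = BY_CONTENT_TYPE.get(mediaType);
        }
        if (lang == null)
            lang = Lang.RDFXML; // same last-resort default as in the snippet above
        return lang;
    }
}
```

In this sketch an uploaded part declared as `text/turtle; charset=utf-8` maps to `TURTLE`, and a missing or unrecognised content type ends up as `RDFXML`.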
ActionLib#parse(), which uses
RDFParserBuilder to build a parser.
It applies a nice fluent API design when doing that.
    RDFParser.create()
        .errorHandler(errorHandler)
        .source(input)
        .lang(lang)
        .base(base)
        .parse(dest);
Side note to self: the `RDFParser` has a `canUse` flag. It seems to indicate that the parser can be used just once, though it actually looks like it works until the stream is closed...
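As a reminder of how a fluent builder and a single-use guard can fit together, here is a toy version in plain Java. `ToyParser`, its `Builder` and the strict "throw on second use" behaviour are assumptions for illustration only; the real `RDFParser` may well behave differently, as the side note suggests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy parser: hypothetical, not Jena's RDFParser.
final class ToyParser {
    private final List<String> lines;
    private boolean canUse = true; // flips to false after the first parse

    private ToyParser(List<String> lines) { this.lines = lines; }

    static Builder create() { return new Builder(); }

    // Send every line to the destination, but only once per parser instance.
    void parse(Consumer<String> dest) {
        if (!canUse)
            throw new IllegalStateException("parser already used");
        canUse = false;
        lines.forEach(dest);
    }

    static final class Builder {
        private final List<String> lines = new ArrayList<>();

        // Each setter returns the builder itself, which is what allows chaining.
        Builder source(String line) { lines.add(line); return this; }

        ToyParser build() { return new ToyParser(new ArrayList<>(lines)); }

        // Convenience: build a fresh parser and run it straight away.
        void parse(Consumer<String> dest) { build().parse(dest); }
    }
}
```

Because `Builder#parse` builds a new parser on every call, the builder itself stays reusable even though each parser it produces is not.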
RDFParserBuilder will call
RDFParser, which in turn will use the
ARQ is a low-level module in Jena, responsible for parsing queries, and also for some of the interaction with graphs and datasets.
LangTurtleBase. Their task starts by populating
prefixMap, which contains all those prefixes used in queries like
Then it will keep parsing triples until it finds an
EOF. For every
triple, after the Predicate-Object-List is found, it calls
#emit() method creates a
Triple object (Jena Core, graph package).
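The loop described above (fill a prefix map, then emit one triple per statement until the input ends) can be sketched with a deliberately tiny parser in plain Java. It assumes one statement per line, exactly three whitespace-separated tokens, and only IRIs or prefixed names; `ToyTurtle` and its methods are hypothetical and cover nowhere near the real Turtle grammar that Jena handles.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy line-based parser: hypothetical, not Jena's LangTurtle.
final class ToyTurtle {
    final Map<String, String> prefixMap = new HashMap<>();
    final List<List<String>> triples = new ArrayList<>();

    void parse(List<String> lines) {
        for (String raw : lines) { // the end of the list plays the role of EOF
            String line = raw.trim();
            if (line.isEmpty()) continue;
            if (line.endsWith(".")) line = line.substring(0, line.length() - 1).trim();
            String[] tok = line.split("\\s+");
            if (tok.length != 3)
                throw new IllegalArgumentException("expected three tokens: " + raw);
            if (tok[0].equals("@prefix")) {
                // "@prefix ex: <http://example.org/>" maps "ex" to the namespace IRI.
                prefixMap.put(tok[1].substring(0, tok[1].length() - 1), stripAngles(tok[2]));
            } else {
                emit(expand(tok[0]), expand(tok[1]), expand(tok[2]));
            }
        }
    }

    // Turn "<iri>" or "prefix:local" into a full IRI string.
    private String expand(String term) {
        if (term.startsWith("<")) return stripAngles(term);
        int colon = term.indexOf(':');
        if (colon < 0) throw new IllegalArgumentException("not a prefixed name: " + term);
        String ns = prefixMap.get(term.substring(0, colon));
        if (ns == null) throw new IllegalArgumentException("unknown prefix in " + term);
        return ns + term.substring(colon + 1);
    }

    private static String stripAngles(String iri) {
        return iri.substring(1, iri.length() - 1);
    }

    // Record the triple as an immutable (subject, predicate, object) list.
    private void emit(String s, String p, String o) {
        triples.add(List.of(s, p, o));
    }
}
```

Feeding it a `@prefix ex: <http://example.org/> .` line followed by `ex:s ex:p ex:o .` leaves one entry in `prefixMap` and one fully expanded triple in `triples`.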
And also a
StreamRDFCountingBase to keep track of statistics to display
back to the user.
StreamRDFWrapper, and wraps - as per name -
StreamRDF’s, such as
ParserOutputDataset holds a reference to the
DatasetGraph and also to the
prefixMap populated earlier in
LangTurtle. For each
Triple that we have
it will call the
DatasetGraph#add method, creating a new Quad with the
default graph name.
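The wrapping described above is a plain decorator chain, which a short sketch in Java can show: one sink counts what passes through, and the last one stores each triple as a quad in the default graph. `ToyStreams`, `TripleSink`, `CountingSink`, `DatasetSink` and the placeholder graph name are all hypothetical; they only echo the roles of `StreamRDFCountingBase`, `StreamRDFWrapper` and `ParserOutputDataset`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sinks illustrating the wrapper idea; not Jena's StreamRDF API.
final class ToyStreams {
    interface TripleSink {
        void triple(String s, String p, String o);
    }

    // Wrapper that counts triples and forwards them unchanged.
    static final class CountingSink implements TripleSink {
        private final TripleSink next;
        private long count;

        CountingSink(TripleSink next) { this.next = next; }

        @Override
        public void triple(String s, String p, String o) {
            count++;
            next.triple(s, p, o);
        }

        long count() { return count; }
    }

    // Terminal sink: store each triple as a quad in the default graph.
    static final class DatasetSink implements TripleSink {
        static final String DEFAULT_GRAPH = "urn:example:default-graph"; // placeholder name
        final List<List<String>> quads = new ArrayList<>();

        @Override
        public void triple(String s, String p, String o) {
            quads.add(List.of(DEFAULT_GRAPH, s, p, o));
        }
    }
}
```

The count kept by the wrapper is the kind of statistic that can later be reported back to the user, while the dataset sink never needs to know it is being counted.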
Finally, readers and streams are closed. An
UploadDetails object is created
holding stats collected in
StreamRDFCountingBase, which are also used for
Upload#incomingData() will return the
UploadDetails. If there are no errors
then the transaction will be committed. It involves again classes from ARQ and
TDB (for journaling), but that will be for another post.
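Until that post exists, the commit-if-no-errors rule can at least be sketched in plain Java. This is only the general pattern, assuming a store that keeps pending changes separate from committed ones; `ToyTxn` and its methods are hypothetical and say nothing about how ARQ or TDB implement transactions or journaling.

```java
import java.util.ArrayList;
import java.util.List;

// Toy transactional store: hypothetical, unrelated to Jena's transaction classes.
final class ToyTxn {
    private final List<String> committed = new ArrayList<>();
    private List<String> pending;

    // Work on a copy so that nothing is visible until commit.
    void begin() { pending = new ArrayList<>(committed); }

    void add(String item) { pending.add(item); }

    void commit() {
        committed.clear();
        committed.addAll(pending);
        pending = null;
    }

    // Throw the pending changes away.
    void abort() { pending = null; }

    List<String> snapshot() { return List.copyOf(committed); }

    // Run the upload inside a transaction: commit on success, abort on any error.
    static boolean upload(ToyTxn store, List<String> items) {
        store.begin();
        try {
            for (String item : items) {
                if (item.isEmpty()) throw new IllegalArgumentException("empty item");
                store.add(item);
            }
            store.commit();
            return true;
        } catch (RuntimeException e) {
            store.abort();
            return false;
        }
    }
}
```

A failed upload leaves the committed data exactly as it was before `begin()`, which is the property that matters for the upload path described here.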
The final method called in the
Upload class will be
returns the object as JSON. This JSON string is then finally returned to the
So that’s it. Probably the next step will be to learn how
or maybe more about transactions in Jena.
Happy hacking!