What the weather forecast looks like in Sao Paulo, Brazil

kinow @ Oct 25, 2018 19:34:11

I grew up watching the weather forecast in Sao Paulo, Brazil, where it is called “previsão do tempo”, the literal Portuguese translation.

This post contains some recent screenshots of the weather forecast as it is presented on one of the main TV channels, as well as some screenshots from an online video of other local institutions that either provide the forecast, or help run the NWP models.

A couple of class diagrams of JupyterHub

kinow @ Oct 06, 2018 21:43:44

Started on a new project last Monday. One of the tasks in this project involves a new design for the Web layer. And as the application is quite similar to JupyterHub, we are all learning more about its internal API and general system design.

This post contains only two class diagrams created with PyCharm. One is actually a SQLAlchemy ORM diagram, below.

And the class diagram (from which I removed object, and which I tried to make simpler to interpret).

I enjoyed the parts of the code and the part of the system design that I could read about so far. But that’s all for now, until I have more time to learn more about the project and the code.

p.s. there are spawners and other implementation classes in other GitHub repositories… so a more complete diagram may come later on

Cylc Scheduler Internals - Part 3

kinow @ Aug 18, 2018 18:27:37

This is part 3 in a series of posts about Cylc internals. Part 1 covered the beginning of the workflow, and part 2 documented what happens from the moment the method configure() is called. This post continues right after the configure() method returns, going on with the next method: run().

NB: this is a post to remember things, not really expecting to give someone enough information to be able to hack the Cylc Scheduler (though you can and would have fun!).

At this point, the Suite Server program must have been initialized, with the objects that it requires, and with everything configured. So this method is the one that starts the whole work on the tasks & proxies.

The runahead points, i.e. what are the next available cycle points, are calculated and scheduled. In the scheduling, tasks are queued for execution.

Most of the interesting action takes place when process_task_pool() is called, and in submit_task_jobs(). The latter is a method of TaskJobManager, and it is there that, in this case, my shell command is executed through a temporary shell script file.

The graph was created from the initial execution of a suite that was starting from scratch (they can also be reinitialized). If there are multiple tasks waiting, or if a suite was restarted, the diagram would look considerably different.

You can download the source file for the diagram used in this post, and edit it with draw.io.

Use of Logging in Java Image Processing libraries

kinow @ Aug 12, 2018 17:55:44

For IMAGING-154 I was trying to think of a solution for the existing Debug class. This class was the subject of discussion during a previous 1.0 release vote thread.

Initially I tried simply changing the class a bit, making it configurable, so that we could keep it, as there is a valid use case for having a class that collects information while the image processing algorithms are applied.

And for me, the next step would naturally be to remove the use of System.out, as the Debug class would now have a PrintStream that could be System.out or something else.
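The idea in the paragraph above could look roughly like the sketch below. It is only an illustration: DebugSketch, setOutput and debug are hypothetical names, not the actual Debug class from Apache Commons Imaging.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical stand-in for a configurable debug helper; this is NOT the
// actual Apache Commons Imaging Debug class.
final class DebugSketch {
    // Where debug messages go; defaults to System.out.
    private static PrintStream out = System.out;

    private DebugSketch() { }

    // Redirects all later debug messages to the given stream.
    static void setOutput(PrintStream stream) {
        out = stream;
    }

    // Writes one debug message to the configured stream.
    static void debug(String message) {
        out.println(message);
    }

    public static void main(String[] args) {
        debug("goes to System.out");

        // Collect messages in memory instead of printing them directly.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        setOutput(new PrintStream(buffer, true));
        debug("collected in memory");
        System.out.print(buffer.toString());
    }
}
```

With something like this, the image processing code never needs to mention System.out itself; only the default value does.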

That’s when I realized that the current version of Apache Commons Imaging (née Sanselan) uses System.out when debugging, but also when it wants to enable a “verbose” mode in some classes.

I am using this post to collect a few cases, and check what other image processing libraries are doing with regard to logging.

| Library | Logger |
| --- | --- |
| Java ImageIO | No logging (exceptions only) |
| im4java | No logging (exceptions only) |
| opencv JNI | No logging (exceptions only) |
| GeoSolutions ImageIO-Ext | java.util.logging |
| Catalano | java.util.logging |
| Apache PDFBox JBIG2 | Custom Logger |
| ImageJ2 | SciJava Logger |
| Fiji | SLF4J + logback |
| OpenJPEG | Custom Logger |
| imgscalr | System.out |
| Apache Commons Imaging | System.out |
| Sanselan | System.out |
| Marvin | System.out |
| Processing | System.out |

UUID's in Apache Jena

kinow @ Aug 11, 2018 19:02:16

In this post I won’t talk about what UUID’s are, or how they work in Java. Here’s a great article on that. Or access the always reliable Wikipedia article about it. (Or, if you would rather, read RFC 4122.)

I found out that Jena had UUID implementations after writing a previous post. I then decided to look into which UUID’s Jena has, and where these UUID’s were used. This way I would either understand why Jena needed UUID’s, or just be more educated in case I ever stumbled upon a change in Jena that required related work.

Jena Core’s org.apache.jena.shared.uuid

This package is small and simply contains: factories,

  • UUIDFactory: interface for a factory of UUID’s
  • UUID_V1_Gen: a factory for UUID_V1
  • UUID_V4_Gen: a factory for UUID_V4

and UUID implementations,

  • JenaUUID: abstract base class for UUID implementations
  • UUID_nil: a special UUID, nil, filled with zeroes
  • UUID_V1: UUID V1
  • UUID_V4: UUID V4

and a utility class

  • LibUUID: with methods to create a Random and to create byte[] seeds.

JenaUUID contains a method to return a JenaUUID as a Java UUID. It is used in the command line utility juuid, for transaction ID’s, and when a new dataset is created. For the new dataset, Fuseki will create files in a temporary location. The name of the temporary location is created using an instance of JenaUUID.

UUID_V1 and UUID_V4

Jena’s UUID_V1 is an implementation of Version 1 (time based), variant 2 (DCE), which means it uses the MAC address and a timestamp to generate the universally unique ID’s.

It uses NetworkInterface.getNetworkInterfaces() to retrieve the MAC address of the node running Jena. When using localhost, the MAC address is not available, so it resorts to using a random number.
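To make the version 1 layout more concrete, here is a minimal sketch using only the JDK. It is not Jena’s UUID_V1 code: UuidV1Sketch, nodeId and build are hypothetical names, and setting the multicast bit on the random fallback follows the RFC 4122 recommendation, which I am assuming here rather than claiming Jena does.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.security.SecureRandom;
import java.util.Enumeration;
import java.util.UUID;

// Hypothetical sketch of a version 1 (time based), variant 2 UUID.
// This is NOT Jena's UUID_V1 implementation.
final class UuidV1Sketch {
    // 100-nanosecond intervals between 1582-10-15 (UUID epoch) and 1970-01-01.
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    private UuidV1Sketch() { }

    // Returns the first 6-byte hardware address found as a 48-bit number, or a
    // random 48-bit number with the multicast bit set when none is available.
    static long nodeId() {
        try {
            Enumeration<NetworkInterface> interfaces = NetworkInterface.getNetworkInterfaces();
            while (interfaces != null && interfaces.hasMoreElements()) {
                byte[] mac = interfaces.nextElement().getHardwareAddress();
                if (mac != null && mac.length == 6) {
                    long node = 0L;
                    for (byte b : mac) {
                        node = (node << 8) | (b & 0xFF);
                    }
                    return node;
                }
            }
        } catch (SocketException e) {
            // Fall through to the random fallback below.
        }
        long random = new SecureRandom().nextLong() & 0xFFFFFFFFFFFFL;
        return random | 0x010000000000L; // multicast bit marks a non-MAC node id
    }

    // Packs timestamp, clock sequence and node into the version 1 bit layout.
    static UUID build(long unixMillis, int clockSequence, long node) {
        long timestamp = unixMillis * 10_000L + UUID_EPOCH_OFFSET;
        long timeLow = timestamp & 0xFFFFFFFFL;
        long timeMid = (timestamp >>> 32) & 0xFFFFL;
        long timeHigh = (timestamp >>> 48) & 0x0FFFL;
        long msb = (timeLow << 32) | (timeMid << 16) | (1L << 12) | timeHigh;
        long lsb = (2L << 62) | ((clockSequence & 0x3FFFL) << 48) | (node & 0xFFFFFFFFFFFFL);
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        int clockSequence = new SecureRandom().nextInt();
        System.out.println(build(System.currentTimeMillis(), clockSequence, nodeId()));
    }
}
```

The resulting java.util.UUID reports version() 1 and variant() 2, and its timestamp(), clockSequence() and node() accessors return the values that were packed in.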


And Jena’s UUID_V4 is an implementation of Version 4 (random), variant 2 (DCE), which means it uses random numbers to generate the universally unique ID’s.

The factory for V4 will have a random for the most significant bits, and for the least significant bits of the UUID (also including version and variant). The random for the factory is created by LibUUID#makeRandom(). This method returns a SecureRandom with two seeds, one being random, and the other created with LibUUID#makeSeed().
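The “also including version and variant” part is easier to see in code. Below is a small sketch, not Jena’s UUID_V4_Gen: UuidV4Sketch and build are hypothetical names, and it takes any java.util.Random so the bit handling can be checked with a fixed seed.

```java
import java.security.SecureRandom;
import java.util.Random;
import java.util.UUID;

// Hypothetical sketch of a version 4 (random), variant 2 UUID.
// This is NOT Jena's UUID_V4_Gen implementation.
final class UuidV4Sketch {
    private UuidV4Sketch() { }

    // Draws 128 random bits, then overwrites the version and variant fields.
    static UUID build(Random random) {
        long msb = random.nextLong();
        long lsb = random.nextLong();
        msb = (msb & 0xFFFFFFFFFFFF0FFFL) | 0x0000000000004000L; // version 4
        lsb = (lsb & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L; // variant 2
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        System.out.println(build(new SecureRandom()));
    }
}
```

Only 122 of the 128 bits end up random; the other 6 are fixed by the version and the variant.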

UUID_V4 uses a SecureRandom created locally but with the seed also set by LibUUID#makeSeed(). The seed returned by this method may use the MAC address, but will also use the os.version, user.name, java.version, number of active threads, total memory, free memory, and the hash code of a newly created Object.
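As an illustration of that kind of seed, the sketch below gathers the same JVM and system values (without the MAC address). How Jena actually combines them is not shown in this post, so the concatenation here is purely an assumption, and SeedSketch is a hypothetical name, not LibUUID.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

// Hypothetical sketch of a seed built from JVM and system settings.
// This is NOT Jena's LibUUID#makeSeed(); the concatenation is an assumption.
final class SeedSketch {
    private SeedSketch() { }

    // Joins the values listed in the post into one string and returns its bytes.
    static byte[] makeSeed() {
        Runtime runtime = Runtime.getRuntime();
        String material = String.join("|",
                System.getProperty("os.version", ""),
                System.getProperty("user.name", ""),
                System.getProperty("java.version", ""),
                Integer.toString(Thread.activeCount()),
                Long.toString(runtime.totalMemory()),
                Long.toString(runtime.freeMemory()),
                Integer.toString(new Object().hashCode()));
        return material.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();
        // For an already seeded SecureRandom, setSeed supplements the existing seed.
        random.setSeed(makeSeed());
        System.out.println(random.nextLong());
    }
}
```

None of these values is secret, so a seed like this is about making nodes and JVM’s differ from each other, not about unpredictability.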

Transaction ID UUID (TxnIdUuid) — uses JenaUUID

Jena contains two implementations of TxnId (transaction identifiers),

  • TxnIdSimple: transaction IDs are created with a counter within each JVM
  • TxnIdUuid: transaction IDs are created using JenaUUID.

The first thing that caught my eye in this class was the inconsistency in the naming, which is quite normal in large projects such as Jena.

As TxnIdUuid calls JenaUUID#generate(), it will use the default factory, UUID_V1_Gen. Then it will call asUUID to return a Java UUID object with the same value.

Create a new dataset in Fuseki (ActionDatasets) — uses JenaUUID

When you create a new dataset in Fuseki, as explained in the previous post, Fuseki will create some temporary files and folders. For at least one folder, it will use an instance of JenaUUID, in ActionDatasets#execPostContainer().

Blank node IDs (BlankNodeId) — uses Java’s UUID

Blank nodes in Jena need an identifier too. It is possible to configure Jena to return a JVM-bound counter (similar to how TxnIdSimple works); otherwise blank node identifiers will be generated with java.util.UUID.randomUUID().
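The two strategies can be sketched side by side. BlankNodeLabelSketch and its methods are hypothetical names for illustration, and the “b” prefix is made up; none of this is Jena’s BlankNodeId code.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical sketch of two ways to label blank nodes.
// This is NOT Jena's BlankNodeId implementation.
final class BlankNodeLabelSketch {
    // One counter shared by the whole JVM.
    private static final AtomicLong COUNTER = new AtomicLong();

    private BlankNodeLabelSketch() { }

    // Labels are unique only within this JVM run: b0, b1, b2, ...
    static Supplier<String> counterLabels() {
        return () -> "b" + COUNTER.getAndIncrement();
    }

    // Labels are random version 4 UUID's from the JDK.
    static Supplier<String> uuidLabels() {
        return () -> UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        System.out.println(counterLabels().get());
        System.out.println(uuidLabels().get());
    }
}
```

The counter is cheap and readable, but two JVM’s will hand out the same labels, which is where the UUID variant helps.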

I wonder why the transaction ID’s use Jena’s JenaUUIDs, but the blank node IDs use Java’s UUID? They are compatible anyway.

Other methods related to blank nodes also use Java’s UUID,

  • BlankNodeAllocatorFixedSeedHash
  • BlankNodeAllocatorHash#freshSeed().

SPARQL functions, and NodeFunctions — uses Java’s UUID

SPARQL 1.1 contains functions UUID and STRUUID. Apache Jena provides these two functions, and users can use them in queries such as

SELECT (UUID() AS ?uuid) (StrUUID() AS ?strUuid) WHERE { }

(Previously, users would have had to call extra functions in a different namespace.)

The function implementations use NodeFunctions methods struuid and uuid. Both methods in NodeFunctions use Java’s UUID, and not JenaUUID.
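In SPARQL 1.1, UUID() returns a fresh IRI in the urn:uuid scheme, and STRUUID() returns just the string form of a fresh UUID. A plain Java sketch of the same idea follows; SparqlUuidSketch and its methods are hypothetical, and they return Strings rather than the Jena node objects the real NodeFunctions methods work with.

```java
import java.util.UUID;

// Hypothetical sketch of what the SPARQL UUID() and STRUUID() functions yield.
// This is NOT Jena's NodeFunctions code; it returns plain Strings.
final class SparqlUuidSketch {
    private SparqlUuidSketch() { }

    // UUID(): an IRI in the urn:uuid scheme.
    static String uuidIri() {
        return "urn:uuid:" + UUID.randomUUID();
    }

    // STRUUID(): only the 36-character string form of the UUID.
    static String strUuid() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        System.out.println(uuidIri());
        System.out.println(strUuid());
    }
}
```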

Files and directories for databases / datasets — uses Java’s UUID

Files and directories created in Jena use Java’s UUID,

  • BufferAllocatorMapped#getNewTemporaryFile()
  • TDBBuilder#create methods and ComponentIdMgr constructor
  • AbstractDataBag#getNewTemporaryFile().

Conclusion

In Jena there are places where instances of JenaUUID are used to produce a UUID, and other places where Java’s UUID is used.

Java’s UUID provides a variant 2 version 4 (random DCE), which is equivalent to UUID_V4. But there is no equivalent of UUID_V1, the default used in Jena.

And even though UUID_V4 and UUID are compatible, I believe Jena’s version is using a seed with so many JVM and operating system related settings (os.version, free memory, etc) in order to have a unique seed per node running Jena, independent of whether there are multiple JVM’s in the same node.

But to be honest, I am still not sure which one I would have to use, nor if there are cases where I should pick one over the other…

EDIT: Apache Jena’s lead dev replied with a bit of history about the project too (: