Posts tagged with ‘apache software foundation’

When you don’t realize you need a Comparable

kinow @ May 15, 2017 23:07:39

In 2012, I wrote about how you always learn something new by following the Apache dev mailing lists.

After about five years, I am still learning, and I am still impressed by the knowledge of other developers. A few days ago I was massaging some code in a pull request when a developer suggested that I simplify it.

The suggestion was to make a class implement Comparable, which both simplifies the code and gives a better design. I immediately agreed; in hindsight, it was the most logical choice. Yet I simply had not thought of it.

// What the code was
case VSPACE_SORTKEY :
    int cmp = 0;
    String c1 = nv1.getCollation();
    String c2 = nv2.getCollation();
    if (c1 != null && c2 != null && c1.equals(c2)) {
        Collator collator = Collator.getInstance(desiredLocale);
        cmp = collator.compare(nv1.getString(), nv2.getString());
    } else {
        cmp = XSDFuncOp.compareString(nv1, nv2) ;
    }
    return cmp;
}
// What the code is now
case VSPACE_SORTKEY :
    return ((NodeValueSortKey) nv1).compareTo((NodeValueSortKey) nv2);
}

This moved the comparison logic into the compareTo method of the NodeValueSortKey class. That reduced the complexity of the class containing the switch statement, and it also made the logic easier to unit test.
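To make the idea concrete, here is a minimal sketch of what moving such a comparison into a Comparable class can look like. It is not the Jena code: SortKeyDemo, SortKey, and sortedValues are hypothetical names, and the fallback is plain String ordering rather than Jena's XSDFuncOp.compareString.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Objects;

final class SortKeyDemo {

    /** Hypothetical stand-in for a sort-key value; not the actual Jena class. */
    static final class SortKey implements Comparable<SortKey> {
        private final String value;
        private final String collation; // language tag such as "en"; may be null

        SortKey(String value, String collation) {
            this.value = Objects.requireNonNull(value);
            this.collation = collation;
        }

        String getValue() {
            return value;
        }

        @Override
        public int compareTo(SortKey other) {
            Objects.requireNonNull(other);
            if (collation != null && collation.equals(other.collation)) {
                // Same collation on both sides: use locale-aware comparison.
                Collator collator = Collator.getInstance(Locale.forLanguageTag(collation));
                return collator.compare(value, other.value);
            }
            // Collations are missing or differ: fall back to plain String order.
            return value.compareTo(other.value);
        }
    }

    /** Sorts a copy of the keys with SortKey.compareTo and returns their values. */
    static List<String> sortedValues(List<SortKey> keys) {
        List<SortKey> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        List<String> values = new ArrayList<>();
        for (SortKey key : copy) {
            values.add(key.getValue());
        }
        return values;
    }

    public static void main(String[] args) {
        List<SortKey> english = Arrays.asList(
                new SortKey("c", "en"), new SortKey("B", "en"), new SortKey("a", "en"));
        List<SortKey> plain = Arrays.asList(
                new SortKey("c", null), new SortKey("B", null), new SortKey("a", null));
        System.out.println(sortedValues(english)); // prints [a, B, c]
        System.out.println(sortedValues(plain));   // prints [B, a, c]
    }
}
```

With the comparison inside the class, the caller only needs compareTo (or Collections.sort), and a unit test can exercise the ordering without going through the switch statement.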

If you are not involved in Open Source projects yet, my suggestion from five years ago still stands. Find a project related to something you like, and start reading the code, lurk on the mailing list, or watch GitHub repositories.

You can always learn more!

♥ Open Source

Apache Commons Lang: Memoizer

kinow @ Jan 08, 2017 18:34:03

The current release of Apache Commons Lang is 3.5. The upcoming release, probably 3.6, will include a new feature, added in a pull request: a Memoizer implementation. Check out the ticket LANG-740 for more about the implementation being added to [lang].

The book Java Concurrency in Practice introduces readers to the Memoizer, and also has a public domain implementation available for download (besides that, the book covers lots of other interesting topics!).

In summary, Memoizer is a simple cache that stores the result of a computation. It receives a Computable object, which performs the computation whose result the Memoizer stores. Here’s a simple example to illustrate how that will work in your Java code.

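// Imports assumed for this snippet (Memoizer and Computable from [lang])
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.commons.lang3.concurrent.Computable;
import org.apache.commons.lang3.concurrent.Memoizer;

// Computation to be stored in the cache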
Computable<String, String> getFormattedCurrentDate = new Computable<String, String>() {
    @Override
    public String compute(String fmt) throws InterruptedException {
        return new SimpleDateFormat(fmt).format(new Date());
    }
};

// Our memoizer
Memoizer<String, String> dateCache = new Memoizer<>(getFormattedCurrentDate);

// To illustrate its use
for (int i = 0; i < 10; i++) {
    try {
        // S -> Millisecond; yyyy is the calendar year (YYYY would be the week year)
        System.out.println(dateCache.compute("HH:mm:ss:S Z dd/MM/yyyy"));
        // Regardless of this sleep call, we get the same result every iteration
        Thread.sleep(1500);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}

The computable created (getFormattedCurrentDate) will be called only once, and its result will be stored in a map. The parameter passed to the #compute() method is used as the key in the map, so choose your parameter wisely :-) The output of the example will be similar to the following.

19:15:57:854 +1300 08/01/2017
19:15:57:854 +1300 08/01/2017
19:15:57:854 +1300 08/01/2017
19:15:57:854 +1300 08/01/2017
19:15:57:854 +1300 08/01/2017
19:15:57:854 +1300 08/01/2017
19:15:57:854 +1300 08/01/2017
19:15:57:854 +1300 08/01/2017
19:15:57:854 +1300 08/01/2017
19:15:57:854 +1300 08/01/2017

In the example above I used a for-loop to illustrate what happens. Even though we call the memoizer's #compute() method several times, each call followed by Thread#sleep(), only one result, the first to be computed, is ever returned.
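The caching behavior described above can be sketched without [lang] at all. The following is only an illustration of the idea, not the Commons Lang implementation: MemoizerSketch and SimpleMemoizer are hypothetical names, and it assumes Java 8 so that java.util.function.Function and ConcurrentHashMap#computeIfAbsent can be used.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class MemoizerSketch {

    /** Caches one result per distinct argument (null arguments and results are not supported). */
    static final class SimpleMemoizer<I, O> {
        private final Map<I, O> cache = new ConcurrentHashMap<>();
        private final Function<I, O> computation;

        SimpleMemoizer(Function<I, O> computation) {
            this.computation = computation;
        }

        O compute(I arg) {
            // The argument is the map key; the computation runs only when the key is absent.
            return cache.computeIfAbsent(arg, computation);
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        SimpleMemoizer<String, Integer> length = new SimpleMemoizer<>(s -> {
            calls.incrementAndGet(); // count how often the computation really runs
            return s.length();
        });

        length.compute("jena");
        length.compute("jena"); // same key: served from the cache
        length.compute("lang"); // new key: the computation runs again
        System.out.println(calls.get()); // prints 2
    }
}
```

This is why the date example keeps printing the same timestamp: the format string is the key, so every call with the same pattern hits the cached value.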

So that’s all for today. I hope you learned something about this new class, which should be available in the next release of Apache Commons Lang.

Happy hacking!

ps: [lang] uses Java 7, which is why we have the Computable interface instead of a functional interface

Apache Commons Text

kinow @ Jan 07, 2017 20:39:03

There is a new component in Apache Commons: Apache Commons Text. The 1.0 release might be announced in the coming weeks. The current site is still in the Commons Sandbox, but that will change with the 1.0 release. The promotion from the sandbox happened a few days ago on the project mailing list.

Here’s the project description: Apache Commons Text is a library focused on algorithms working on strings.

There was a thread on the mailing list some time ago (Oct/2014) when we first discussed the idea for the component. Since then many people have contributed by porting code from Apache Commons Lang and Apache Lucene, donating code from existing projects, and bringing new ideas.

It is important to be aware that certain parts of Apache Commons Lang are being marked as deprecated, and will be removed in the future, after Apache Commons Text 1.0 is out. For example: StringEscapeUtils and RandomStringUtils.

That will probably happen in a 4.x release of Apache Commons Lang, if everything goes well with Apache Commons Text :-)

There are already future features in branches, too. It was decided that these features needed further work, so they will probably be included in later releases.

So that’s a little bit of background on the new component that will be released soon. If you have code using Apache Commons Lang, you might want to stay tuned for release announcements on the mailing list!

And should you have suggestions and would like to contribute, feel free to join and start a thread on the mailing list, open a JIRA issue, or submit a pull request.

Happy hacking!

Contributing to Apache Jena

kinow @ Jan 01, 2015 18:49:03

As I mentioned in my previous post, I am using Apache Jena for a customer project. I had never used a triple store or a SPARQL endpoint server before. But since I am involved with the Apache Software Foundation, and the company itself is using several Apache components, it was only natural for Jena to be our first choice.

It has served us very well so far. At the moment we have fewer than 100 queries per day, but the project is still under development, and we expect 1,000 queries per day by the first quarter of 2015 and 1,000,000 near the end of 2015. We also have only a few entries in TDB, but expect this number to grow to a few million before 2016.

When I work for companies and we use Open Source Software (OSS) in a project, I always prepare assessment reports to include in the deliverables. In these reports I justify the choice of Open Source Software (as well as commercial software). Sometimes I am lucky enough to work for a company that asks me to include hours to work on OSS :-)

I use Trello to triage issues in OSS projects (and for several other things). I have a board with several cards for Open Source. About a month ago I set up one for Jena and listed the issues that I thought I could contribute to.

Jena Trello card

I annotate easy issues with an “lhf” suffix, for Low Hanging Fruit, and delete an issue from the card once I submit a patch or update the issue (and include it in another card for the TupiLabs reports).

Most of the issues I included in the card for Jena had been created over two years ago, and hadn’t been updated in a while. When you test such issues against the current code, you usually find that some of them have already been fixed. Other issues involved documentation problems and minor features. I didn’t find any blocker issue that would prevent us from using Jena in production.

Jena JIRA activity summary

The picture above shows the activity summary in JIRA for Jena over the past 30 days. The red line shows issues created, and the green line shows issues resolved. Andy Seaborne was very active in the past few days: he resolved several old issues that had already been fixed in trunk, and kindly merged patches and pull requests.

Some issues like JENA-632 will take longer to fix, but I’m getting used to Jena’s source code, and at the same time getting more confident about using it in production, especially with a supportive OSS community. We are using Jena for RDF with Hadoop, and I learned that I can replace some custom Writables with others from the Jena Hadoop submodule.

By the way, even though this project ends in April, I intend to continue contributing to Jena. There are many parts of the code that I would love to understand and contribute to, in particular the Graph database, the optimization techniques for SPARQL queries, the grammars used in the project, and Fuseki v2. I would also like to enhance its testing harness (as well as the test coverage).

If you are looking for an interesting project to get you started with semantics, linked data, RDF, and even graphs and database querying, try contributing to Jena. I bet you’ll have a lot of fun!

Happy hacking and happy 2015!

Strings transliteration in Java with Apache Commons Lang

kinow @ Aug 09, 2014 12:49:33

Rosalind is a website with a curated set of exercises for bioinformatics, organized hierarchically. In some of these exercises you are required to replace characters (nucleotides) with other characters. It is a rather common task for developers, for example when you need to replace special characters in user names.

There are different names for this operation, such as translate, replace, or transliterate, the last being my favorite.

In Python I know that there are several different ways of transliterating strings [1][2]. But in Java I always ended up using a Map or an Enum and writing my own method in some Util class for that.

It turns out that Apache Commons Lang, which I use in most of my projects, provides this feature. That means I will be able to reduce the length of my code, which also means less code to test (and one less place to look for bugs).

String s = StringUtils.replaceChars("ATGCATGC", "GTCA", "CAGT"); // "TACGTACG"
System.out.println(s);

The code above replaces G with C, T with A, C with G, and A with T. This process is part of finding the DNA reverse complement. But you can also use it for replacing special characters, replacing spaces with _, and so on.
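For the reverse complement itself, the remaining step is to reverse the complemented string. Here is a small sketch in plain Java. The transliterate method is a hypothetical, hand-written stand-in for StringUtils.replaceChars so that the example runs without any dependency, and unlike the [lang] method it assumes both character sets have the same length.

```java
final class ReverseComplement {

    /** Replaces each character found in 'from' with the character at the same index in 'to'. */
    static String transliterate(String input, String from, String to) {
        if (from.length() != to.length()) {
            throw new IllegalArgumentException("from and to must have the same length");
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            int index = from.indexOf(c);
            // Characters that are not in 'from' are copied unchanged.
            out.append(index >= 0 ? to.charAt(index) : c);
        }
        return out.toString();
    }

    /** Complements every nucleotide, then reverses the strand. */
    static String reverseComplement(String dna) {
        String complement = transliterate(dna, "GTCA", "CAGT");
        return new StringBuilder(complement).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(transliterate("ATGCATGC", "GTCA", "CAGT")); // prints TACGTACG
        System.out.println(reverseComplement("AAAACCCGGT"));           // prints ACCGGGTTTT
    }
}
```

In a project that already depends on [lang], the transliterate call could simply be replaced with StringUtils.replaceChars.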

Happy hacking!