Posts tagged with ‘apache software foundation’

Using Apache Commons Functor functional interfaces with Java 8 lambdas

kinow @ Dec 21, 2012 15:08:14

Apache Commons Functor (hereafter [functor]) is an Apache Commons component that provides a functional programming API and implementations of several patterns (visitor, generator, aggregator, etc.). Java 8 has several nice new features, including lambda expressions and functional interfaces. In Java 8, lambdas (lambda expressions) are closures that can be evaluated and behave like anonymous methods.

Functional interfaces are interfaces with exactly one abstract method. A lambda can be used wherever such an interface is expected, which saves you from writing anonymous classes or even separate classes that implement the interface. [functor] provides several functional interfaces (thanks to Matt Benson). It hasn’t been released yet, but there are some new examples on the project site, in the SVN trunk. I will use one of these examples to show how [functor] functional interfaces can be used in conjunction with Java 8 lambdas.
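To illustrate the idea before getting to [functor] itself, here is a minimal sketch of a single-abstract-method interface implemented twice: once as a pre-Java 8 anonymous class and once as a lambda. UnaryPredicate, filter and FunctionalInterfaceSketch are hypothetical names written for this sketch; the interface only evokes the style of [functor]'s predicates and is not the library's actual type, so the example does not depend on [functor] at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FunctionalInterfaceSketch {

    // Hypothetical single-abstract-method interface (a local stand-in,
    // not [functor]'s real type). Any such interface is a valid lambda target.
    public interface UnaryPredicate<A> {
        boolean test(A obj);
    }

    // Returns a new list with the elements that satisfy the predicate.
    public static <A> List<A> filter(List<A> input, UnaryPredicate<? super A> predicate) {
        List<A> result = new ArrayList<>();
        for (A a : input) {
            if (predicate.test(a)) {
                result.add(a);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);

        // Before Java 8: an anonymous inner class.
        UnaryPredicate<Integer> isEvenAnon = new UnaryPredicate<Integer>() {
            @Override
            public boolean test(Integer n) {
                return n % 2 == 0;
            }
        };

        // Java 8: the same predicate written as a lambda expression.
        UnaryPredicate<Integer> isEvenLambda = n -> n % 2 == 0;

        System.out.println(filter(numbers, isEvenAnon));   // [2, 4, 6]
        System.out.println(filter(numbers, isEvenLambda)); // [2, 4, 6]
    }
}
```

Both predicates behave identically; the lambda simply removes the boilerplate of the anonymous class.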

After the example with [functor] in Java 8, I will explain how I am running Java 8 in Eclipse (it’s a bit of a gambiarra, Brazilian slang for a makeshift hack, but it works well).


Replacing a HashSet with a BitSet

kinow @ Oct 20, 2012 19:51:39

I always read the messages on the Apache dev mailing lists, including the Apache Commons dev mailing list. And you should too. There are always interesting discussions. Sometimes you participate, other times you just watch what’s happening, but in the end you always learn something new.

A few days ago, I found an issue proposing to replace an unnecessary HashSet in ArrayUtils#removeElements() with a BitSet. Here’s what the code looked like:

HashSet<Integer> toRemove = new HashSet<Integer>(); // indices of elements to remove
for (Map.Entry<Character, MutableInt> e : occurrences.entrySet()) {
    Character v = e.getKey();
    int found = 0;
    for (int i = 0, ct = e.getValue().intValue(); i < ct; i++) {
        found = indexOf(array, v.charValue(), found);
        if (found < 0) {
            break;
        }
        toRemove.add(found++); // record the index, then search past it
    }
}
return (char[]) removeAll((Object)array, extractIndices(toRemove));

The HashSet created at line 1 of the code above stores the array indices of the elements that should be removed, and line 13 calls the removeAll method, passing those indices. Here’s what the new code looks like:

BitSet toRemove = new BitSet(); // one bit per array index
for (Map.Entry<Character, MutableInt> e : occurrences.entrySet()) {
    Character v = e.getKey();
    int found = 0;
    for (int i = 0, ct = e.getValue().intValue(); i < ct; i++) {
        found = indexOf(array, v.charValue(), found);
        if (found < 0) {
            break;
        }
        toRemove.set(found++); // mark the index as "to remove"
    }
}
return (char[]) removeAll(array, toRemove);

The first difference is at line 1: instead of a HashSet, it now uses a BitSet. At line 10, instead of adding a new element to the HashSet, it now “sets” a bit in the set (the bit at the specified position becomes true). But the important changes are at line 13. The removeAll method was changed, so the array no longer needs to be cast to Object. And the indices no longer have to be converted from the HashSet into an array, since the bit at each index to be removed is already set to true. So the extractIndices method could be removed.

The code got simpler. But that’s not all. At the Apache Software Foundation you can find a lot of talented developers; that’s why I got so excited after joining. Besides simplifying the code, the developer responsible for these changes (sebb) also pointed out that the new code consumes less memory and is faster. Ah! And he also wrote unit tests.
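The BitSet approach described above can be sketched in a small, self-contained program. BitSetRemoval and its methods removeIndices, markOccurrences and indexOf are hypothetical names written for this illustration; this is not the Commons Lang implementation of removeAll or removeElements. It marks indices with BitSet.set, as the new code does, and then copies every element whose bit is still clear.

```java
import java.util.Arrays;
import java.util.BitSet;

public class BitSetRemoval {

    // Hypothetical helper (not Commons Lang's removeAll): returns a copy of
    // 'array' without the elements whose indices are set in 'toRemove'.
    public static char[] removeIndices(char[] array, BitSet toRemove) {
        // Count only the set bits that fall inside the array bounds.
        int removed = toRemove.get(0, array.length).cardinality();
        char[] result = new char[array.length - removed];
        int dest = 0;
        // Visit each clear bit (an index to keep) in ascending order.
        for (int i = toRemove.nextClearBit(0); i < array.length; i = toRemove.nextClearBit(i + 1)) {
            result[dest++] = array[i];
        }
        return result;
    }

    // Mirrors the loop from the post: mark up to 'count' occurrences of 'value'.
    public static void markOccurrences(char[] array, char value, int count, BitSet toRemove) {
        int found = 0;
        for (int i = 0; i < count; i++) {
            found = indexOf(array, value, found);
            if (found < 0) {
                break;
            }
            toRemove.set(found++); // set the bit, then continue after it
        }
    }

    // Linear search starting at 'start'; returns -1 when not found.
    public static int indexOf(char[] array, char value, int start) {
        for (int i = start; i < array.length; i++) {
            if (array[i] == value) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        char[] array = {'a', 'b', 'a', 'c', 'a'};
        BitSet toRemove = new BitSet();
        markOccurrences(array, 'a', 2, toRemove); // marks indices 0 and 2
        System.out.println(Arrays.toString(removeIndices(array, toRemove))); // [b, c, a]
    }
}
```

A BitSet stores each index as a single bit instead of a boxed Integer inside a hash table, and nextClearBit walks the indices in order, so no separate conversion step like extractIndices is needed.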

Integrating Nutch 2.x, MySQL and Solr

kinow @ Sep 14, 2012 00:02:31

Right now we are working on a new project using Apache Nutch 2.x, Apache Hadoop, Apache Solr 4 and a lot of other cool tools, modules and APIs. After following the instructions at http://nlp.solutions.asia/?p=180, I’ve successfully connected Apache Nutch, MySQL and Apache Solr.


In summary:

  • Create a database to hold your data
  • Use SQLDataStore and add configuration for your MySQL server
  • Update Apache Nutch configuration
  • Update Solr schema

Now our Apache Nutch uses MySQL as its data store (where it keeps the results of the crawling process, such as URLs, text content and metadata). That’s grand, but one part is missing from the Solr schema provided in that blog post.

Due to SOLR-3432, after following the tutorial and replacing the schema, we could no longer delete the whole index. After following the instructions in the bug comments and adding the missing entry to schema.xml, it worked again.


How I started in Open Source

kinow @ Aug 08, 2012 10:18:13

I was studying at Mackenzie Presbyterian University when I met Professor Rogerio Brito. Most of my friends were terrified of him. Maybe it was the course name, “Structured Programming I”. Or maybe it was what Brito taught us in class.

My degree was in computing, a Bachelor’s in Information Systems. Although it is related to Computer Science, we didn’t have the same classes. In Computer Science you have more math, statistics and even physics, while in Information Systems you have administration, strategic planning, law and other business-related subjects.

Although Structured Programming had a list of topics, Brito didn’t limit himself to them. Occasionally he would teach us about algorithms, software complexity and the importance of writing academic papers, or talk about Open Source software.

I remember one day we were dismissed before 10PM. I used to run to the bus stop so I wouldn’t get home too late, and because lingering alone on the streets of the Consolacao neighborhood, waiting for the bus, could be dangerous. But that day I ran into Brito near the Professors’ Room.

He invited me in. It was my first time in there. There were books and papers lying on the main table, and in another part of the room there was a desktop computer. We started talking about programming and, I can’t recall how, we ended up talking about Open Source. He showed me some websites and explained a lot of nerd stuff.

I left the room after 11PM, late for my bus and holding a sheet of paper. A very important sheet of paper. It had a list of items to study, software to learn about and names I had never heard of. One item on that list that I can still remember was “Reading about Open JDK” (now being adopted over the Oracle/Sun JDK). There was also “Create a program for finding software bugs” (like the famous FindBugs).

He taught me the basics of Open Source, gave me a list of interesting projects, and explained how to join a project, where to look for information and how to assess the maturity and quality of a project. He also taught me more about USP (University of Sao Paulo), how to submit academic papers and what a call for papers was (when he said that term, I translated it into Portuguese; you can imagine the confusion in my head) :-)

He made me curious.

This weekend I sent him an e-mail about analyzing the parallel execution of a program on Debian. He replied with some interesting links and a brief introduction to the topic. It’s starred in my inbox while I look for some spare time to read his links and, certainly, learn something useful and interesting for any nerd.

Since that class, back in 2005 or 2006, I’ve been involved in several Open Source projects, became an Apache Committer a few days ago (I will write about this later) and founded a company (TupiLabs) specializing in Open Source.

Thank you for teaching me, Brito, and thank you for not limiting your classes to the syllabus. I believe that’s the best quality a professor can have.

I wish you every success in your projects, and health to you and your family.

Ranges in Apache Commons Functor

kinow @ Jan 22, 2012 19:54:36

This is a long post. So here is a TL;DR:

  • Apache Commons Functor has no Double or Float Range (yet)
  • Apache Commons Functor IntegerRange and LongRange treat the low value as inclusive and the high value as exclusive. How does that compare to other languages/APIs? (read on for a comparison)
  • Perl supports character ranges; perhaps we could implement them in Functor too?
  • If we implemented a CharacterRange, it would have to be inclusive for both the low and high limits. With ‘z’ being the last character, there would be no way to include ‘z’ with the current exclusive upper limit. Otherwise, we would have to make CharacterRange a special case, which would go against the Liskov Substitution Principle.
  • The full post includes a comparison table of ranges in Apache Commons Functor, other Java APIs and other programming languages.
  • It would be nice to have a clear distinction in the Functor documentation between a Sequence, a Generator and a Range. While I was gathering material for this post, I found places using “range” and others using “sequence”, and in Apache Commons Functor an IntegerRange is a Generator.
Now, if you have some spare time or curiosity, keep reading :-)
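The inclusive versus exclusive question from the list above can be illustrated with a small sketch. RangeSketch and its methods halfOpen, halfOpenChars and closedChars are hypothetical helpers written for this illustration, not the Functor IntegerRange, LongRange or Generator API; they simply enumerate the values of a range into a list.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSketch {

    // Half-open integer range [low, high): low included, high excluded,
    // the convention the post describes for IntegerRange and LongRange.
    public static List<Integer> halfOpen(int low, int high) {
        List<Integer> values = new ArrayList<>();
        for (int i = low; i < high; i++) {
            values.add(i);
        }
        return values;
    }

    // The same half-open rule applied to characters: 'high' is never produced.
    public static List<Character> halfOpenChars(char low, char high) {
        List<Character> values = new ArrayList<>();
        for (int c = low; c < high; c++) {
            values.add((char) c);
        }
        return values;
    }

    // Closed character range [low, high], like Perl's 'a'..'z'.
    // The counter is an int so high == Character.MAX_VALUE cannot overflow.
    public static List<Character> closedChars(char low, char high) {
        List<Character> values = new ArrayList<>();
        for (int c = low; c <= high; c++) {
            values.add((char) c);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(halfOpen(1, 5));          // [1, 2, 3, 4]
        System.out.println(closedChars('a', 'e'));   // [a, b, c, d, e]
        System.out.println(halfOpenChars('a', 'e')); // [a, b, c, d]
        // With an exclusive upper limit, reaching 'z' requires passing the
        // next code point, '{' (U+007B), which is not a letter at all.
        System.out.println(halfOpenChars('a', '{').contains('z')); // true
    }
}
```

The last line shows the awkwardness described in the TL;DR: a half-open character range forces callers to name a character just past ‘z’, which is why a CharacterRange would most naturally be closed on both ends.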
