Blog Taxonomy


Running word-count example on a Hadoop commodity-hardware cluster and on a Hadoop local installation

kinow @ Sep 20, 2012 21:49:38

Last weekend I spent some hours assembling old computer parts to build a commodity-hardware cluster for running Hadoop. I already had a local installation on my notebook, so I thought it would be cool to run the word-count example in both scenarios and compare the results.

But first, let’s review the hardware configurations:


Integrating Nutch 2.x, MySQL and Solr

kinow @ Sep 14, 2012 00:02:31

Right now we are working on a new project using Apache Nutch 2.x, Apache Hadoop, Apache Solr 4 and a lot of other cool tools, modules and APIs. After following the instructions found on http://nlp.solutions.asia/?p=180, I’ve successfully connected Apache Nutch, MySQL and Apache Solr.


In summary:

  • Create a database to hold your data
  • Use SQLDataStore and add configuration for your MySQL server
  • Update Apache Nutch configuration
  • Update Solr schema

Now our Apache Nutch uses MySQL as its data store (the place where it keeps the results of the crawling process, such as URLs, text content, metadata, and so on). That’s grand, but there is one part missing in the Solr schema provided in the blog post.

Due to SOLR-3432, after following the tutorial and replacing the schema, we couldn’t delete the whole index anymore. After following the instructions in the bug comments and adding the following entry to schema.xml, it worked again.


Invoking Testopia XML-RPC or JSON methods using Java

kinow @ Sep 09, 2012 16:08:09

Most TestLink [1] users are aware that there is an external API, perhaps because the external API token is displayed in the user profile section. Today, after a meeting with Peter Florijn [2], I realized that the same may not be true for Testopia [3] users.

I am quite new to Testopia, and there are many features that I haven’t used yet. But if I understand it correctly, the database is accessed through several Perl scripts that are, in turn, exposed as web services (most of them). The web services are available via a JSON and an XML-RPC API (which is very useful, since TestLink supports only XML-RPC).

Communication between different programming languages and the external APIs is handled by a client API. For TestLink you have testlink-java-api [4] and testlink-api-java-client [5].

Testopia has a Java client too. It is available in the Testopia source repository [6] and can be used to integrate your existing Java code with Testopia.
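To give an idea of what such a client sends over the wire, here is a minimal sketch that builds an XML-RPC request body using only the Java standard library. It is not the Testopia Java client; the class name `XmlRpcRequestSketch` is made up for illustration, and the method name `TestCase.get` and the case id `1` are assumptions about the server side.

```java
import java.util.Arrays;
import java.util.List;

public class XmlRpcRequestSketch {

    // Escape the characters that cannot appear verbatim in XML text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Encode one parameter as an XML-RPC <value>; only int and string are handled here.
    static String value(Object o) {
        if (o instanceof Integer) {
            return "<value><int>" + o + "</int></value>";
        }
        return "<value><string>" + escape(String.valueOf(o)) + "</string></value>";
    }

    // Build the <methodCall> document that an XML-RPC client would POST to the server.
    static String methodCall(String method, List<?> params) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>");
        sb.append("<methodCall><methodName>").append(escape(method)).append("</methodName><params>");
        for (Object p : params) {
            sb.append("<param>").append(value(p)).append("</param>");
        }
        return sb.append("</params></methodCall>").toString();
    }

    public static void main(String[] args) {
        // "TestCase.get" and the case id 1 are illustrative assumptions.
        System.out.println(methodCall("TestCase.get", Arrays.asList(1)));
    }
}
```

A real client library hides this encoding and also parses the `<methodResponse>` that comes back, which is the main reason to use one instead of building requests by hand.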


How I started in Open Source

kinow @ Aug 08, 2012 10:18:13

I was studying at Mackenzie Presbyterian University when I met Professor Rogerio Brito. Most of my friends were terrified of him. Maybe because of the class name, “Structured Programming I”. Or maybe because of what Brito taught us in class.

My degree was in computing, Bachelor in System Information. Although it is related to Computer Science, we don’t have the same classes. In Computer Science you have more math, statistics and even physics, while in System Information you have administration, strategic planning, law and others related to business.

Although Structured Programming had a list of topics, Brito didn’t limit himself to those. Every now and then he would teach us about algorithms, software complexity and the importance of writing academic papers, and explain Open Source software.

I remember one day we were dismissed before 10PM. I used to run to the bus stop so I wouldn’t get home too late, and because it could be dangerous to linger on the streets of the Consolacao neighborhood by myself, waiting for the bus. But that day I bumped into Brito near the Professors’ Room.

He invited me in. It was my first time in there. There were books and papers lying on the main table, and in another part of the room there was a desktop computer. We started talking about programming and, I can’t recall how, we ended up talking about Open Source. He showed me some websites and explained a lot of nerd stuff.

I left the room after 11PM, late for my bus and with a sheet of paper. A very important sheet of paper. It had a list of items to study, software to learn about and names that I had never heard before. One of the items on that list that I can still remember was “Reading about Open JDK” (now being adopted over Oracle/Sun JDK). There was “Create a program for finding software bugs” too (like the famous FindBugs).

He taught me the basics of Open Source, gave me a list of interesting projects, and explained how to join a project, where to look for information and how to assess the maturity and quality of a project. He also taught me more about USP (University of Sao Paulo), how to submit academic papers and what a call for papers was (when he said that phrase I translated it literally to Portuguese, you can imagine the confusion in my head) :-)

He made me curious.

This weekend I sent him an e-mail about analyzing the parallel execution of a piece of software, using Debian. He sent me a reply with some interesting links and a brief introduction to the topic. It’s starred in my inbox while I look for some spare time to read his links and, certainly, learn something useful and interesting for any nerd.

Since that class, back in 2005 or 2006, I’ve been involved in several Open Source projects, became an Apache Committer a few days ago (I will write about this later) and created a company (TupiLabs) specialized in Open Source.

Thank you for teaching me, Brito, and thank you for not limiting your classes to the class plan. I believe that’s the best quality a professor can have.

I wish all the success to you in your projects, and health to you and your family.

Writing code to integrate Java projects to Testopia

kinow @ May 18, 2012 17:16:25

Peter Florijn and I are writing a Jenkins plug-in to integrate several test tools into Jenkins, similar to what the TestLink plug-in does. It’s still an idea being explored, and the whole project is subject to change without warning. The code is at https://github.com/kinow/testthemall.

The first tool that we are integrating is not TestLink, but Mozilla Testopia. As part of integrating these tools, several Java APIs that talk to the existing external APIs will be created, as was done for TestLink with TestLink Java API.

Installing Testopia is easy and straightforward. This was the best guide that I could find, and it worked without errors on my Debian Squeeze. I only had to move the directories from /var/www to my home directory (I use my PHP Eclipse workspace as Apache home).

Testopia has an XML-RPC API, just like TestLink; however, it lacks user-friendly documentation and examples. I migrated the Java driver from Ant to Maven, for convenience. But the XML-RPC server is complaining that I have to log in before listing the test cases of a test plan.
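The log-in complaint suggests a two-step sequence: authenticate first, then ask for the test cases. Below is a rough sketch of the two request bodies, built with the standard library only. It assumes Testopia relies on Bugzilla's `User.login` method (taking a struct with `login` and `password`) and exposes `TestPlan.get_test_cases`; the class name, credentials and plan id are placeholders, and none of this has been checked against a running server.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LoginFirstSketch {

    // Escape the characters that cannot appear verbatim in XML text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Encode a map as an XML-RPC <struct> whose members are all strings.
    static String struct(Map<String, String> members) {
        StringBuilder sb = new StringBuilder("<value><struct>");
        for (Map.Entry<String, String> e : members.entrySet()) {
            sb.append("<member><name>").append(escape(e.getKey())).append("</name>")
              .append("<value><string>").append(escape(e.getValue()))
              .append("</string></value></member>");
        }
        return sb.append("</struct></value>").toString();
    }

    // Wrap a single, already encoded parameter in a <methodCall> document.
    static String methodCall(String method, String encodedParam) {
        return "<?xml version=\"1.0\"?><methodCall><methodName>" + escape(method)
                + "</methodName><params><param>" + encodedParam
                + "</param></params></methodCall>";
    }

    // Step 1: the log-in request (method and member names are assumptions).
    static String loginCall(String login, String password) {
        Map<String, String> credentials = new LinkedHashMap<>();
        credentials.put("login", login);
        credentials.put("password", password);
        return methodCall("User.login", struct(credentials));
    }

    // Step 2: the request for the test cases of a plan (method name is an assumption).
    static String listCasesCall(int planId) {
        return methodCall("TestPlan.get_test_cases", "<value><int>" + planId + "</int></value>");
    }

    public static void main(String[] args) {
        // Placeholder credentials and plan id, printed in the order they would be sent.
        System.out.println(loginCall("tester@example.com", "secret"));
        System.out.println(listCasesCall(1));
    }
}
```

If the server tracks the session with cookies, the second request has to carry the cookie returned by the first. With plain `java.net` connections, calling `CookieHandler.setDefault(new CookieManager())` once before the first request is one way to have them kept between calls.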

If you are interested in using Java and Testopia, here’s the link to the Java project with Maven support: https://github.com/kinow/testopia-java-driver. I will update the project with examples and more tests, and will try to clean up the code. I will probably use either GitHub Pages or a wiki somewhere to document how to use Testopia from Java.

Stay tuned!

Cheers,