Posts about technology and the arts.

Reading notes about DRMAA v2

The DRMAA v2 specification draft is ready to be published, and is open for public comment until 31st July this year. I used DRMAA v1 to integrate Jenkins and PBS some time ago, but it was not a very elegant solution.

In the end, integrating other grid computing implementations like SGE would not have been simple either.

This post contains my reading notes for DRMAA v2, and a short analysis of how this new specification could be used in a new attempt to integrate Jenkins and several grid computing implementations in a single plug-in.

Reasons for having pt and pt-BR in software

A while ago I found some spare time to work on a different Open Source project: SKOSMOS, a web-based SKOS browser and publishing tool used to create vocabularies with the SKOS ontology.

I decided to help with the translation, but there was no Brazilian Portuguese option, only Portuguese, so I made a few arguments for why having Brazilian Portuguese as well would be a good thing.

Another Open Source project that I use in a side project is LanguageTool, a proofreading tool that uses rules to find spelling and grammar errors.

Today I saw a message in the LanguageTool mailing list discussing whether having a Brazilian Portuguese page would make sense, or if it would be better to have just Portuguese, and then add rules for special cases.

Processing Vaisala Radiosonde data with Python, and creating GRUAN-like NetCDF files

One of my last projects involved parsing a few GBs of data in a certain binary format and converting it into NetCDF files. In this post I will describe what was done, and what I learned about radiosondes, GRUAN and other geek stuff. Ready?

Vaisala Radiosonde data

When I was told the data was in some binary format, I thought it would be similar to parsing a flat file: fixed-length entries, with the same kind of record repeated multiple times.

Well, not exactly.

The files had been generated by an instrument called a radiosonde, made by Vaisala, a Finnish company. It is about the size of an old mobile phone, and is launched into the atmosphere with a balloon.

I was lucky to be given the chance to release one of these balloons carrying a newer version of this equipment.

Radiosonde balloon launch

The balloon can carry equipment for measuring different things, like air pressure, altitude, temperature, latitude, longitude and relative humidity, among others. Instruments like the radiosonde send the data back to a ground-level station via radio, normally at short, regular intervals.

Drawing sketch: Blue Hair

For redditgetsdrawn

Some Linux commands I used this week

These are some commands I used on Linux servers this week. I am adding them here in case someone else finds them useful, and also due to my bad memory :-)

Listing latest installed packages in SLES

rpm -qa --last

This will display the installed packages, most recent first. Useful when packages are being updated and you need to confirm what changed, and when.

Listing packages in SLES and origin repository

rpm -qa --qf '%-30{DISTRIBUTION} %{NAME}\n' | sort

The output has two columns: the first contains the repository name, and the second the package name. For example:

devel:languages:R:base / SLE_11_SP2 R-base
devel:languages:R:base / SLE_11_SP2 R-base-devel
home:flacco:sles / SLE_11_SP3 php53-phar
home:happenpappen / SLE_11_SP2 nodejs

Grep for content in XML tags

Be it for web services, or for finding things in Jenkins XML files, being able to grep for the content of a tag can be useful. Look at the following example, which uses the books XML provided by Microsoft for testing.

grep -oP "(?<=<genre>).*?(?=</genre>)" books.xml | sort | uniq

Which will output the following.

Science Fiction
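To see the same trick work end to end without downloading the Microsoft file, here is a self-contained sketch using a small made-up sample in the same shape (the file name and contents are mine, not from the original post):

```shell
# Made-up sample in the same shape as Microsoft's books.xml
cat > /tmp/books_sample.xml <<'EOF'
<catalog>
  <book id="bk101"><genre>Computer</genre></book>
  <book id="bk102"><genre>Fantasy</genre></book>
  <book id="bk103"><genre>Fantasy</genre></book>
</catalog>
EOF

# Same lookbehind/lookahead trick: keep only the text between the tags
grep -oP "(?<=<genre>).*?(?=</genre>)" /tmp/books_sample.xml | sort | uniq
```

The lookbehind `(?<=<genre>)` and lookahead `(?=</genre>)` match positions, not text, so `-o` prints only the tag content. Note that `-P` (Perl-compatible regex) requires GNU grep.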

Find Python site packages directory

Sometimes you have Anaconda, but also the system installation, and maybe even other Python distributions. Knowing where Python looks for site packages can help you confirm that a package exists, and also inspect its sources.

python -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())"

The output is a single path: the site-packages directory of whichever Python interpreter ran the command.
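On newer interpreters the distutils module is deprecated (and removed in Python 3.12), so a sketch of the same check using the standard sysconfig module instead:

```shell
# sysconfig replaces distutils.sysconfig; 'purelib' is the site-packages path
python3 -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])'
```

Run it with each interpreter you have (python3, Anaconda's python, etc.) to compare where each one looks for packages.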


Force no-cache via curl for a list of files

Useful when you have a proxy like Squid caching some requests from an application, and you want to bypass the cache and get the latest content (which will be cached again, but then you can fix it once confirmed).

curl --silent -H 'Cache-Control: no-cache' http://systemcachingvalues.local/somedoc.html
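To run this over a whole list of files, one way is to keep the URLs in a text file and feed them to curl with xargs. A sketch with made-up file and URL names, using echo as a dry run so you can inspect the commands first (drop the echo to actually fetch):

```shell
# Hypothetical list of URLs, one per line (only somedoc.html is from the post)
printf '%s\n' \
  'http://systemcachingvalues.local/somedoc.html' \
  'http://systemcachingvalues.local/otherdoc.html' > /tmp/urls.txt

# echo makes this a dry run; remove it to really send the requests
xargs -n1 echo curl --silent -H 'Cache-Control: no-cache' < /tmp/urls.txt
```

Each line of the file becomes one curl invocation with the no-cache header attached.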

Find which servers a Linux process is talking to

You need the PID of the process you want to investigate (e.g. 6364), and strace must be installed.

strace -p 6364 -f -e trace=network -o output.txt

The command above creates output.txt with the trace information. Then you can grep for the IP addresses with the following regex.

grep -E -o "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" output.txt

Which will output the IP addresses found in the trace, one per line.
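As a quick sanity check of the regex, here it is run over a couple of strace-style lines. The sample file and its contents are fabricated for illustration, not real trace output:

```shell
# Fabricated lines resembling strace network output for PID 6364
cat > /tmp/trace_sample.txt <<'EOF'
6364  connect(3, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("192.168.0.10")}, 16) = 0
6364  connect(4, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("10.0.0.5")}, 16) = 0
EOF

# Each alternation matches one octet (0-255); four of them joined by dots
grep -E -o "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" /tmp/trace_sample.txt
```

The octet alternation is what keeps plain port numbers like 443 from matching: only four dot-separated values in the 0-255 range come through.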

And finally, you can call dig to resolve the host names, and remove duplicates.

grep -E -o "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" output.txt | xargs -L1 dig +noall +answer +nocmd -x | awk '{ print $5}' | sort | uniq

Which gives you the de-duplicated list of host names the process talked to.

That’s all for today.

Happy hacking!