Posts tagged with ‘apache software foundation’

Learning more about SPARQL and Jena internals

kinow @ Apr 28, 2018 18:20:28

O Corvo

Recently a pull request for Apache Jena that I started three years ago got merged. Even though it has been three years since that pull request, there are still many parts of the project code base that I am not familiar with.

And it is not only the code: there are also many concepts in SPARQL, in other standards used in Jena, and in the internals of triple stores that I still need to learn.

The following list contains some presentations and posts that I am reading right now, while I try to improve my knowledge of SPARQL and Jena internals.

Exif Odd Offsets

kinow @ Dec 25, 2017 21:43:33

A file format like JPEG may contain metadata in JFIF, Exif, or a vendor proprietary format. The Exif format is based on, or uses parts of, the TIFF format.

Within an Exif metadata block, you should see directories, with several entries. The entries have fields like description, value, and also an offset. The offset says where in the file the entry's value is stored when the value does not fit inside the entry itself.

The Exif specification requires implementers to keep that offset an even number, within 4 bytes.

I recently worked on IMAGING-205, a ticket about odd offsets in files with Exif metadata. The issue was that when Apache Commons Imaging rewrote a file, the entries were rearranged, so an image that initially had no odd offsets could end up with them.

The fix was simply to check for odd offsets and add 1; later the value would be put within the 4 bytes limit.
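
The odd-offset check can be sketched in a few lines of Java. This is only an illustration of the idea, not the code that went into Apache Commons Imaging; the class name `EvenOffset` and the method `alignToEven` are hypothetical.

```java
// A minimal sketch, assuming offsets are plain byte positions in the file.
// EvenOffset and alignToEven are hypothetical names, not Commons Imaging API.
final class EvenOffset {

    /** Returns the offset unchanged if it is even, or the next even value if it is odd. */
    static long alignToEven(long offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must not be negative: " + offset);
        }
        // An odd offset is moved forward by one byte so it lands on an even position.
        return (offset % 2 != 0) ? offset + 1 : offset;
    }

    public static void main(String[] args) {
        long[] samples = {8, 13, 26, 101};
        for (long offset : samples) {
            System.out.println(offset + " -> " + alignToEven(offset));
        }
    }
}
```

Moving an offset forward by one byte leaves a single unused byte before the value, which is why a writer has to account for the padding when it lays out the entries.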

A screen shot of Eclipse with source code
Locating the bug

One interesting point, however, is that although this is in the standard, not all software that reads and writes Exif follows the specification. So it is quite common to find images with odd offsets.

Which means you could take a picture with your phone that contains some Exif metadata, analyze it with exiftool, and be surprised by warnings about odd offsets. Most viewers handle both odd and even offsets, so it should work in most cases, unless you have a strict reader/viewer.

Happy hacking!

♥ Open Source

Watch out for Locales when using NumberFormat with currencies

kinow @ Dec 02, 2017 22:51:00

In Java you have the NumberFormat class to help you format and parse numbers for any locale. That said, here’s some code.

BigDecimal negative = new BigDecimal("-1234.56");

DecimalFormat nf = (DecimalFormat) NumberFormat.getCurrencyInstance(Locale.UK);
String formattedNegative = nf.format(negative);

System.out.println(formattedNegative);

The output for this code is -£1,234.56. That’s expected, as the locale is set to UK, so the currency symbol used is for British Pounds. And as the number is negative, you get that minus sign as a prefix. For the Japanese locale you’d get -¥1,235, and for the Brazilian locale you’d get -R$ 1.234,56.

So far so good.

What about the following code, with nothing different except for the locale, which is now set to US?

BigDecimal negative = new BigDecimal("-1234.56");

DecimalFormat nf = (DecimalFormat) NumberFormat.getCurrencyInstance(Locale.US); // <--- US now
String formattedNegative = nf.format(negative);

System.out.println(formattedNegative);

Some could intuitively expect -$1,234.56. However, the output is actually ($1,234.56).

Each locale's format defines its own prefixes and suffixes for negative numbers. In some locales the prefix can be empty, or, as in the case of the US locale, it can be quite different from what you might expect.
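
To see which prefix and suffix a locale uses, you can ask the DecimalFormat directly through getNegativePrefix() and getNegativeSuffix(). The sketch below prints them for a few locales; the class name `NegativeCurrencyAffixes` and its helper methods are made up for this example. The exact strings depend on the locale data shipped with your JDK (newer JDKs may print something different from the outputs quoted above), so no expected output is listed.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Locale;

// A minimal sketch; NegativeCurrencyAffixes is a hypothetical helper class.
final class NegativeCurrencyAffixes {

    /** Returns the currency format of a locale as a DecimalFormat, or fails if it is another type. */
    static DecimalFormat currencyFormat(Locale locale) {
        NumberFormat nf = NumberFormat.getCurrencyInstance(locale);
        if (!(nf instanceof DecimalFormat)) {
            throw new IllegalStateException("Not a DecimalFormat for locale " + locale);
        }
        return (DecimalFormat) nf;
    }

    /** Describes the negative prefix and suffix of a locale, plus the formatted amount. */
    static String describe(Locale locale, BigDecimal amount) {
        DecimalFormat df = currencyFormat(locale);
        return String.format("%s: prefix='%s' suffix='%s' formatted='%s'",
                locale, df.getNegativePrefix(), df.getNegativeSuffix(), df.format(amount));
    }

    public static void main(String[] args) {
        BigDecimal negative = new BigDecimal("-1234.56");
        Locale[] locales = {Locale.UK, Locale.US, Locale.JAPAN};
        for (Locale locale : locales) {
            System.out.println(describe(locale, negative));
        }
    }
}
```

If your code parses or validates currency strings, checking these two values is safer than assuming that a negative amount always starts with a minus sign.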

I learned about this peculiarity of NumberFormat while working on VALIDATOR-433 for Apache Commons Validator.

Happy hacking!

Quickly Verifying jar Signatures For ASF Releases

kinow @ Oct 14, 2017 00:24:54

The release process within the Apache Software Foundation includes a series of steps. Amongst these steps is the voting process. In Apache Commons, the release instructions include a note on artefact signatures.

During the course of the VOTE, make sure that one or more of the reviewers have verified the signatures and hash files included with the release artifacts. If no one specifically mentions having done that during the VOTE, ask on the dev list and make sure someone does this before you proceed with the release.

Tired of always having to manually check several artefacts, or having to come up with the correct shell commands to iterate through a list of files, the other day I wrote a simple script to download the KEYS file, import it, download all the artefacts, then iterate through them and verify the signature.

Here’s the script. Licensed under the GPL licence.

#!/usr/bin/env bash

url=""

# From: https://blog.mafr.de/2007/08/05/cmdline-options-in-shell-scripts/
USAGE="Usage: `basename $0` [-hv] https://repository.apache.org/.../commons/commons-configuration/2.2/"

# Parse command line options.
while getopts hv OPT; do
    case "$OPT" in
        h)
            echo "$USAGE"
            exit 0
            ;;
        v)
            echo "`basename $0` version 0.0.1"
            exit 0
            ;;
        \?)
            # getopts issues an error message
            echo "$USAGE" >&2
            exit 1
            ;;
    esac
done
done

# Remove the switches we parsed above.
shift `expr $OPTIND - 1`

# We want at least one non-option argument. 
# Remove this block if you don't need it.
if [ $# -eq 0 ]; then
    echo $USAGE >&2
    exit 1
fi

# Access additional arguments as usual through 
# variables $@, $*, $1, $2, etc. or using this loop:
URL=$1

echo "url: ${URL}"

# Use a local temporary directory
BUILD_DIR=$(mktemp -d)
pushd "$BUILD_DIR"

echo "build dir: ${BUILD_DIR}"

# Download KEYS file
KEYS_URL=https://www.apache.org/dist/commons/KEYS

echo "importing KEYS from: ${KEYS_URL}"

wget "$KEYS_URL"
gpg --import KEYS

# Download JARs and signature files
echo "downloading all jars and signature files..."

wget -r -nd -np -e robots=off --wait 1 -R "index.html*" "${URL}"

# Check the files
for x in *.jar; do gpg --verify "${x}".asc; done

# EOF

The script can be found at GitHub too: https://github.com/kinow/dork-scripts/tree/3c519a74f28c08310ce2e65f8e860d61fd0c5c07/gpg/asf-sigs

Removing Javadoc SVN Id Tags with Shell Script

kinow @ Sep 13, 2017 16:49:26

Subversion supports Keyword Substitution, which substitutes keywords such as Author, Date, and Id. The Id keyword expands to the file name, revision, date, time, and user of the last change to the file.

It used to be common for Apache Commons components to have a line like the following in the header of each Java class.

/**
 * SomeClass class.
 *
 * @version $Id$
 */
public class SomeClass {

}

Then the generated Javadoc would contain the date when the class was last altered. Although useful, with proper versioning it becomes obsolete. It is much more important to know the version of the software than the last time a file was modified or by whom. In case you have a problem with a specific file, you can always check its history using git log, or git bisect, or …

Apache Commons components that are migrated to git need to have these lines removed. git does not support these Subversion keywords, so they are never properly rendered. And since I come up with some shell script snippet every time I have to remove these lines, I decided to document the last one I wrote, so that it can save me some time, and perhaps somebody else too.

find . -name "*.java" -exec sed -i '/^.*\*\s*@version\s*\$Id\$.*$/d' {} \;

And then push a commit with the change :-) In case you know some regex, you can change it and use the same command syntax to remove comments, specific configuration lines, etc.
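
Before running a pattern over a whole source tree, it can help to try it on a few sample lines first. The sketch below does that in Java with the same expression as the sed command; the class name `VersionIdLineFilter` and its method are hypothetical, and Java's regex dialect is assumed to behave like sed's for this particular pattern. Note that the expression only matches the unexpanded `$Id$` keyword, so a line where Subversion has already expanded it (`$Id: ... $`) is kept.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// A minimal sketch; VersionIdLineFilter is a hypothetical helper class.
final class VersionIdLineFilter {

    // Same expression as the sed command, escaped for a Java string literal.
    static final Pattern VERSION_ID_LINE =
            Pattern.compile("^.*\\*\\s*@version\\s*\\$Id\\$.*$");

    /** Returns the given lines without the ones that match the @version $Id$ pattern. */
    static List<String> removeVersionIdLines(List<String> lines) {
        return lines.stream()
                .filter(line -> !VERSION_ID_LINE.matcher(line).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
                "/**",
                " * SomeClass class.",
                " *",
                " * @version $Id$",
                " */",
                "public class SomeClass {",
                "}");
        // Prints the class header without the @version line.
        removeVersionIdLines(source).forEach(System.out::println);
    }
}
```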

That’s all. Happy scripting!

♥ Open Source