Tag

Posts tagged with ‘apache software foundation’

Quickly Verifying jar Signatures For ASF Releases

kinow @ Oct 14, 2017 00:24:54

The release process within the Apache Software Foundation includes a series of steps. Amongst these steps is the voting process. In Apache Commons, the release instructions includes a note on artefact signatures.

During the course of the VOTE, make sure that one or more of the reviewers have verified the signatures and hash files included with the release artifacts. If no one specifically mentions having done that during the VOTE, ask on the dev list and make sure someone does this before you proceed with the release.

Tired of always having to manually check several artefacts, or having to come up with the correct shell commands to iterate through a list of files, the other day I wrote a simple script to download the KEYS file, import it, download all the artefacts, then iterate through them and verify the signature.

Here’s the script. Licensed under the GPL licence.

#!/usr/bin/env bash

url=""

# From: https://blog.mafr.de/2007/08/05/cmdline-options-in-shell-scripts/
USAGE="Usage: `basename $0` [-hv] https://repository.apache.org/.../commons/commons-configuration/2.2/"

# Parse command line options.
while getopts hv: OPT; do
    case "$OPT" in
        h)
            echo $USAGE
            exit 0
            ;;
        v)
            echo "`basename $0` version 0.0.1"
            exit 0
            ;;
        \?)
            # getopts issues an error message
            echo $USAGE >;
    esac
done

# Remove the switches we parsed above.
shift `expr $OPTIND - 1`

# We want at least one non-option argument. 
# Remove this block if you don't need it.
if [ $# -eq 0 ]; then
    echo $USAGE >&2
    exit 1
fi

# Access additional arguments as usual through 
# variables $@, $*, $1, $2, etc. or using this loop:
URL=$1

echo "url: ${URL}"

# Use a local temporary directory
BUILD_DIR=$(mktemp -d)
pushd "$BUILD_DIR"

echo "build dir: ${BUILD_DIR}"

# Download KEYS file
KEYS_URL=https://www.apache.org/dist/commons/KEYS

echo "importing KEYS from: ${KEYS_URL}"

wget "$KEYS_URL"
gpg --import KEYS

# Download JARs and signature files
echo "downloading all jars and signature files..."

wget -r -nd -np -e robots=off --wait 1 -R "index.html*" "${URL}"

# Check the files
for x in *.jar; do gpg --verify "${x}".asc; done

# EOF

The script can be found at GitHub too: https://github.com/kinow/dork-scripts/tree/3c519a74f28c08310ce2e65f8e860d61fd0c5c07/gpg/asf-sigs

Removing Javadoc SVN Id Tags with Shell Script

kinow @ Sep 13, 2017 16:49:26

Subversion supports Keyword Substitution, which performs substitution of some keywords such as Author, Date, and Id. The Id is the date, time, and user that last modified the file.

It used to be common to all Apache Commons components to have a line as follows in the header of each Java class.

/**
 * SomeClass class.
 *
 * @version $Id$
 */
public class SomeClass {

}

Then the generated Javadoc would contain the date of when the class was altered. Although useful, with proper versioning, it becomes obsolete. It is much more important to know what is the version of the software, not the last time it was modified or by whom. In case you have a problem with that specific file, you can always check the history of the file using git log, or git bisect, or …

Apache Commons components that are migrated to git need to have these lines removed. git does not support these Subversion Keywords so it is never properly rendered. And as every time I have to remove these lines I come up with some shell script snippet, I decided to document the last one I wrote, so that it can save me some time ‐ and perhaps for somebody else too?

find . -name "*.java" -exec sed -i '/^.*\*\s*@version\s*\$Id\$.*$/d' {} \;

And then push a commit with the change :-) In case you know some regex, you can change it and use the same command syntax to remove comments, specific configuration lines, etc.

That’s all. Happy scripting!

♥ Open Source

Finding Base64 implementations in Apache Software Foundation projects

kinow @ Sep 01, 2017 20:23:03

NZ Grey Warbler (riroriro)
New Zealand Grey Warbler (riroriro)

Some time ago while working in one of the many projects in the Apache Software Foundation (Apache Commons FileUpload if I remember well), I noticed that it had a Base64 implementation. What called my attention was that the project not using the Apache Commons Codec Base64 implementation.

While Apache Commons’ mission is to create components that can be re-used across ASF projects, and also by other projects not necessarily under the ASF, it is understandable that some projects prefer to keep its dependencies to a minimum. It is normally a good software engineering practice to carefully manage your dependencies.

But would Apache Commons FileUpload be the only project in the ASF with its own Base64 implementation?

What is Base64?

Simply put, Base64 is a way to encode bytes to strings. It utilises a table, to convert parts of the binary input to certain numbers. These numbers match an entry in the table used by the Base64 implementation. There are several Base64 implementations, though some are obsolete now.

The input text “this is base64!” results in “dGhpcyBpcyBiYXNlNjQh”. It can be decoded and will result in the same input text. An image can also be encoded. Or a ZIP file. This is helpful for data transfer and storage.

Apache Commons Codec is well known to provide a Bse64 implementation, and used in several projects, both Open Source and in the industry. Its implementation is based on the RFC-2045.

Java 8 contains a Base64 implementation, so that may very well replace Apache Commons Coded use in some projects, though that may take some time. The Java 8 implementation supports the RFC-2045, RFC-4648, and has also support to the URL and MIME formats.

Searching for other Base64 implementations

Using GitHub search, I looked for other Base64 implementations in the ASF projects. Here’s the result table with only the custom implementations found after going through some 15 pages in more than 100 pages with hits for “base64”.

Project & link to implementation JVM Base64 implementation
Apache ActiveMQ Artemis 8 RFC-3548, based on http://iharder.net/base64
Apache AsterixDB Hyracks (Incubator) 8 ?
Apache Calcite Avatica 7 RFC-3548, based on http://iharder.net/base64
Apache Cayenne 8 RFC-2045 (based on codec)
Apache Chemistry 7 RFC-3548, based on http://iharder.net/base64
Apache Commons FileUpload 6 ?
Apache Commons Net 6 RFC-2045 (copy of codec?)
Apache Directory Kerby 7 RFC-2045 (copy of codec?)
Apache Felix 5 (?) RFC-2045 (copy of codec?)
Apache HBase 8 RFC-3548, based on http://iharder.net/base64
Apache Jackrabbit 8 (?) ?
Apache James 6 RFC-2045 via javax.mail.internet.MimeUtility
Apache James Mime4J 5 RFC-2045 (based on codec)
Apache OFBiz 8 RFC-2045
Apache Pivot 6 RFC-2045
Apache Qpid 8 ? uses javax.xml.bind.DatatypeConverter#parseBase64Binary()
Apache Shiro 6 RFC-2045 (based on commons)
Apache Tomcat 8 RFC-2045 (copy of codec?)
Apache TomEE 7 RFC-2045
Apache TomEE (Site-NG) 6 RFC-2045
Apache Trafodion (Incubator) 7 RFC-3548, based on http://iharder.net/base64
Apache Wave (Incubator) 7 RFC-3548 (?), based on http://iharder.net/base64

Notes and conclusions

  • Projects using Java 8 can likely remove its own implementation in favour of the new JVM 8 implementation.
  • Some projects were already using Java 8 Base64 implementation.
  • Some projects were using Apache Commons Codec.
  • Some projects were using the Base64 implementation from http://iharder.net/base64, which claims to be very fast. It could be interesting to further investigate it. Perhaps projects where Base64 is used a lot, there could be a significant performance increase by using this version.
  • Even though most of these projects are not using Apache Commons Codec, some have either copied or based their implementations on Apache Commons Codec. Perhaps shading would be more effective? Or maybe adding it as a dependency…
  • I guess the Base64 implementations could be hidden from external users with /*protected*/, private scope. As they are probably not part of Apache Commons Net, or Apache Cayenne public API. Which will be solved eventually after Java 9…
  • Some implementations do not make it clear which RFC or standard they are following. Some derived the reference work (e.g. Apache Wave (Incubator) modified the iharder removing features…).
  • It could be that some of these projects that contain many dependencies like Cayenne and Pivot may even have Commons Codec in the class path as a transitive dependency. If so, it could be interesting to add it as a dependency and remove its own implementation, or simply use Java 8’s.
  • Some implementations like HBase and Trafodion were not using the latest version from http://iharder.net/base64. In the case of HBase and Trafodion, several invalid inputs have been fixed from 2.2.1 to 2.3.7.

Future work

  • I will try to investigate which of these projects that have a custom Base64 implementation and are using Java 8 can be updated to throw away its own version (時間があるときでしょう!).
  • The implementation from http://iharder.net/base64 promises to be very fast, and Apache ActiveMQ Artemis adopted it. We could consider adding a similar fast version to Apache Commons Codec. This could be a reason for keeping its own Base64 implementation.
  • Java 8 Base64 provides Base64, MIME, and URL formats for encode and decoding. Perhaps we could add more formats to Apache Commons Codec too. Even custom formats. This could be a reason for keeping Apache Commons Codec’s implementation.

Happy encoding!

♥ Open Source

Backward compatibility and switch statement with constant expressions

kinow @ Jun 10, 2017 17:35:39

Maintaining Open Source software can be challenging. Making sure you keep backward compatibility (not only binary) can be even more challenging. Apache Commons Lang 3.6 release is happening right now thanks to Benedikt Ritter, and it is on its fourth Release Candidate (i.e. RC4).

A previous RC2 was cancelled due to IBM JDK 8 compatibility, more specifically the lazy initialization of ArrayList’s seems to be different in Oracle JDK and IBM JDK.

The RC3 was cancelled due to a change that could affect users using a switch statement.

The change was in the CharEncoding class. The issue was that this class has some constants, that stopped being constant expressions.

A constant expression is an expression denoting a value of primitive type or a String that does not complete abruptly and is composed using only the following (…)
public class CharEncoding {
    // ...
    public static final String ISO_8859_1 = "ISO-8859-1";
    // ...
}

The code above contains a constant (static final) variable, that is a constant expression. So users can safely use it in switch statements, and the Java compiler won’t complain about it.

public class CharEncoding {
    // ...
    public static final String ISO_8859_1 = StandardCharsets.ISO_8859_1.name();
    // ...
}

The code above is from the change that caused the regression. Any user that was using the ISO_8859_1 constant variable in a switch statement would get a compilation error (e.g. case expressions must be constant expressions) when updating to Apache Commons Lang 3.6. That is because the constant variable is not a constant expression.

I think I learned that some time ago, but if you asked me what was wrong with the change, and if it would break backward compatibility, I would probably fail to spot the issue. There are tools for that now (e.g. Clirr, japicmp) though they may miss some cases too.

Luckily in this case a user subscribed to the Apache Commons development mailing list spotted the issue and quickly reported it. It gets easier maintaining an Open Source project with the constructive feedback of users, like this one.

♥ Open Source

Apache Commons Text LookupTranslator

kinow @ Jun 02, 2017 22:50:39

Apache Commons Text includes several algorithms for text processing. Today’s post is about one of the classes available since the 1.0 release, the LookupTranslator.

It is used to translate text using a lookup table. Most users won’t necessarily be - knowingly - using this class. Most likely, they will use the StringEscapeUtils, which contains methods to escape and unescape CSV, JSON, XML, Java, and EcmaScript.

String original = "He didn't say, \"stop!\"";
String expected = "He didn't say, \\\"stop!\\\"";
String result   = StringEscapeUtils.escapeJava(original);

StringEscapeUtils uses CharSequenceTranslator’s, including LookupTranslator. You can use it directly too, to escape other data your text may contain, special characters not supported by some third party library or system, or even a simpler case.

In other words, you would be creating your own StringEscapeUtils. Let’s say you have some text where numbers must never start with the zero digital, due to some restriction in the way you use that data later.

Map<String, String> lookupTable = new HashMap<>();
lookupTable.put("a", "");
final LookupTranslator escapeNumber0 = new LookupTranslator(new String[][] { {"0", ""} });
String escaped = escapeNumber0.translate("There are 02 texts waiting for analysis today...");

That way the resulting text would be “There are 2 texts waiting for analysis today”, allowing you to proceed with the rest of your analysis. This is a very simple example, but hopefully you grokked how LookupTranslator works.

♥ Open Source