Tag

Posts tagged with ‘java’

Exif Odd Offsets

kinow @ Dec 25, 2017 21:43:33

A file format like JPEG may contain metadata in JFIF, Exif, or a vendor proprietary format. The Exif format is based on, or uses parts of, the TIFF format.

Within an Exif metadata block, you should see directories, with several entries. The entries have fields like description, value, and also an offset. The offset indicates the offset to the next entry.

The Exif specification defines that implementers must make sure to keep the offset an even number, within 4 bytes.

I recently worked on IMAGING-205, a ticket about odd offsets in files with Exif metadata. The issue was that when files were rewritten with Apache Commons Imaging, the entries were rearranged, so we could end up with odd offsets even though the image initially had none.

The fix was simply to check for odd offsets and add 1 to them; later the offset would be put within the 4-byte limit.
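The idea behind the fix can be sketched in a few lines. This is only an illustration of "if the offset is odd, add 1", not the actual Apache Commons Imaging code; the class and method names (EvenOffsets, alignToEven) are made up for this example.

```java
final class EvenOffsets {

    // Hypothetical helper: returns the offset unchanged when it is even,
    // or the next even number when it is odd.
    static long alignToEven(long offset) {
        return (offset % 2 != 0) ? offset + 1 : offset;
    }

    public static void main(String[] args) {
        // Odd offsets are bumped by one, even offsets are left alone.
        System.out.println(alignToEven(7)); // 8
        System.out.println(alignToEven(8)); // 8
    }
}
```

A writer that uses a helper like this would also have to make sure whatever is stored at the old position is moved (or padded) accordingly, which is out of scope for this sketch.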

A screen shot of Eclipse with source code
Locating the bug

One interesting point, however, is that although this is in the standard, not all software that reads and writes Exif follows the specification. So it is quite common to find images with odd offsets.

This means you could take a picture with your phone that contains some Exif metadata, and be surprised to get warnings about odd offsets when you analyze it with exiftool. Most viewers handle both odd and even offsets, so it should work in most cases, unless you have a strict reader/viewer.

Happy hacking!

♥ Open Source

Remember to synchronize when iterating streams from a synchronized Collection

kinow @ Dec 03, 2017 23:56:13

When iterating collections created via Collections.synchronizedList for instance, you are required to obtain a lock on the actual list before doing so. So you normally end up with code similar to:

List list = Collections.synchronizedList(new ArrayList());
synchronized (list) {
  Iterator i = list.iterator(); // Must be in synchronized block
  while (i.hasNext())
      foo(i.next());
}

This requirement is documented in the javadocs.

Since lambdas and streams are becoming more widely used, it is worth remembering that when iterating via a stream we also need to obtain a lock on the synchronized collection.

List list = Collections.synchronizedList(new ArrayList());
synchronized (list) {
  // The terminal operation must also run inside the synchronized block
  list.stream()
    .anyMatch(...);
}
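Here is a small self-contained version of the same idea. It is a sketch, and the class and method names (SynchronizedStreamExample, containsNegative) are invented for illustration. The point to notice is that streams are lazy, so the terminal operation (anyMatch here) has to run while the lock is held, not just the call to stream().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SynchronizedStreamExample {

    // Holds the list's own lock for the whole traversal, including the
    // terminal operation, which is where the elements are actually visited.
    static boolean containsNegative(List<Integer> syncList) {
        synchronized (syncList) {
            return syncList.stream().anyMatch(n -> n < 0);
        }
    }

    public static void main(String[] args) {
        List<Integer> numbers = Collections.synchronizedList(new ArrayList<>());
        numbers.add(1);
        numbers.add(-2);
        numbers.add(3);
        System.out.println(containsNegative(numbers)); // true
    }
}
```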

Here’s an example from Zalando Nakadi Event Broker.

Happy hacking!

Watch out for Locales when using NumberFormat with currencies

kinow @ Dec 02, 2017 22:51:00

In Java you have NumberFormat to help you format and parse numbers for any locale. That said, here’s some code.

BigDecimal negative = new BigDecimal("-1234.56");

DecimalFormat nf = (DecimalFormat) NumberFormat.getCurrencyInstance(Locale.UK);
String formattedNegative = nf.format(negative);

System.out.println(formattedNegative);

The output for this code is -£1,234.56. That’s expected, as the locale is set to UK, so the currency symbol used is for British Pounds. And as the number is negative, you get that minus sign as a prefix. For Japanese locale you’d get -¥1,235, and for Brazilian locale you’d get -R$ 1.234,56.

So far so good.

What about the following code, with nothing different except for the locale set to US.

BigDecimal negative = new BigDecimal("-1234.56");

DecimalFormat nf = (DecimalFormat) NumberFormat.getCurrencyInstance(Locale.US); // <--- US now
String formattedNegative = nf.format(negative);

System.out.println(formattedNegative);

Some could intuitively expect -$1,234.56. However, the output is actually ($1,234.56).

NumberFormat uses different prefixes and suffixes for positive and negative numbers. In some locales the prefix can be empty, or, as in the case of the US locale, it can be quite different from what you might expect.
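One way to see what a given locale will do is to ask the DecimalFormat for its negative prefix and suffix. The sketch below uses the standard getNegativePrefix() and getNegativeSuffix() methods; the class name CurrencyAffixes and the helper currencyFormat are made up for this example. The exact strings depend on the locale data shipped with your JDK, and newer JDKs that default to CLDR locale data may print something different from the output quoted above, so no expected output is listed here.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Locale;

final class CurrencyAffixes {

    // getCurrencyInstance usually returns a DecimalFormat, but the API
    // does not guarantee it, so check before casting.
    static DecimalFormat currencyFormat(Locale locale) {
        NumberFormat nf = NumberFormat.getCurrencyInstance(locale);
        if (!(nf instanceof DecimalFormat)) {
            throw new IllegalStateException("Not a DecimalFormat for " + locale);
        }
        return (DecimalFormat) nf;
    }

    public static void main(String[] args) {
        BigDecimal negative = new BigDecimal("-1234.56");
        for (Locale locale : new Locale[] { Locale.UK, Locale.US }) {
            DecimalFormat df = currencyFormat(locale);
            // Print the affixes used for negative numbers and the formatted value.
            System.out.printf("%s: prefix='%s' suffix='%s' formatted=%s%n",
                    locale, df.getNegativePrefix(), df.getNegativeSuffix(),
                    df.format(negative));
        }
    }
}
```

Code that parses or validates currency strings should rely on these affixes rather than assume a leading minus sign.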

I learned about this peculiarity of NumberFormat while working on VALIDATOR-433 for Apache Commons Validator.

Happy hacking!

Using formatter exclusions with Eclipse

kinow @ Nov 06, 2017 21:56:56

Sometimes when you are formatting your code in Eclipse, you may want to prevent some parts of the code from being formatted. Especially when using Java 8 lambdas and optionals.

Here’s some code before being formatted by Eclipse’s default formatter rules.

Code adapted from: blog post Java d’eau ‐ Java 8: Streams in Hibernate and Beyond

session.createQuery("SELECT h FROM Hare h", Hare.class)
    .stream()
    .filter(h -> h.getId() == 1)
    .map(Hare::getName)
    .forEach(System.out::println);

Then after formatting.

session.createQuery("SELECT h FROM Hare h", Hare.class).stream().filter(h -> h.getId() == 1).map(Hare::getName)
                .forEach(System.out::println);

Which doesn’t look very appealing, ay? You can change this behaviour in at least two ways. The first is by telling the formatter to ignore this block, through a special formatter tag in your code.

First you need to enable this feature in Eclipse, as it is disabled by default. This setting is found in the preferences under Java > Code Style > Formatter > Edit > Off/On Tags.

A screen shot of Eclipse formatter settings
Enabling formatter tags in Eclipse

Then formatting the following code won’t change a thing in the block surrounded by the formatter tags.

/* @formatter:off */
session.createQuery("SELECT h FROM Hare h", Hare.class)
    .stream()
    .filter(h -> h.getId() == 1)
    .map(Hare::getName)
    .forEach(System.out::println);
/* @formatter:on */

But having to type these tags can become annoying, and cause more commits and pull requests to be unnecessarily created. So an alternative approach can be to change the formatter behaviour globally.

This can be done in Eclipse in another option under the formatter options, Java > Code Style > Formatter > Edit > Line Wrapping > Function Calls > Qualified invocations.

You will have to choose “Wrap all elements, except first element if not necessary” under Line wrapping policy. And also check “Force split, even if line shorter than maximum line width”.

A screen shot of Eclipse formatter settings
Enabling custom formatter behaviour globally

Once it is done, your code will look like the following no matter what.

session.createQuery("SELECT h FROM Hare h", Hare.class)
    .stream()
    .filter(h -> h.getId() == 1)
    .map(Hare::getName)
    .forEach(System.out::println);

Happy coding!

♥ Open Source

Finding Base64 implementations in Apache Software Foundation projects

kinow @ Sep 01, 2017 20:23:03

New Zealand Grey Warbler (riroriro)

Some time ago, while working on one of the many projects in the Apache Software Foundation (Apache Commons FileUpload, if I remember well), I noticed that it had a Base64 implementation. What caught my attention was that the project was not using the Apache Commons Codec Base64 implementation.

While Apache Commons’ mission is to create components that can be re-used across ASF projects, and also by other projects not necessarily under the ASF, it is understandable that some projects prefer to keep their dependencies to a minimum. It is normally a good software engineering practice to carefully manage your dependencies.

But would Apache Commons FileUpload be the only project in the ASF with its own Base64 implementation?

What is Base64?

Simply put, Base64 is a way to encode bytes to strings. It splits the binary input into groups of 6 bits and converts each group to a number. Each number matches an entry in the table (the alphabet) used by the Base64 implementation. There are several Base64 implementations, though some are obsolete now.

The input text “this is base64!” results in “dGhpcyBpcyBiYXNlNjQh”. It can be decoded and will result in the same input text. An image can also be encoded. Or a ZIP file. This is helpful for data transfer and storage.
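The round trip above can be reproduced with the Base64 class that ships with Java 8 (java.util.Base64). The class name Base64RoundTrip and its two helper methods are just for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64RoundTrip {

    // Encodes the UTF-8 bytes of the text with the basic Base64 alphabet.
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Decodes a basic Base64 string back to text, assuming UTF-8 content.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("this is base64!");
        System.out.println(encoded);         // dGhpcyBpcyBiYXNlNjQh
        System.out.println(decode(encoded)); // this is base64!
    }
}
```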

Apache Commons Codec is well known for providing a Base64 implementation, and it is used in several projects, both Open Source and in the industry. Its implementation is based on RFC-2045.

Java 8 contains a Base64 implementation, so that may very well replace Apache Commons Codec use in some projects, though that may take some time. The Java 8 implementation supports RFC-2045 and RFC-4648, and also supports the URL and MIME formats.
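To get a feel for the difference between the formats, here is a short sketch using the three encoder factories in java.util.Base64: getEncoder(), getUrlEncoder(), and getMimeEncoder(). The class name Base64Variants and the sample bytes are chosen only for illustration; the two bytes 0xFB 0xFF happen to map to the characters that differ between the basic and URL-safe alphabets.

```java
import java.util.Base64;

final class Base64Variants {

    // Basic alphabet: uses '+' and '/', no line breaks.
    static String basic(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // URL and filename safe alphabet: uses '-' and '_' instead of '+' and '/'.
    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    // MIME: basic alphabet, output split into lines of at most 76 characters.
    static String mime(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] sample = { (byte) 0xFB, (byte) 0xFF };
        System.out.println(basic(sample));   // +/8=
        System.out.println(urlSafe(sample)); // -_8=
        // 100 bytes encode to more than 76 characters, so MIME adds a line break.
        System.out.println(mime(new byte[100]).contains("\r\n")); // true
    }
}
```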

Searching for other Base64 implementations

Using GitHub search, I looked for other Base64 implementations in the ASF projects. Here’s the result table with only the custom implementations found after going through some 15 pages out of more than 100 pages of hits for “base64”.

| Project & link to implementation | JVM | Base64 implementation |
|---|---|---|
| Apache ActiveMQ Artemis | 8 | RFC-3548, based on http://iharder.net/base64 |
| Apache AsterixDB Hyracks (Incubator) | 8 | ? |
| Apache Calcite Avatica | 7 | RFC-3548, based on http://iharder.net/base64 |
| Apache Cayenne | 8 | RFC-2045 (based on codec) |
| Apache Chemistry | 7 | RFC-3548, based on http://iharder.net/base64 |
| Apache Commons FileUpload | 6 | ? |
| Apache Commons Net | 6 | RFC-2045 (copy of codec?) |
| Apache Directory Kerby | 7 | RFC-2045 (copy of codec?) |
| Apache Felix | 5 (?) | RFC-2045 (copy of codec?) |
| Apache HBase | 8 | RFC-3548, based on http://iharder.net/base64 |
| Apache Jackrabbit | 8 (?) | ? |
| Apache James | 6 | RFC-2045 via javax.mail.internet.MimeUtility |
| Apache James Mime4J | 5 | RFC-2045 (based on codec) |
| Apache OFBiz | 8 | RFC-2045 |
| Apache Pivot | 6 | RFC-2045 |
| Apache Qpid | 8 | ? uses javax.xml.bind.DatatypeConverter#parseBase64Binary() |
| Apache Shiro | 6 | RFC-2045 (based on commons) |
| Apache Tomcat | 8 | RFC-2045 (copy of codec?) |
| Apache TomEE | 7 | RFC-2045 |
| Apache TomEE (Site-NG) | 6 | RFC-2045 |
| Apache Trafodion (Incubator) | 7 | RFC-3548, based on http://iharder.net/base64 |
| Apache Wave (Incubator) | 7 | RFC-3548 (?), based on http://iharder.net/base64 |

Notes and conclusions

  • Projects using Java 8 can likely remove their own implementation in favour of the new Java 8 implementation.
  • Some projects were already using Java 8 Base64 implementation.
  • Some projects were using Apache Commons Codec.
  • Some projects were using the Base64 implementation from http://iharder.net/base64, which claims to be very fast. It could be interesting to further investigate it. Perhaps in projects where Base64 is used a lot, there could be a significant performance increase by using this version.
  • Even though most of these projects are not using Apache Commons Codec, some have either copied or based their implementations on Apache Commons Codec. Perhaps shading would be more effective? Or maybe adding it as a dependency…
  • I guess the Base64 implementations could be hidden from external users with /*protected*/ or private scope, as they are probably not part of the public API of Apache Commons Net or Apache Cayenne. This will eventually be solved after Java 9…
  • Some implementations do not make it clear which RFC or standard they are following. Some are modified versions of the reference work (e.g. Apache Wave (Incubator) modified the iharder implementation, removing features…).
  • It could be that some of these projects that contain many dependencies, like Cayenne and Pivot, already have Commons Codec in the class path as a transitive dependency. If so, it could be interesting to add it as a dependency and remove their own implementation, or simply use Java 8’s.
  • Some implementations, like those in HBase and Trafodion, were not using the latest version from http://iharder.net/base64. In the case of HBase and Trafodion, several issues with invalid inputs have been fixed from 2.2.1 to 2.3.7.

Future work

  • I will try to investigate which of these projects that have a custom Base64 implementation and are using Java 8 can be updated to throw away their own version (when I have time, probably!).
  • The implementation from http://iharder.net/base64 promises to be very fast, and Apache ActiveMQ Artemis adopted it. We could consider adding a similar fast version to Apache Commons Codec. Speed could be a reason for a project to keep its own Base64 implementation.
  • Java 8 Base64 provides Base64, MIME, and URL formats for encoding and decoding. Perhaps we could add more formats to Apache Commons Codec too. Even custom formats. This could be a reason for keeping Apache Commons Codec’s implementation.

Happy encoding!

♥ Open Source