Strings transliteration in Java with Apache Commons Lang

Rosalind is a website with a curated set of exercices for bioinformatics, organized hierarchily. In some of these examples you are required to replace characters (nucleotides) by other characters. It is a rather common task for developers, like when you need to replace special characters in user’s names.

There are different ways of describing it, such as translate, replace, or transliterate. The latter being my favorite definition.

In Python I know that there are several different ways of transliterating strings [1][2]. But in Java I always ended up using a Map or a Enum and writing my own method in some Util class for that.

Turns out that Apache Commons Lang, which I use in most of my projects, provided this feature. What means that I will be able to reduce the length of my code, what also means less code to be tested (and one less place to look for bugs).

String s = StringUtils.replaceChars("ATGCATGC", "GTCA", "CAGT"); // "TACGTACG"
System.out.println(s);

What the code above does, is replace G by C, T by A, C by G and A by T. This process is part of finding the DNA reverse complement. But you can also use this for replacing special characters, spaces by _, and so it goes.

Happy hacking!

Categories: Blog

Tags: Java, Apache Software Foundation, Bioinformatics