Menu

Tag

Posts tagged with ‘java’

Some links related to Apache Commons Text

kinow @ May 28, 2017 19:50:39

Apache Commons Text is one of the most recent new components in Apache Commons. It “is a library focused on algorithms working on strings”. I recently collected some links under a bookmark folder that are in some way related to the project. In case you are interested, check some of the links below.

  • Morgan Wahl Text is More Complicated Than You Think Comparing and Sorting Unicode PyCon 2017
    • Q: test [text] to check if our methods are OK with some examples in this talk)
    • Q: Canonical Decomposition, and code points comparisons; are we doing it? Are we doing it right?
    • Q: Do we have casefolding?
    • Q: Do we have multi-level sort?
    • Q: CLDR
  • Ɓukasz Langa Unicode what is the big deal PyCon 2017
    • Q: Quite sure we have an issue to guess the encoding for a text…. there is a GPL library for that? Under Mozilla perhaps?
  • Jiaqi Liu Fuzzy Search Algorithms How and When to Use Them PyCon 2017
    • Q: Does OpenNLP have N-GRAM’s? Would it make sense to have that in [text]?
    • Q: Where can we find some tokenizers? OpenNLP?
  • Lothaire’s Books like “Combinatorics on Words” and “Algebraic Combinatorics”.
  • Java tutorial lesson “Working with Text”
  • Mitzi Morris’ Text Processing in Java book
  • StringSearch java library.
    • “The Java language lacks fast string searching algorithms. StringSearch provides implementations of the Boyer-Moore and the Shift-Or (bit-parallel) algorithms. These algorithms are easily five to ten times faster than the naïve implementation found in java.lang.String”.
  • Jakarta Oro (attic)
    • The Jakarta-ORO Java classes are a set of text-processing Java classes that provide Perl5 compatible regular expressions, AWK-like regular expressions, glob expressions, and utility classes for performing substitutions, splits, filtering filenames, etc. This library is the successor to the OROMatcher, AwkTools, PerlTools, and TextTools libraries originally from ORO, Inc. Despite little activity in the form of new development initiatives, issue reports, questions, and suggestions are responded to quickly.
    • Discontinued, but is there anything useful in there? The attic has always interesting things after all…
  • TextProcessing blog - A Text Processing Portal for Humans
  • twitter-text, the Twitter Java (multi-language actually…) text processing library.
  • Python’s text modules

♥ Open Source

When you don’t realize you need a Comparable

kinow @ May 15, 2017 23:07:39

In 2012, I wrote about how you always learn something new by following the Apache dev mailing lists.

After about five years, I am still learning, and still getting impressed by the knowledge of other developers. Days ago I was massaging some code in a pull request and a developer suggested me to simplify my code.

The suggestion was to make a class a Comparable type to both simplify the code, and also have a better design. I immediately agreed, and looking back in hindsight, it was the most logical choice. Yet, I simply did not think about that.

// What the code was
case VSPACE_SORTKEY :
    int cmp = 0;
    String c1 = nv1.getCollation();
    String c2 = nv2.getCollation();
    if (c1 != null 
        Collator collator = Collator.getInstance(desiredLocale);
        cmp = collator.compare(nv1.getString(), nv2.getString());
    } else {
        cmp = XSDFuncOp.compareString(nv1, nv2) ;
    }
    return cmp;
}
// What the code is now
case VSPACE_SORTKEY :
    return ((NodeValueSortKey) nv1).compareTo((NodeValueSortKey) nv2);
}

This moved the logic to a method in the NodeValueSortKey class. This reduced the complexity of the class with the switch statement. And it also made it easier to write unit tests.

If you are not involved in Open Source projects yet, I keep my suggestion from five years ago. Find a project related to something you like, and start reading the code, lurk in the mailing list or watch GitHub repositories.

You can always learn more!

♥ Open Source

Troubleshooting a Jenkins Plug-in compatibility issue

kinow @ Apr 17, 2017 21:01:03

This post is probably different from others. I will give a TL;DR, but will then give you a copy of a comment of a Jenkins JIRA issue. Hope you have fun reading it, specially if you maintain Jenkins servers or plug-ins.

TL;DR: there was an issue in Jenkins Job DSL Plug-in, that caused jobs created to have an invalid script. The fix had not been released, but was already in the master branch in GitHub.

Well, that was fun bug reproduced, I believe I know why that’s happening. Not so complicated to fix… but no fast way to fix it. Here’s the issue analysis (grab a coffee to read it).

  • Downloaded jenkins.war (2.32.3.war)
  • mkdir /tmp/123
  • JENKINS_HOME=/tmp/123 java -jar jenkins.war
  • Entered secret into form and submitted
  • Installed suggested plugins (boy that takes a while)
  • Created temp user
  • Installed (without restart) active-choices-plugin 1.5.3
  • Installed (without restart) job-dsl-plugin
  • Manually stopped Jenkins, and started it again with same command #3
  • Log in with user, all looking good
  • Created Freestyle job JENKINS-42655 (see attached config.xml)
  • Executed job, and found new job JENKINS-42655-1
  • Never opened the job configuration, clicked on the “Generated Items link to JENKINS-42655-1” to open in a new tab
  • Clicked on Build with Parameters
  • Looked at logs, and noticed the security-script-plugin exceptions
  • Went to “Manage Jenkins” / “In-process Script Approval” and approved scripts
  • Went back to the JENKINS-42655-1 build with parameters screen, and everything worked as expected

*Hummm. Issue more or less reproduced. Let’s investigate more.

  • Restarted Jenkins again
  • Changed the JENKINS-42655 seed job configuration to use a different script
  • Copied the config.xml file to another location
  • Went to build with parameters, and now it was broken again
  • Saved the job manually
  • Copied the config.xml file to yet another location
  • Went to build with parameters, and now it worked as reported in this issue

Now comes the interesting part. Looking at the diff. Attaching a screen shot so that others can have fun looking at it too. I installed Kompare as it has some cool features such as disabling diff for white spaces, blank lines, etc. The whole file changes as you save it. But if you ignore the number of white spaces… Then you can see that the Job DSL Plug-in is creating a <script> tag, as we used to do before 1.5 I think.

Screen shot
Screen shot

Now we use the script-security-plugin. So we need to wrap that around the script-security-plugin’s tags. Will report an issue for job-dsl-plugin, and will probably submit a pull request in the next days too. There’s not much left we can change in the plug-in code for that Piotr Tempes, so I’m afraid you will have to:

  • keep saving the job
  • perhaps work on the fix for Job DSL if you feel like doing it (as you could probably be faster than me in submitting the PR)
  • use an older version of the active-choices-plugin that doesn’t use security-script-plugin, though you could be bitten by other old bugs
  • write some script to replace the <script> tag and wrap it by the secureScript (doing what the pull request will do automatically later)

Sorry for not being able to quickly provide any workaround, nor to cut a quick bugfix release.

Cheers Bruno

ps: When I started getting involved in Open Source, I always felt extremely happy when people would spend their time troubleshooting my issues, or just educating me on how to behave in an Open Source community, or even how I should have troubleshooted initially the issue myself. So whenever I have time, I try to write detailed and polite replies. The person posting the comment is my past. The person posting the comment is my future :-) Kindness begets kindness.

♥ Open Source

Spring Cloud encrypted values and Spring PropertySources

kinow @ Apr 14, 2017 11:21:03

As I could not find any documentation for that, I decided to write it as a note to myself in case I use the encryption and decryption with Spring Cloud again.

In Spring and Spring Boot, you normally have multiple sources of properties, like multiple properties files, environment properties and variables, and so it goes. In the Spring API, these are represented as PropertySource‘s.

In a Spring Boot application, you would be used to overriding certain properties by defining environments and using an application-production.properties file, or overriding values with environment properties.

This is common in Spring Boot applications deployed to Amazon Elastic Beanstalk.

Some time ago another team at work found that overriding did not always work when you have encrypted values in your properties files. Even if you specified new values in the Amazon Elastic Beanstalk application configuration.

Yesterday, while debugging the issue and reading Spring Cloud source code, I found its EnvironmentDecryptApplicationInitializer.

It basically iterates through all loaded property sources, looking for values that start with {cipher}. Then it calls the Spring Security TextEncryptor defined in the application.

Finally, it creates a new property source, called decrypted, with the decrypted values. So when your application looks for a property called XPTO, and if it has been encrypted, it will find the value in the decrypted propery source, regardless of whether you tried to override it or not.

# Property sources listed in Eclipse IDE

[
  servletConfigInitParams,
  servletContextInitParams,
  systemProperties,
  systemEnvironment,
  random,
  applicationConfigurationProperties,
  springCloudClientHostInfo,
  defaultProperties
]

# When using encrypted values

[
  decrypted, <-------- created by Spring Cloud, with decrypted values. Prepended to the list of property sources
  servletConfigInitParams,
  servletContextInitParams,
  systemProperties,
  systemEnvironment,
  random,
  applicationConfigurationProperties,
  springCloudClientHostInfo,
  defaultProperties
]

So in case you have encrypted values in your Spring application (and you are using Spring Cloud, of course) remember that these values will have higher priority, and can only be overriden by other encrypted values.

♥ Open Source

Apache Commons Lang: Memoizer

kinow @ Jan 08, 2017 18:34:03

The current release of Apache Commons Lang is 3.5. The upcoming release, probably 3.6, will include a new feature, added in a pull request: a Memoizer implementation. Check out the ticket LANG-740 for more about the implementation being added to [lang].

The book Java Concurrency in Practice introduces readers to the Memoizer, and has also a public domain implementation available for download (besides that, the book has also lots of other interesting topics!).

In summary, Memoizer is a simple cache, that will store the result of a computation. It receives a Computable object, responsible for doing something that will be stored by the Memoizer. Here’s a simple code to illustrate how that will work in your Java code.

// Computation to be stored in the cache
Computable<String, String> getFormattedCurrentDate = new Computable<String, String>() {
    @Override
    public String compute(String fmt) throws InterruptedException {
        return new SimpleDateFormat(fmt).format(new Date());
    }
};

// Our memoizer
Memoizer<String, String> dateCache = new Memoizer<>(getFormattedCurrentDate);

// To illustrate its use
for (int i = 0; i < 10; i++) {
    try {
        // S -> Millisecond
        System.out.println(dateCache.compute("HH:mm:ss:S Z dd/MM/YYYY"));
        // Regardless of this sleep call, we get the same result every iteration
        Thread.sleep(1500);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}

The computable created (getFormattedCurrentDate) will be called only once, and stored in a map. The parameter passed in the #compute() method will be used as key in the map. So choose your parameter wisely :-) The output of the example will be similar to the following one.

19:15:57:854 +1300 08/01/2017
19:15:57:854 +1300 08/01/2017
19:15:57:854 +1300 08/01/2017
19:15:57:854 +1300 08/01/2017
19:15:57:854 +1300 08/01/2017
19:15:57:854 +1300 08/01/2017
19:15:57:854 +1300 08/01/2017
19:15:57:854 +1300 08/01/2017
19:15:57:854 +1300 08/01/2017
19:15:57:854 +1300 08/01/2017

In the example above I used a for-loop to illustrate what will happen. Even though we call the memoizer #compute() method several times, followed by Thread#sleep(); only one result, the first to be computed, will be returned.

So that’s all for today. Hope you learned something about this new class, that must be available in the next release of Apache Commons Lang.

Happy hacking!

ps: [lang] uses Java 7, so that is why we do not have a functional instead of the Comparable