Menu

Tag

Posts tagged with ‘programming’

Two other Maven Plug-ins: impsort and deptools

kinow @ Aug 12, 2017 22:16:03

Last week I wrote about the ImmobilienScout24/illegal-transitive-dependency-check rule for Maven Enforcer Plug-in. There are two other Maven Plug-ins that can be useful.

mbknor/deptools

The mbknor/deptools is another rule for the Maven Enforcer Plug-in. It will scan your project dependency tree, looking for transitive dependencies. Whenever it finds a transitive dependency, it will keep track of the versions. And if, because of the way your dependencies and transitive dependencies are organised, you end up with a version that is not the newest, the build will fail.

So, for example, if you have commons-lang3 as transitive dependency of two other dependencies, but one is using 3.4 and the other 3.5. If for any reason you are using 3.4 instead of 3.5, you will have a build error.

Here’s an example of the plug-in configuration.

<project>
  ...
  <build>
    <plugins>
      <plugin>
        <groupId>deptools.plugin</groupId>
        <artifactId>maven-deptools-plugin</artifactId>
        <version>1.3</version>
          <executions>
              <execution>
                  <phase>compile</phase>
                  <goals>
                      <goal>version-checker</goal>
                  </goals>
              </execution>
          </executions>
        </plugin>
    </plugins>
  </build>

  <pluginRepositories>
    <pluginRepository>
      <id>mbk_mvn_repo</id>
      <name>mbk_mvn_repo</name>
      <url>https://raw.githubusercontent.com/mbknor/mbknor.github.com/master/m2repo/releases</url>
    </pluginRepository>
  </pluginRepositories>
  ...
</project>

Running mvn clean verify will execute the Maven Enforcer Plug-in enforce goal, which will call the deptools check. As you may have noticed, you also need to download the plug-in from GitHub, as it is not released to Maven Central.

I do not use it for this reason, and also because I normally spend some time looking at the dependency tree anyway, but every now and then when I work on a new project I like quickly running it just to see what are the dependencies that are being shadowed by older versions.

revelc/impsort-maven-plugin

I only found about this plug-in in the last Apache News Round-up. Where it was mentioned that Apache Accumulo uses this plug-in to standarize the order of imports in code.

<project>
  ...
  <build>
    <plugins>
      <plugin>
        <groupId>net.revelc.code</groupId>
        <artifactId>impsort-maven-plugin</artifactId>
        <version>1.0.0</version>
        <configuration>
          <groups>java.,javax.,org.,com.</groups>
          <staticGroups>java,*</staticGroups>
          <excludes>
            <exclude>**/thrift/*.java</exclude>
          </excludes>
        </configuration>
        <executions>
          <execution>
            <id>sort-imports</id>
            <goals>
              <goal>sort</goal><!-- runs at process-sources phase by default -->
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  ...
</project>

I am neutral on imports order, though in cases where you have several contributors submitting pull requests, it can probably be useful to reduce the number of interactions. In other words, if a user submits a pull request and you have an automated check, then the user would be automatically notified about changes that s/he needs to do in order for his pull request to be accepted.

Conclusion

I am not using any of these two plug-ins, but wanted to save it somewhere in case I needed to use them in the future, and also to share with others. Besides most common plug-ins (PMD, CheckStyle, FindBugs), I normally use at least some Maven Enforcer Plug-in rules, and the OWASP plug-in.

While most of the time we spend writing code, preparing the infrastructure, and deploying and testing, I got bitten by some maven build bugs a few times, and had to spend days/weeks debugging some of these. So hope some of these posts save some hours of someone out there in a similar situation.

♥ Open Source

Checking for transitive dependencies use with Maven Enforcer Plug-in

kinow @ Aug 06, 2017 17:35:39

Maven Enforcer Plug-in “provides goals to control certain environmental constraints such as Maven version, JDK version and OS family along with many more built-in rules and user created rules”. There are several libraries that provide custom rules, or you can write your own.

One of these libraries is ImmobilienScout24/illegal-transitive-dependency-check, “an additional rule for the maven-enforcer-plugin that checks for classes referenced via transitive Maven dependencies”.

With the following example:

<project>
  ...
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-enforcer-plugin</artifactId>
        <version>1.3.1</version>
        <dependencies>
          <dependency>
            <groupId>de.is24.maven.enforcer.rules</groupId>
            <artifactId>illegal-transitive-dependency-check</artifactId>
            <version>1.7.4</version>
          </dependency>
        </dependencies>
        <executions>
          <execution>
            <id>enforce</id>
            <phase>verify</phase>
            <goals>
              <goal>enforce</goal>
            </goals>
            <configuration>
              <rules>
                <illegalTransitiveDependencyCheck implementation="de.is24.maven.enforcer.rules.IllegalTransitiveDependencyCheck">
                  <reportOnly>false</reportOnly>
                  <useClassesFromLastBuild>true</useClassesFromLastBuild>
                  <suppressTypesFromJavaRuntime>true</suppressTypesFromJavaRuntime>
                  <listMissingArtifacts>false</listMissingArtifacts>
                </illegalTransitiveDependencyCheck>
              </rules>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  ...
</project>

Running mvn clean verify will execute the Maven Enforcer Plug-in enforce goal, which will call the illegal transitive dependency check.

And the build will fail if your code is using (i.e. importing) any class that is not available in your first-level dependencies. For example, if in your pom.xml you added commons-lang3 and commons-configuration, the latter which includes commons-lang 2.x, and you used org.apache.commons.lang.StringUtils instead of org.apache.commons.lang3.StringUtils, the build would fail.

In order to fix the build, you have to either add the transitive dependency to your pom.xml file, or correct your import statements. This is specially useful to prevent future issues due to other dependencies being added or updated, and changing the version of the transitive dependency.

Bonus points if you combine that with continuous integration and some service like Travis-CI.

♥ Open Source

How to remove the signature from e-mails with NLP?

kinow @ Jun 14, 2017 13:59:33

Some time ago I stumbled across EmailParser, a Python utility to remove e-mail signatures. Here’s a sample input e-mail from the project documentation.

Wendy – thanks for the intro! Moving you to bcc.

Hi Vincent – nice to meet you over email. Apologize for the late reply, I was on PTO for a couple weeks and this is my first week back in office. As Wendy mentioned, I am leading an AR/VR taskforce at Foobar Retail Solutions. The goal of the taskforce is to better understand how AR/VR can apply to retail/commerce and if/what is the role of a shopping center in AR/VR applications for retail.

Wendy mentioned that you would be a great person to speak to since you are close to what is going on in this space. Would love to set up some time to chat via phone next week. What does your availability look like on Monday or Wednesday?

Best,
Joe Smith

Joe Smith | Strategy & Business Development
111 Market St. Suite 111| San Francisco, CA 94103
M: 111.111.1111| joe@foobar.com

And here’s what it looks like afterwards.

Wendy – thanks for the intro! Moving you to bcc.

Hi Vincent – nice to meet you over email. Apologize for the late reply, I was on PTO for a couple weeks and this is my first week back in office. As Wendy mentioned, I am leading an AR/VR taskforce at Foobar Retail Solutions. The goal of the taskforce is to better understand how AR/VR can apply to retail/commerce and if/what is the role of a shopping center in AR/VR applications for retail.

Wendy mentioned that you would be a great person to speak to since you are close to what is going on in this space. Would love to set up some time to chat via phone next week. What does your availability look like on Monday or Wednesday?

As you can see, it removed all the lines after the main part of the message (i.e. after the three paragraphs). Here’s what the Python code looks like.

>>> from Parser import read_email, strip, prob_block
>>> from spacy.en import English

>>> pos = English()  # part-of-speech tagger
>>> msg_raw = read_email('emails/test1.txt')
>>> msg_stripped = strip(msg_raw)  # preprocessing text before POS tagging

# iterate through lines, write to file if not signature block
>>> generate_text(msg_stripped, .9, pos_tagger, 'emails/test1_clean.txt')

What got me interested about this utility was the use of NLP. I couldn’t imagine how someone could use NLP for that. And I liked the simplicity of the approach, which is not perfect, but can be useful someday.

After the imports in the code, it creates a Part of Speech tagger using spaCy NLP library, reads the e-mail from a file, and sripts and creates an array with each paragraph of the message.

The magic happens in the generate_text function, which receives the array of paragraphs, a threshold, the POS tagger, and the output destination. Here’s what the function does.

for each message
    if probability ( signature block | message ) < threshold
        write to output file

And the formula for calculating the probability is quite simple too.

1. For a given paragraph (message block), find all the sentences in it.
2. Then for each word (token) in the sentence, count the number of times a non-verb appears.
3. Return the proportion of non-verbs per sentence, i.e. number of non-verbs / number of sentences.

In summary, it discards blocks that do not contain enough verbs to be considered a message block, being treated as signature blocks instead.

Never thought about using an approach like this. It may definitely be helpful when doing data analysis, information retrieval, or scraping data from the web. Not necessarily with e-mails and signatures, but you got the gist of it.

♥ Open Source

Backward compatibility and switch statement with constant expressions

kinow @ Jun 10, 2017 17:35:39

Maintaining Open Source software can be challenging. Making sure you keep backward compatibility (not only binary) can be even more challenging. Apache Commons Lang 3.6 release is happening right now thanks to Benedikt Ritter, and it is on its fourth Release Candidate (i.e. RC4).

A previous RC2 was cancelled due to IBM JDK 8 compatibility, more specifically the lazy initialization of ArrayList’s seems to be different in Oracle JDK and IBM JDK.

The RC3 was cancelled due to a change that could affect users using a switch statement.

The change was in the CharEncoding class. The issue was that this class has some constants, that stopped being constant expressions.

A constant expression is an expression denoting a value of primitive type or a String that does not complete abruptly and is composed using only the following (…)
public class CharEncoding {
    // ...
    public static final String ISO_8859_1 = "ISO-8859-1";
    // ...
}

The code above contains a constant (static final) variable, that is a constant expression. So users can safely use it in switch statements, and the Java compiler won’t complain about it.

public class CharEncoding {
    // ...
    public static final String ISO_8859_1 = StandardCharsets.ISO_8859_1.name();
    // ...
}

The code above is from the change that caused the regression. Any user that was using the ISO_8859_1 constant variable in a switch statement would get a compilation error (e.g. case expressions must be constant expressions) when updating to Apache Commons Lang 3.6. That is because the constant variable is not a constant expression.

I think I learned that some time ago, but if you asked me what was wrong with the change, and if it would break backward compatibility, I would probably fail to spot the issue. There are tools for that now (e.g. Clirr, japicmp) though they may miss some cases too.

Luckily in this case a user subscribed to the Apache Commons development mailing list spotted the issue and quickly reported it. It gets easier maintaining an Open Source project with the constructive feedback of users, like this one.

♥ Open Source

Securely using passwords with R

kinow @ Jun 03, 2017 20:21:39

It is quite common to have code that needs to interact with another system, database, or third party application, and need to use some sort of credentials to securely communicate.

Most of the code I wrote in R, or reviewed, had normally no borders (from a system analysis perspective) with other systems, or basically just interacted with the file system to retrieve NetCDF or JSON files.

However, after I saw a comment in Reddit [1] some time ago about this, I decided to check what others used. With R Shiny becoming more popular, and moving more R code to the web, I think this will become a common requirement for R code.

There is a blog post from RevolutionAnalytics [2] that does a nice summary of the options for that. The post is from 2015, but I do not think much changed now in 2017. From the blog post and comments, there are seven methods listed:

  1. Put credentials in your source code
  2. Put credentials in a file in your project, but do not share this file (#3, #4, and #5 are similar to this one)
  3. Put credentials in a .Rprofile file
  4. Put credentials in a .Renviron file
  5. Put credentials in a JSON or YAML file
  6. Put credentials in a secure store that you can access from R
  7. Ask the user for the credentials when the script is executed (possibly not useful for R Shiny applications)

My preferred way for R is the .Renviron (or a dotEnv) file. You basically store your password in this file, make sure you do not share this (a global gitignore could be helpful to prevent any accident) and read the variables when you start your code.

## Secrets
MYSQL_PASSWORD=secret

If you would like to increase the security, you can combine it with a variation of #6. You use a .Renviron file, and use an encryption service like Amazon KMS (KMS stands for Key Management Service).

With AWS KMS in R, you can encrypt your values, put them encrypted in your .Renviron, and even if someone gets hold of your .Renviron file, you have an extra layer of protection, as the attacker would require access to your cloud environment to decrypt it too.

## Secrets
MYSQL_PASSWORD=AQECAHga320J8WadplGCqqVAr4HNvDaFSQ+NaiwIBhmm6qDSFwAAAGIwYAYJKoZIhvcNAQcGoFMwUQIBADBMBgkqhkiG9w0BBwEwHgYJYIZIAWUDBAEuMBEE99+LoLdvYv8l41OhAAIBEIAfx49FFJCLeYrkfMfAw6XlnxP23MmDBdqP8dPp28OoAQ==

References

♥ Open Source