Posts about technology and arts.
Maven Enforcer Plug-in “provides goals to control certain environmental constraints such as Maven version, JDK version and OS family along with many more built-in rules and user created rules”. There are several libraries that provide custom rules, or you can write your own.
One of these libraries is ImmobilienScout24/illegal-transitive-dependency-check, “an additional rule for the maven-enforcer-plugin that checks for classes referenced via transitive Maven dependencies”.
With the following example:
<project>
  ...
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-enforcer-plugin</artifactId>
        <version>1.3.1</version>
        <dependencies>
          <dependency>
            <groupId>de.is24.maven.enforcer.rules</groupId>
            <artifactId>illegal-transitive-dependency-check</artifactId>
            <version>1.7.4</version>
          </dependency>
        </dependencies>
        <executions>
          <execution>
            <id>enforce</id>
            <phase>verify</phase>
            <goals>
              <goal>enforce</goal>
            </goals>
            <configuration>
              <rules>
                <illegalTransitiveDependencyCheck implementation="de.is24.maven.enforcer.rules.IllegalTransitiveDependencyCheck">
                  <reportOnly>false</reportOnly>
                  <useClassesFromLastBuild>true</useClassesFromLastBuild>
                  <suppressTypesFromJavaRuntime>true</suppressTypesFromJavaRuntime>
                  <listMissingArtifacts>false</listMissingArtifacts>
                </illegalTransitiveDependencyCheck>
              </rules>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  ...
</project>
mvn clean verify will execute the Maven Enforcer Plug-in enforce goal, which will call the illegal transitive dependency check.
The build will fail if your code is using (i.e. importing) any class that is not available in your first-level dependencies. For example, if you added commons-configuration to your pom.xml, which in turn pulls in commons-lang 2.x, and you used org.apache.commons.lang.StringUtils instead of org.apache.commons.lang3.StringUtils, the build would fail.
In order to fix the build, you have to either add the transitive dependency to your pom.xml file, or correct your import statements. This is especially useful to prevent future issues caused by other dependencies being added or updated and changing the version of the transitive dependency.
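To make the idea concrete, here is a rough sketch in Python of the kind of check described above. It is not how the plug-in is implemented; the function name find_illegal_references and the hard-coded class lists are made up for illustration, and a real check would read the classes from the compiled code and the dependency JARs.

```python
def find_illegal_references(referenced, direct_deps):
    """Return referenced classes that no direct dependency provides.

    referenced  -- iterable of fully qualified class names used by the code
    direct_deps -- dict mapping a declared artifact to the classes it contains
    """
    provided = set()
    for classes in direct_deps.values():
        provided.update(classes)
    return sorted(c for c in referenced if c not in provided)


# Hypothetical project: only commons-configuration is declared in pom.xml.
direct = {
    "commons-configuration": {
        "org.apache.commons.configuration.PropertiesConfiguration",
    },
}
used = [
    "org.apache.commons.configuration.PropertiesConfiguration",
    "org.apache.commons.lang.StringUtils",  # only reachable transitively
]

illegal = find_illegal_references(used, direct)
print(illegal)  # ['org.apache.commons.lang.StringUtils']

# Declaring commons-lang as a first-level dependency makes the check pass.
direct["commons-lang"] = {"org.apache.commons.lang.StringUtils"}
fixed = find_illegal_references(used, direct)
print(fixed)  # []
```

The second call mirrors the first of the two fixes: once the artifact is declared directly, the reference is no longer reported.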
Bonus points if you combine that with continuous integration and some service like Travis-CI.
♥ Open Source
My favourite Star Trek captain: Jean-Luc Picard.
“Space, the final frontier. These are the voyages of the starship Enterprise. Its continuing mission: to explore strange new worlds, to seek out new life and new civilizations, to boldly go where no one has gone before!”
2B pencil, rubber eraser, blending stump, and HB 0.3 mechanical pencil. Yellow-ish layer added adjusting LAB colour space in GIMP.
An Introduction to the International Image Interoperability Framework (IIIF)
Some time ago I stumbled across EmailParser, a Python utility to remove e-mail signatures. Here’s a sample input e-mail from the project documentation.
Wendy – thanks for the intro! Moving you to bcc.

Hi Vincent – nice to meet you over email. Apologize for the late reply, I was on PTO for a couple weeks and this is my first week back in office. As Wendy mentioned, I am leading an AR/VR taskforce at Foobar Retail Solutions. The goal of the taskforce is to better understand how AR/VR can apply to retail/commerce and if/what is the role of a shopping center in AR/VR applications for retail.

Wendy mentioned that you would be a great person to speak to since you are close to what is going on in this space. Would love to set up some time to chat via phone next week. What does your availability look like on Monday or Wednesday?

Best,
Joe Smith

Joe Smith | Strategy & Business Development
111 Market St. Suite 111| San Francisco, CA 94103
M: 111.111.1111| firstname.lastname@example.org
And here’s what it looks like afterwards.
Wendy – thanks for the intro! Moving you to bcc.

Hi Vincent – nice to meet you over email. Apologize for the late reply, I was on PTO for a couple weeks and this is my first week back in office. As Wendy mentioned, I am leading an AR/VR taskforce at Foobar Retail Solutions. The goal of the taskforce is to better understand how AR/VR can apply to retail/commerce and if/what is the role of a shopping center in AR/VR applications for retail.

Wendy mentioned that you would be a great person to speak to since you are close to what is going on in this space. Would love to set up some time to chat via phone next week. What does your availability look like on Monday or Wednesday?
As you can see, it removed all the lines after the main part of the message (i.e. after the three paragraphs). Here’s what the Python code looks like.
>>> # generate_text is assumed to be importable from Parser like the other helpers
>>> from Parser import read_email, strip, prob_block, generate_text
>>> from spacy.en import English
>>> pos = English()  # part-of-speech tagger
>>> msg_raw = read_email('emails/test1.txt')
>>> msg_stripped = strip(msg_raw)  # preprocessing text before POS tagging
>>> # iterate through lines, write to file if not signature block
>>> generate_text(msg_stripped, .9, pos, 'emails/test1_clean.txt')
What got me interested in this utility was its use of NLP. I couldn’t imagine how someone could use NLP for that. And I liked the simplicity of the approach, which is not perfect but could be useful someday.
The magic happens in the generate_text function, which receives the array of paragraphs, a threshold, the POS tagger, and the output destination. Here’s what the function does.
for each message
    if probability ( signature block | message ) < threshold
        write to output file
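The loop above can be sketched in a few lines of Python. This is my own minimal version, not the code from EmailParser: keep_message_blocks and toy_probability are hypothetical names, the scorer is a deliberately crude stand-in, and the result is returned as a list rather than written to a file.

```python
def keep_message_blocks(blocks, threshold, signature_probability):
    """Keep blocks whose signature probability is below the threshold."""
    return [b for b in blocks if signature_probability(b) < threshold]


def toy_probability(block):
    # Stand-in scorer (an assumption, not EmailParser's formula):
    # a block that does not end like a sentence is treated as a signature.
    return 0.0 if block.rstrip().endswith((".", "?", "!")) else 1.0


blocks = [
    "Would love to set up some time to chat via phone next week.",
    "Joe Smith | Strategy & Business Development",
]

kept = keep_message_blocks(blocks, 0.9, toy_probability)
print(kept)  # only the first block survives
```

Passing the scorer in as a parameter keeps the filtering step separate from whatever heuristic decides what a signature looks like.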
And the formula for calculating the probability is quite simple too.
1. For a given paragraph (message block), find all the sentences in it.
2. Then for each word (token) in the sentence, count the number of times a non-verb appears.
3. Return the proportion of non-verbs per sentence, i.e. number of non-verbs / number of sentences.
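Here is a small Python sketch of that scoring idea, under a couple of loud assumptions. A tiny hand-made verb list (TOY_VERBS) stands in for a real POS tagger, and I divide by the number of tokens rather than the number of sentences, so that the score stays between 0 and 1 and can be compared with a threshold such as .9. The name non_verb_ratio is mine, and the actual EmailParser code may well differ.

```python
import re

# Tiny hand-made verb list standing in for a real POS tagger (assumption).
TOY_VERBS = {"am", "is", "are", "was", "would", "love", "set", "chat",
             "does", "look", "mentioned", "leading"}


def non_verb_ratio(block, verbs=TOY_VERBS):
    """Fraction of tokens in the block that are not verbs (0.0 to 1.0)."""
    tokens = re.findall(r"[A-Za-z']+", block.lower())
    if not tokens:
        return 1.0  # no words at all: treat the block as a signature
    non_verbs = sum(1 for t in tokens if t not in verbs)
    return non_verbs / len(tokens)


body = "Would love to set up some time to chat via phone next week."
signature = "Joe Smith | Strategy & Business Development"

print(non_verb_ratio(body))       # 9 non-verbs out of 13 tokens
print(non_verb_ratio(signature))  # 1.0, no verbs at all
```

With a threshold of .9, the sentence from the message body scores below it and would be kept, while the signature line scores 1.0 and would be dropped.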
In summary, it discards blocks that do not contain enough verbs to be considered a message block, treating them as signature blocks instead.
I had never thought about using an approach like this. It could well be helpful when doing data analysis, information retrieval, or scraping data from the web. Not necessarily with e-mails and signatures, but you get the gist of it.