Blog Taxonomy

Posts tagged with 'python'

ImportError when debugging cylc in Eclipse

kinow @ Jul 10, 2018 00:47:13

Since I started reading cylc’s source code in Eclipse to create some sequence diagrams, I had not been able to debug it without hitting an error at some point during the program execution.

The error message was “ImportError: cannot import name _remove_dead_weakref”, which was a bit enigmatic, as I had never heard of that function. It seemed to be something internal, or at least not from the project code base, and searching the Internet did not help much.

Here is the complete console output in Eclipse.

pydev debugger: starting (pid: 15124)
timeout 10 ps -opid,args 13640  # return 1

            ._.                                                       
            | |            The Cylc Suite Engine [7.7.1-37-g09c8a]    
._____._. ._| |_____.           Copyright (C) 2008-2018 NIWA          
| .___| | | | | .___|  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
| !___| !_! | | !___.  This program comes with ABSOLUTELY NO WARRANTY;
!_____!___. |_!_____!  see `cylc warranty`.  It is free software, you 
      .___! |           are welcome to redistribute it under certain  
      !_____!                conditions; see `cylc conditions`.       
2018-07-10T01:00:47+12 INFO - Suite starting: server=localhost:44444 pid=15124
2018-07-10T01:00:47+12 INFO - Cylc version: 7.7.1-37-g09c8a
2018-07-10T01:00:47+12 INFO - Run mode: live
2018-07-10T01:00:47+12 INFO - Initial point: 1
2018-07-10T01:00:47+12 INFO - Final point: None
2018-07-10T01:00:47+12 INFO - Cold Start 1
2018-07-10T01:00:47+12 DEBUG - [hello.1] -released to the task pool
2018-07-10T01:00:47+12 DEBUG - BEGIN TASK PROCESSING
2018-07-10T01:00:47+12 DEBUG - [hello.1] -waiting => queued
2018-07-10T01:00:47+12 DEBUG - 1 task(s) de-queued
2018-07-10T01:00:47+12 INFO - [hello.1] -submit-num=1, owner@host=localhost
2018-07-10T01:00:47+12 DEBUG - [hello.1] -queued => ready
2018-07-10T01:00:47+12 DEBUG - END TASK PROCESSING (took 0.023609161377 seconds)
2018-07-10T01:00:48+12 DEBUG - ['cylc', 'jobs-submit', '--debug', '--', '/home/kinow/Development/python/workspace/example-suite/log/job', '1/hello/01']
2018-07-10T01:00:48+12 ERROR - [jobs-submit cmd] cylc jobs-submit --debug -- /home/kinow/Development/python/workspace/example-suite/log/job 1/hello/01
    [jobs-submit ret_code] 1
    [jobs-submit err]
    Traceback (most recent call last):
      File "/home/kinow/Development/python/workspace/cylc/bin/cylc-jobs-submit", line 52, in <module>
        from cylc.batch_sys_manager import BatchSysManager
      File "/home/kinow/Development/python/workspace/cylc/lib/cylc/batch_sys_manager.py", line 114, in <module>
        from cylc.task_message import (
      File "/home/kinow/Development/python/workspace/cylc/lib/cylc/task_message.py", line 26, in <module>
        from logging import getLevelName, WARNING, ERROR, CRITICAL
      File "/home/kinow/Development/python/anaconda2/lib/python2.7/logging/__init__.py", line 26, in <module>
        import sys, os, time, cStringIO, traceback, warnings, weakref, collections
      File "/home/kinow/Development/python/anaconda2/lib/python2.7/weakref.py", line 14, in <module>
        from _weakref import (
    ImportError: cannot import name _remove_dead_weakref
2018-07-10T01:00:48+12 ERROR - [jobs-submit cmd] cylc jobs-submit --debug -- /home/kinow/Development/python/workspace/example-suite/log/job 1/hello/01
    [jobs-submit ret_code] 1
    [jobs-submit out] 2018-07-10T01:00:48+12|1/hello/01|1
2018-07-10T01:00:48+12 INFO - [hello.1] -(current:ready) submission failed at 2018-07-10T01:00:48+12
2018-07-10T01:00:48+12 ERROR - [hello.1] -submission failed
2018-07-10T01:00:48+12 DEBUG - [hello.1] -ready => submit-failed
2018-07-10T01:00:48+12 DEBUG - BEGIN TASK PROCESSING
2018-07-10T01:00:48+12 DEBUG - 0 task(s) de-queued
2018-07-10T01:00:48+12 DEBUG - END TASK PROCESSING (took 0.00175499916077 seconds)
2018-07-10T01:00:49+12 WARNING - suite stalled

As the diagram I am currently working on has quite a few if’s and else’s, I decided to investigate why this error was occurring. After a process of elimination, I found that it was caused by the Anaconda 2 entry missing from my $PATH environment variable.
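Why would a missing PATH entry produce this particular error? The traceback suggests an interpreter and standard library mismatch: weakref.py is loaded from Anaconda 2’s lib directory, but _remove_dead_weakref is looked up in _weakref, a C module built into whichever python binary is running. If the jobs-submit subprocess finds python through PATH, it may pick up a different, older interpreter. The sketch below is my own diagnostic, not part of cylc or PyDev; running it once from your shell and once from the Eclipse debug configuration should make a mismatch visible. It deliberately avoids importing weakref, since that is the import that fails.

```python
import os
import sys

import _weakref  # C module compiled into the running interpreter


def interpreter_report():
    """Describe the interpreter and standard library this process is using.

    weakref itself is not imported on purpose: with a mismatched
    interpreter and standard library, that is the import that fails.
    """
    stdlib_dir = os.path.dirname(getattr(os, "__file__", "") or "")
    return [
        ("executable", sys.executable),
        ("version", sys.version.split()[0]),
        ("prefix", sys.prefix),
        # os.py is loaded from the standard library directory actually in use
        ("stdlib", stdlib_dir or "unknown"),
        # weakref.py in newer standard libraries imports this name from _weakref
        ("has _remove_dead_weakref", hasattr(_weakref, "_remove_dead_weakref")),
        # PYTHONPATH entries come before the standard library on sys.path
        ("PYTHONPATH", os.environ.get("PYTHONPATH", "")),
        ("first PATH entry", os.environ.get("PATH", "").split(os.pathsep)[0]),
    ]


if __name__ == "__main__":
    # %-formatting keeps this runnable on Python 2.7 as well as Python 3
    for name, value in interpreter_report():
        print("%s: %s" % (name, value))
```

If the two runs report different executables, or one reports False for _remove_dead_weakref while its stdlib points at Anaconda, the environments are worth comparing entry by entry.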

I had the Anaconda 2 entry configured in a custom script that I load whenever I want to use Anaconda 2, so reproducing the same environment in Eclipse was easy.

[Screenshot: Eclipse with source code. Caption: Locating the bug]

Et voilà! Eclipse was happily debugging again!

[Screenshot: Eclipse with source code. Caption: Locating the bug]

So if you have a similar problem, try comparing the environment variables of your shell with those in Eclipse, check whether any entries are missing, and add them to the Eclipse Debug configuration.
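A small sketch of that comparison, assuming you have captured the PATH from a working shell (for example with echo $PATH) and from inside the debug session (os.environ["PATH"]). The function names are my own; missing_path_entries lists the entries the debug session lacks, keeping the shell’s order because PATH lookup order matters.

```python
import os


def missing_variables(reference, current):
    """Return names defined in the reference environment but not in current."""
    return sorted(set(reference) - set(current))


def missing_path_entries(reference_path, current_path):
    """Return PATH entries found in reference_path but absent from current_path.

    The result follows the order of reference_path; empty entries are skipped.
    """
    current_entries = set(current_path.split(os.pathsep))
    return [entry for entry in reference_path.split(os.pathsep)
            if entry and entry not in current_entries]
```

For example, on Linux, where os.pathsep is ":", missing_path_entries("/opt/anaconda2/bin:/usr/bin", "/usr/bin") returns ['/opt/anaconda2/bin'] (the Anaconda path here is a made-up example).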

Happy cycling!

A simple cylc suite

kinow @ Jul 08, 2018 18:59:13

I have been writing more suites for cylc lately, and found an example that has proved to be useful for debugging certain parts of the code.

It is an extremely simple suite, similar to what is in cylc’s documentation. It sleeps for N seconds, and prints a message.

What makes it even simpler is that it cycles through integers and allows at most one active cycle point.

It is essentially the same as running the command in your shell session, with the difference that it goes through all of cylc’s internals, one cycle point at a time, letting you debug and diagnose the parts not related to cycling and graphs (for those you would probably need a more elaborate example).

[scheduling]
    cycling mode = integer
    initial cycle point = 1
    max active cycle points = 1
    [[dependencies]]
        [[[P1]]]
            graph = "hello"
[runtime]
    [[hello]]
        script = "sleep 10; echo PING"
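Since the suite only varies in how many seconds it sleeps and what it prints, a small helper can write it out for different values of N. This is a hypothetical convenience script of my own, not a cylc command; it renders the same suite.rc as above into a directory of your choice.

```python
import os

# The example suite, with the sleep duration and message left as placeholders.
SUITE_TEMPLATE = """\
[scheduling]
    cycling mode = integer
    initial cycle point = 1
    max active cycle points = 1
    [[dependencies]]
        [[[P1]]]
            graph = "hello"
[runtime]
    [[hello]]
        script = "sleep {seconds}; echo {message}"
"""


def make_suite_rc(seconds=10, message="PING"):
    """Render the suite, sleeping `seconds` before echoing `message`."""
    return SUITE_TEMPLATE.format(seconds=int(seconds), message=message)


def write_suite(directory, seconds=10, message="PING"):
    """Write suite.rc into `directory` (created if needed) and return its path."""
    if not os.path.isdir(directory):
        os.makedirs(directory)
    path = os.path.join(directory, "suite.rc")
    with open(path, "w") as handle:
        handle.write(make_suite_rc(seconds, message))
    return path
```

With the defaults, make_suite_rc() reproduces the suite shown above.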

I also combine this suite with the following global.rc.

[editors] 
    terminal = vim 
    gui = gvim -f

[communication]
    base port = 44444
    method = http
    maximum number of ports = 1

With “base port” set to 44444 and the maximum number of ports set to 1, I can run only one suite at a time. But that way I can configure Wireshark and other tools to default to 44444/HTTP, for ease of debugging.
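Because only one port is available, a second suite started while the first is still running would presumably find no free port. Before starting, you can check whether something is already listening on 44444. This is a plain socket check with Python’s standard library, not a cylc feature, and port_in_use is a name I made up.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0
    finally:
        sock.close()


if __name__ == "__main__":
    print("44444 in use: %s" % port_in_use(44444))
```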

Then start the suite with something like: cylc start --non-daemon --debug /home/kinow/Development/python/workspace/example-suite/

Happy cycling!

Enabling Markdown Extension Tables For Piecrust

kinow @ Sep 09, 2017 20:35:01

PieCrust is a Python static site generator. It allows users to write content in Markdown. But if you try adding a table, the content by default will be generated as plain text.

You have to enable the Markdown tables extension, which PieCrust will then load when creating the Markdown instance.

# config.yml
markdown:
  extensions:
    - tables
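To see what the extension changes outside PieCrust, you can call Python-Markdown directly. This assumes PieCrust’s Markdown formatter is backed by Python-Markdown (the markdown package), as the extension name suggests, and that the package is installed; the snippet does not go through PieCrust at all.

```python
import markdown  # Python-Markdown, assumed installed (pip install markdown)

TABLE = """\
| Name | Language |
|------|----------|
| PieCrust | Python |
"""

# Without the extension, the pipes end up inside a plain paragraph.
plain_html = markdown.markdown(TABLE)

# With the tables extension, the same text becomes an HTML table.
table_html = markdown.markdown(TABLE, extensions=["tables"])

print(plain_html)
print(table_html)
```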

Et voilà! Happy blogging!

♥ Open Source

How to remove the signature from e-mails with NLP?

kinow @ Jun 14, 2017 13:59:33

Some time ago I stumbled across EmailParser, a Python utility to remove e-mail signatures. Here’s a sample input e-mail from the project documentation.

Wendy – thanks for the intro! Moving you to bcc.

Hi Vincent – nice to meet you over email. Apologize for the late reply, I was on PTO for a couple weeks and this is my first week back in office. As Wendy mentioned, I am leading an AR/VR taskforce at Foobar Retail Solutions. The goal of the taskforce is to better understand how AR/VR can apply to retail/commerce and if/what is the role of a shopping center in AR/VR applications for retail.

Wendy mentioned that you would be a great person to speak to since you are close to what is going on in this space. Would love to set up some time to chat via phone next week. What does your availability look like on Monday or Wednesday?

Best,
Joe Smith

Joe Smith | Strategy & Business Development
111 Market St. Suite 111| San Francisco, CA 94103
M: 111.111.1111| joe@foobar.com

And here’s what it looks like afterwards.

Wendy – thanks for the intro! Moving you to bcc.

Hi Vincent – nice to meet you over email. Apologize for the late reply, I was on PTO for a couple weeks and this is my first week back in office. As Wendy mentioned, I am leading an AR/VR taskforce at Foobar Retail Solutions. The goal of the taskforce is to better understand how AR/VR can apply to retail/commerce and if/what is the role of a shopping center in AR/VR applications for retail.

Wendy mentioned that you would be a great person to speak to since you are close to what is going on in this space. Would love to set up some time to chat via phone next week. What does your availability look like on Monday or Wednesday?

As you can see, it removed all the lines after the main part of the message (i.e. after the three paragraphs). Here’s what the Python code looks like.

>>> from Parser import read_email, strip, prob_block, generate_text
>>> from spacy.en import English  # spaCy 1.x import path

>>> pos = English()  # part-of-speech tagger
>>> msg_raw = read_email('emails/test1.txt')
>>> msg_stripped = strip(msg_raw)  # preprocessing text before POS tagging

# iterate through lines, write to file if not signature block
>>> generate_text(msg_stripped, .9, pos, 'emails/test1_clean.txt')

What got me interested in this utility was the use of NLP; I couldn’t imagine how someone could use NLP for that. I also liked the simplicity of the approach, which is not perfect, but may be useful someday.

After the imports, the code creates a part-of-speech tagger using the spaCy NLP library, reads the e-mail from a file, and strips it, creating an array with each paragraph of the message.

The magic happens in the generate_text function, which receives the array of paragraphs, a threshold, the POS tagger, and the output destination. Here’s what the function does.

for each block in message
    if probability(signature block | block) < threshold
        write block to output file

And the formula for calculating the probability is quite simple too.

1. For a given paragraph (message block), find all the sentences in it.
2. Then for each word (token) in the sentence, count the number of times a non-verb appears.
3. Return the proportion of non-verbs per sentence, i.e. number of non-verbs / number of sentences.

In summary, blocks that do not contain enough verbs to be considered part of the message are treated as signature blocks and discarded.
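To make the idea concrete, here is a self-contained sketch of the same filter. It is not EmailParser’s code. To avoid downloading a spaCy model, the part-of-speech tagger is replaced by a tiny, hypothetical verb word list (VERBS). The score is also my own choice: the fraction of tokens in a block that are not verbs, rather than non-verbs per sentence, so that it stays between 0 and 1 and can be compared with the 0.9 threshold from the example.

```python
import re

# Hypothetical stand-in for a POS tagger: a tiny list of known verbs.
VERBS = {"mentioned", "would", "be", "speak", "is", "am", "was", "meet",
         "chat", "set", "look", "apologize", "moving", "leading",
         "understand", "apply", "love", "does"}


def tokens(block):
    """Lower-cased word tokens of a block (a crude regex tokenizer)."""
    return re.findall(r"[a-z']+", block.lower())


def signature_score(block):
    """Fraction of tokens that are not verbs; 1.0 for a block with no words."""
    words = tokens(block)
    if not words:
        return 1.0
    non_verbs = sum(1 for word in words if word not in VERBS)
    return non_verbs / float(len(words))


def strip_signature(blocks, threshold=0.9):
    """Keep only the blocks whose score is below the threshold."""
    return [block for block in blocks if signature_score(block) < threshold]


blocks = [
    "Wendy mentioned that you would be a great person to speak to.",
    "Best,\nJoe Smith",
    "Joe Smith | Strategy & Business Development",
]
print(strip_signature(blocks))  # only the first block survives
```

The first block scores 8/12 (four of its twelve tokens are in VERBS), while the two signature blocks contain no verbs and score 1.0. Writing the kept blocks to a file, as generate_text does, is left out.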

I had never thought about using an approach like this. It could be helpful when doing data analysis, information retrieval, or scraping data from the web; not necessarily with e-mails and signatures, but you get the gist of it.

♥ Open Source

Writing a binary parser in Python: NumPy vs. Construct

kinow @ Apr 14, 2017 19:21:03

Some time ago I worked with researchers on a parser for an old data format. The data was generated by a device (a radiosonde) in a vendor-specific (Vaisala) binary format.

One of the researchers told me someone had written a parser for his work and shared it on GitHub. To be honest, that was my first time parsing binary data with Python. I had done it before in C, C++, Perl, and Java, but never in Python.

The code on GitHub used NumPy and looked similar to this one.

import numpy as np

parse_header = np.dtype([('field_a', 'b1'), ('field_b', '17b1')])

with open('input.dat', 'rb') as f:
    header = np.fromfile(f, dtype=parse_header, count=1)
    # ...

And it indeed worked fine. But in the end, after contacting the author and letting him know what I was about to do, I used his code as a reference, together with an old specification document for the format, and wrote a parser with Construct.

From Construct’s website:

Construct is a powerful declarative parser (and builder) for binary data.

This is what the code with Construct looked like.

from construct import *

parse_header = Struct("parse_header",
    Enum(Byte("file_ready"),
        READY = 1,
        NOT_READY = 0,
        _default_ = "UNKNOWN",
    ),
    Bytes("reserved", 17)
)

# ...

parse_contents = Struct("parse_contents",
    parse_header,
    Range(mincount=1, maxcout=5, subcon=pre_data),
    OptionalGreedyRange(detailed_data)
)

with open('input.dat', 'rb') as f:
    parse_results = parse_contents.parse_stream(f)
    # ...

Writing the parser with NumPy or Construct would achieve the same result, so in the end this came down to personal preference and my point of view as a software engineer. This is how NumPy describes itself:

NumPy is the fundamental package for scientific computing with Python.

NumPy is a project tailored for scientific computing, with a focus on linear algebra, N-dimensional arrays, and so on. While it contains code that can parse binary data, the footprint it adds to a project that includes it as a dependency is quite large.

The parser written with NumPy wasn’t using even 5% of the NumPy code base; probably less than 1%. Updates to NumPy could break the application, even if the change were unrelated to parsing, such as a new matrix operation relying on some external dependency missing from the system.

Something similar happens in Java with Google Guava. While I use it sometimes, most of the time I end up using one of the Apache Commons libraries, or another dependency with just what I need, to avoid including unnecessary code in my application.
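Taken to the extreme, “just what I need” can mean no third-party dependency at all. Python’s standard library struct module can decode the same 18-byte header described by the Construct definition above. This is an alternative added for comparison, not what the original parser used; decode_header and FILE_READY are hypothetical names, and only the header is covered, not the rest of the file.

```python
import struct

# "<" = no padding; B = 1 unsigned byte; 17s = 17 raw bytes. 18 bytes total.
HEADER = struct.Struct("<B17s")

# Mirrors the Enum in the Construct definition, including its default.
FILE_READY = {1: "READY", 0: "NOT_READY"}


def decode_header(data):
    """Decode the first HEADER.size bytes of data into a dict."""
    file_ready, reserved = HEADER.unpack_from(data)
    return {
        "file_ready": FILE_READY.get(file_ready, "UNKNOWN"),
        "reserved": reserved,
    }
```

The trade-off is that the declarative Enum and Range logic of Construct has to be written by hand, which is fine for a small header but grows quickly for the data records.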

If you prefer to use NumPy, that’s fine too :-) I just had enough time to rewrite the parser instead of using NumPy (it took a couple of hours). In other cases it may still make sense to use another tool or library, even if it was not made specifically for the job ¯\_(ツ)_/¯

♥ Open Source