Blog Taxonomy

Posts tagged with 'cylc'

Cylc Scheduler Internals - Part 3

kinow @ Aug 18, 2018 18:27:37

This is part 2, in a series of posts about Cylc internals. The part 1 had the beginning of the workflow. part 2 documented from the moment the method configure() is called. This post will continue right after the continue() method returns, going on with the next method: run().

NB: this is a post to remember things, not really expecting to give someone enough information to be able to hack the Cylc Scheduler (though you can and would have fun!).

At this point, the Suite Server program must have been initialized, with the objects that it requires, and with everything configured. So this method is the one that starts the whole work on the tasks & proxies.

The runahead points, i.e. what are the next available cycle points, are calculated and scheduled. In the scheduling, tasks are queued for execution.

Most of the interesting action takes place when process_task_pool() is called, and in the submit_task_jobs(). The latter is a method from TaskJobManager, and it is here where - in this case - my shell command is executed through a temporary shell script file.

The graph was created from the initial execution of a suite that was starting from scratch (they can also be reinitialized). If there are multiple tasks waiting, or if a suite was restarted, the diagram would look considerably different.

You can download the source file for the diagram used in this post, and edit it with draw.io.

Creating a Docker container to run as a command

kinow @ Aug 04, 2018 22:00:36

For the past two weeks at work I have been assigned to work on PHP projects. Though I used PHP some time ago - especially with Code Igniter and Laravel - I have not used it in a few years. And have been doing mostly Java nowadays.

The complete project setup was done by co-workers. I had a PHP project, using Symfony, several bundles and libraries, and Postgres. But it required just running a few commands to set up AWS settings, and then fire up Docker Compose.

Besides Docker Compose starting a web container, and another Postgres container, we used the web container to run Composer. In the past, I would see the PHP project mapped as a folder in the running container, and then composer would be executed in the host.

I noticed then that I could do the same to an image I created in 2016 to run Cylc. The previous version would be used to start a container in a terminal, then run some commands while maintaining the state within the container.

The new version now maps a volume to the version of cylc, installs the dependencies, and then allows the state to be maintained solely in the mapped volume.

Furthermore, the entrypoint of the image is the cylc command. Which means that you do not have to execute a terminal in order to run commands to the container (or change the command/entrypoint).

...
VOLUME "/opt/cylc" "/tmp" "/run" "/var/run" # we have the state stored only in /opt/cylc
WORKDIR "/opt/cylc"
ENV PATH /opt/cylc/bin:$PATH
WORKDIR /opt/cylc
ENTRYPOINT ["cylc"] # here's the trick!

It uses the same approach from the PHP image I am using for Composer, with a few additional improvements from the Best practices for writing Dockerfiles documentation.

$ docker run -ti -v "$(pwd -P)"/cylc:/opt/cylc cylc validate /opt/cylc/etc/examples/tutorial/oneoff/basic/
Valid for cylc-7.7.2
$ docker run -ti -v "$(pwd -P)"/cylc:/opt/cylc cylc register /opt/cylc/etc/examples/tutorial/oneoff/basic/
REGISTER /opt/cylc/etc/examples/tutorial/oneoff/basic/
$ docker run -ti -v "$(pwd -P)"/cylc:/opt/cylc cylc start --non-daemon --debug /opt/cylc/etc/examples/tutorial/oneoff/basic/
            ._.                                                       
            | |                 The Cylc Suite Engine [7.7.2]         
._____._. ._| |_____.           Copyright (C) 2008-2018 NIWA          
| .___| | | | | .___|  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
| !___| !_! | | !___.  This program comes with ABSOLUTELY NO WARRANTY;
!_____!___. |_!_____!  see `cylc warranty`.  It is free software, you 
      .___! |           are welcome to redistribute it under certain  
      !_____!                conditions; see `cylc conditions`.       
2018-07-30T12:10:32Z INFO - Suite starting: server=3714d1a17999:43040 pid=1
2018-07-30T12:10:32Z INFO - Cylc version: 7.7.2
2018-07-30T12:10:32Z INFO - Run mode: live
2018-07-30T12:10:32Z INFO - Initial point: 1
2018-07-30T12:10:32Z INFO - Final point: 1
2018-07-30T12:10:32Z INFO - Cold Start 1
2018-07-30T12:10:32Z DEBUG - [hello.1] -released to the task pool
2018-07-30T12:10:32Z DEBUG - BEGIN TASK PROCESSING
2018-07-30T12:10:32Z DEBUG - [hello.1] -waiting => queued
2018-07-30T12:10:32Z DEBUG - 1 task(s) de-queued
2018-07-30T12:10:32Z INFO - [hello.1] -submit-num=1, owner@host=localhost
2018-07-30T12:10:32Z DEBUG - [hello.1] -queued => ready
2018-07-30T12:10:32Z DEBUG - END TASK PROCESSING (took 0.00642895698547 seconds)
2018-07-30T12:10:32Z DEBUG - ['cylc', 'jobs-submit', '--debug', '--', '/opt/cylc/etc/examples/tutorial/oneoff/basic/log/job', '1/hello/01']
2018-07-30T12:10:33Z DEBUG - [client-connect] root@3714d1a17999:cylc-message privilege='full-control' 7b05596a-5971-4733-82de-28528f702ff0
2018-07-30T12:10:33Z INFO - [client-command] put_messages root@3714d1a17999:cylc-message 7b05596a-5971-4733-82de-28528f702ff0
2018-07-30T12:10:33Z DEBUG - [jobs-submit cmd] cylc jobs-submit --debug -- /opt/cylc/etc/examples/tutorial/oneoff/basic/log/job 1/hello/01
    [jobs-submit ret_code] 0
    [jobs-submit out]
    [TASK JOB SUMMARY]2018-07-30T12:10:32Z|1/hello/01|0|61
    [TASK JOB COMMAND]2018-07-30T12:10:32Z|1/hello/01|[STDOUT] 61
2018-07-30T12:10:33Z INFO - [hello.1] -(current:ready) submitted at 2018-07-30T12:10:32Z
2018-07-30T12:10:33Z INFO - [hello.1] -job[01] submitted to localhost:background[61]
2018-07-30T12:10:33Z DEBUG - [hello.1] -ready => submitted
2018-07-30T12:10:33Z INFO - [hello.1] -health check settings: submission timeout=None
2018-07-30T12:10:33Z DEBUG - BEGIN TASK PROCESSING
2018-07-30T12:10:33Z DEBUG - 0 task(s) de-queued
2018-07-30T12:10:33Z DEBUG - [hello.1] -forced spawning
2018-07-30T12:10:33Z DEBUG - END TASK PROCESSING (took 0.000992059707642 seconds)
2018-07-30T12:10:33Z INFO - [hello.1] -(current:submitted)> started at 2018-07-30T12:10:33Z
2018-07-30T12:10:33Z DEBUG - [hello.1] -submitted => running
2018-07-30T12:10:33Z INFO - [hello.1] -health check settings: execution timeout=None
2018-07-30T12:10:34Z DEBUG - BEGIN TASK PROCESSING
2018-07-30T12:10:34Z DEBUG - 0 task(s) de-queued
2018-07-30T12:10:34Z DEBUG - END TASK PROCESSING (took 0.000984191894531 seconds)
2018-07-30T12:10:43Z DEBUG - [client-connect] root@3714d1a17999:cylc-message privilege='full-control' 524d658e-9932-4135-9523-3c2ace3990cf
2018-07-30T12:10:43Z INFO - [client-command] put_messages root@3714d1a17999:cylc-message 524d658e-9932-4135-9523-3c2ace3990cf
2018-07-30T12:10:43Z INFO - [hello.1] -(current:running)> succeeded at 2018-07-30T12:10:43Z
2018-07-30T12:10:43Z DEBUG - [hello.1] -running => succeeded
2018-07-30T12:10:43Z INFO - Suite shutting down - AUTOMATIC
2018-07-30T12:10:44Z INFO - DONE
$

If you have a utility in your computer that requires a few dependencies, but that can store the state/data in a mapped volume, the same kind of container may be useful.

Cylc Scheduler Internals - Part 2

kinow @ Jul 27, 2018 00:25:37

This is part 2, in a series of posts about Cylc internals. The part 1 had the beginning of the workflow. And here we will have the continuation, from the moment the method configure() is called.

NB: this is a post to remember things, not really expecting to give someone enough information to be able to hack the Cylc Scheduler (though you can and would have fun!).

The configure method is responsible for configuring the Suite Server Program. Which means it will interact with the configuration singletons to retrieve the necessary configuration for the program.

It also interacts with other objects that store state, and also creates basic data structures and objects required for a suite program (e.g. Queues).

This is also where the contact file is created, as well as the HTTP server.

You can download the source file for the diagram used in this post, and edit it with draw.io.

Cylc Scheduler Internals - Part 1

kinow @ Jul 14, 2018 22:42:47

This is the first post in a series of three (or maybe four later) based on diagrams I collected while debugging the Cylc scheduler. The scheduler is called by the cylc start utility.

NB: this is a post to remember things, not really expecting to give someone enough information to be able to hack the Cylc Scheduler (though you can and would have fun!).

Instead of going at length on what happens (and there is quite a bit happening when you run cylc start my.suite), I will use the following diagram, followed by a few paragraphs to highlight certain parts. The code used was based on Cylc 7.7.1.

When a Cylc command like cylc start is invoked, it actually gets translated into bin/cylc-$command_name. cylc-start is a Shell file, that will simply call cylc-run. cylc-run then imports scheduler_cli… but what you need to know is that in the end scheduler_cli will create an instance of Scheduler with the right constructor arguments, and call its start method.

After that point, you are at the left-most lifeline, on the Scheduler constructor (i.e. the init method of the scheduler.py’s Scheduler class).

If you follow the method calls - which are hopefully easy to understand and follow - you will find that the constructor merely creates a few objects, prepares the suite information, and the suite database.

Then the start method kicks things off, interacting with previously created objects, but also with some singletons for logging and configuration. Oh, that Cylc banner is also printed here (in case you would like to customize it as in SpringBoot).

After that, if you are running the suite as daemon, it will be daemonized, by forking the current process, but with Python.

One important step that happens here, is the initialization of the HTTP Server. This server will be used to communicate with the Suite Server Program. It will be listening to connections with the right endpoints available only after the configure method.

Lastly, we have configure and run methods, which are two very important methods to be discussed in the next part of this series, as they are quite extensive, and deserve their own diagrams.

You can download the source file for the diagram used in this post, and edit it with draw.io.

ImportError when debugging cylc in Eclipse

kinow @ Jul 10, 2018 00:47:13

Since I started reading cylc’s source code in Eclipse to create some sequence diagrams, I have not been able to debug it properly without hitting errors in some part of the program execution.

The error message was “ImportError: cannot import name _remove_dead_weakref”, which was a bit enigmatic as I never heard about that function, but it seemed to be something internal, or at least not from the project code base. And searching the Internet did not help much.

Here is the complete console output in Eclipse.

pydev debugger: starting (pid: 15124)
timeout 10 ps -opid,args 13640  # return 1

            ._.                                                       
            | |            The Cylc Suite Engine [7.7.1-37-g09c8a]    
._____._. ._| |_____.           Copyright (C) 2008-2018 NIWA          
| .___| | | | | .___|  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
| !___| !_! | | !___.  This program comes with ABSOLUTELY NO WARRANTY;
!_____!___. |_!_____!  see `cylc warranty`.  It is free software, you 
      .___! |           are welcome to redistribute it under certain  
      !_____!                conditions; see `cylc conditions`.       
2018-07-10T01:00:47+12 INFO - Suite starting: server=localhost:44444 pid=15124
2018-07-10T01:00:47+12 INFO - Cylc version: 7.7.1-37-g09c8a
2018-07-10T01:00:47+12 INFO - Run mode: live
2018-07-10T01:00:47+12 INFO - Initial point: 1
2018-07-10T01:00:47+12 INFO - Final point: None
2018-07-10T01:00:47+12 INFO - Cold Start 1
2018-07-10T01:00:47+12 DEBUG - [hello.1] -released to the task pool
2018-07-10T01:00:47+12 DEBUG - BEGIN TASK PROCESSING
2018-07-10T01:00:47+12 DEBUG - [hello.1] -waiting => queued
2018-07-10T01:00:47+12 DEBUG - 1 task(s) de-queued
2018-07-10T01:00:47+12 INFO - [hello.1] -submit-num=1, owner@host=localhost
2018-07-10T01:00:47+12 DEBUG - [hello.1] -queued => ready
2018-07-10T01:00:47+12 DEBUG - END TASK PROCESSING (took 0.023609161377 seconds)
2018-07-10T01:00:48+12 DEBUG - ['cylc', 'jobs-submit', '--debug', '--', '/home/kinow/Development/python/workspace/example-suite/log/job', '1/hello/01']
2018-07-10T01:00:48+12 ERROR - [jobs-submit cmd] cylc jobs-submit --debug -- /home/kinow/Development/python/workspace/example-suite/log/job 1/hello/01
    [jobs-submit ret_code] 1
    [jobs-submit err]
    Traceback (most recent call last):
      File "/home/kinow/Development/python/workspace/cylc/bin/cylc-jobs-submit", line 52, in <module>
        from cylc.batch_sys_manager import BatchSysManager
      File "/home/kinow/Development/python/workspace/cylc/lib/cylc/batch_sys_manager.py", line 114, in <module>
        from cylc.task_message import (
      File "/home/kinow/Development/python/workspace/cylc/lib/cylc/task_message.py", line 26, in <module>
        from logging import getLevelName, WARNING, ERROR, CRITICAL
      File "/home/kinow/Development/python/anaconda2/lib/python2.7/logging/__init__.py", line 26, in <module>
        import sys, os, time, cStringIO, traceback, warnings, weakref, collections
      File "/home/kinow/Development/python/anaconda2/lib/python2.7/weakref.py", line 14, in <module>
        from _weakref import (
    ImportError: cannot import name _remove_dead_weakref
2018-07-10T01:00:48+12 ERROR - [jobs-submit cmd] cylc jobs-submit --debug -- /home/kinow/Development/python/workspace/example-suite/log/job 1/hello/01
    [jobs-submit ret_code] 1
    [jobs-submit out] 2018-07-10T01:00:48+12|1/hello/01|1
2018-07-10T01:00:48+12 INFO - [hello.1] -(current:ready) submission failed at 2018-07-10T01:00:48+12
2018-07-10T01:00:48+12 ERROR - [hello.1] -submission failed
2018-07-10T01:00:48+12 DEBUG - [hello.1] -ready => submit-failed
2018-07-10T01:00:48+12 DEBUG - BEGIN TASK PROCESSING
2018-07-10T01:00:48+12 DEBUG - 0 task(s) de-queued
2018-07-10T01:00:48+12 DEBUG - END TASK PROCESSING (took 0.00175499916077 seconds)
2018-07-10T01:00:49+12 WARNING - suite stalled

As the current diagram I am working on has quite a few if‘s and else‘s, I decided to investigate why this error was occurring. Then, after some elimination I found that it was due to the missing Anaconda 2 entry in my $PATH environment variable.

I had this variable configured in a custom script I load whenever I decide to use Anaconda 2. And reproducing the same behaviour in Eclipse was easy.

A screen shot of Eclipse with source code
Locating the bug

Et voilà! Eclipse was happily debugging again!

A screen shot of Eclipse with source code
Locating the bug

So if you have a similar problem, try comparing your environment variables and check if you have some entries missing, and try adding them in Eclipse Debug configuration.

Happy cycling!