Running Cylc workflows on BSC MareNostrum5

TL;DR: using Cylc 8's communication method = poll, you can easily run Cylc 8 workflows on the BSC MareNostrum5 HPC.

A couple of years ago I gave a talk about “HPC workflows for climate models” at an ESiWACE3 event. There, I tried to explain in an unbiased way the differences among the workflow managers commonly used for climate and weather, including Autosubmit, ecFlow, and Cylc. I have worked with all three workflow managers: I developed Cylc while at NIWA in New Zealand, and I currently develop Autosubmit at the Barcelona Supercomputing Center (BSC) in Barcelona, Spain.

Venn diagram from 'HPC workflows for climate models' talk

The BSC MareNostrum5 HPC has a peculiarity: its compute nodes, where Slurm runs the heavy, resource-demanding jobs, are not allowed to communicate back to the HPC login nodes. I have been working at the BSC for three years and so far I have never heard of any exception being made.

This means that workflow managers that rely on any communication from tasks running on the compute nodes back to the host where the workflow scheduler runs will time out, causing workflow runtime errors. This is a problem for ecFlow, which depends on tasks telling the server that they have finished (via the bash tailer of the task), and it is also a problem for Cylc, which by default uses a communication mode that requires connectivity between the worker nodes and the scheduler.

At the BSC, some time ago, I saw another project that used ecFlow on MareNostrum5 without the tailer, relying on some custom code to tell ecFlow that the task had finished. While this worked, I asked someone from ECMWF (they maintain ecFlow) and they told me that managing tasks that way was discouraged. That project at the BSC has since been refactored and now uses Autosubmit.

Things are simpler with Cylc. As I explained in my talk, and also offline to others, Cylc can in fact be used in an environment with networking constraints like the BSC MareNostrum5. The Cylc documentation explains how to use its polling communication method. It is the “Polling mode” that appears right at the center of the Venn diagram at the top of this blog post.

Domingo Manubens-Gil left the BSC before I joined, but I can see from Git and from previous deliverables for European projects that he did a lot of great work for Autosubmit. Some of his later work includes technical reports on the possibility of running Cylc workflows on MareNostrum5, which also confirms that Cylc workflows can run on the BSC MareNostrum5 HPC.

But as a good engineer, I doubt most things I read until I can actually try them or have more solid proof. And Domingo's work was published before Cylc 8 had been released. So, to make sure Cylc 8 workflows run on BSC MareNostrum5, I tried the Cylc documentation “broadcast” tutorial, which you can find here.

Prerequisites:

First, my ~/.cylc/flow/8/global.cylc:

[platforms]
    [[mn5]]
        cylc path = /gpfs/scratch/bsc32/<ADD-A-VALID-BSC-USER>/cylc/venv/bin/
        hosts = mn
        install target = mn
        job runner = slurm
        retrieve job logs = True

        communication method = poll
        submission polling intervals = 10*PT1M, 10*PT5M
        execution polling intervals = 10*PT1M, 10*PT5M
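To double-check that the scheduler host is picking up this platform definition, you can ask Cylc to print back the parsed global configuration (a quick sanity check, not MN5-specific); it should echo the [[mn5]] section above:

$ cylc config -i '[platforms][mn5]'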

And here’s the diff for the Cylc broadcast tutorial workflow:

diff --git a/flow.cylc b/flow.cylc
index d2d1ede..52702f8 100644
--- a/flow.cylc
+++ b/flow.cylc
@@ -1,3 +1,8 @@
+#!Jinja2
+
+{% set HPC_PROJECT  =   "bsc32" %}
+{% set HPC_USER     =   "<ADD-A-VALID-BSC-USER>" %}
+
 [scheduling]
     initial cycle point = 1012
     [[graph]]
@@ -5,11 +10,27 @@
         PT1H = announce[-PT1H] => announce
 
 [runtime]
+    [[MN5]]
+        platform = mn5
+        # Wallclock
+        execution time limit = PT05M
+        # NOTE: do not set walltime here or Cylc may keep one value
+        #       while you have another one in the HPC! See the config
+        #       flow.cylc[platforms][[mn5]][[[execution time limit]]]
+        [[[directives]]]
+            --account={{ HPC_PROJECT }}
+            --qos=gp_debug
+            --partition=standard
+            --ntasks=1
+            --cpus-per-task=1
+
     [[wipe_log]]
+        inherit = MN5
         # Delete any files in the workflow's "share" directory.
         script = rm "${CYLC_WORKFLOW_SHARE_DIR}/knights" || true
 
     [[announce]]
+        inherit = MN5
         script = echo "${CYLC_TASK_CYCLE_POINT} - ${MESSAGE}" >> "${FILE}"
         [[[environment]]]
             WORD = ni

Once you have configured your platform and workflow, you can cd into the workflow source directory and run cylc vip --no-detach. Then observe the logs locally and, optionally, in Slurm. You should see a few jobs being launched by Cylc. The polling intervals control how often Cylc polls the current jobs to check their status on the remote platform: with the values above, every minute for the first ten polls, then every five minutes, with the last interval repeated until the job finishes (something you cannot tune in Autosubmit, as these values are hard-coded right now).
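For reference, this is roughly what that looks like from the shell. A sketch only: “broadcast-tutorial” is a placeholder for whatever you named the workflow source directory, and the squeue call assumes you are logged into an MN5 login node:

$ cd broadcast-tutorial
$ cylc validate .        # optional: check that the configuration parses
$ cylc vip --no-detach   # validate, install and play, staying attached
# in another terminal, on an MN5 login node, watch the Slurm queue:
$ squeue -u "$USER"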

You can issue the cylc broadcast commands from the Cylc documentation tutorial to change the message printed in the log, and play with other commands such as cylc cat-log, cylc poll, etc. They should work fine on MN5.
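For example, something along these lines (a sketch; the exact commands are in the tutorial, and “broadcast-tutorial” again stands in for your installed workflow ID):

$ cylc broadcast -n announce -s '[environment]WORD=it' broadcast-tutorial
$ cylc cat-log broadcast-tutorial//10120102T0700Z/announce
$ cylc poll 'broadcast-tutorial//*'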

$ tail -f knights
10120102T0400Z - We are the knights who say "ni"!
10120102T0500Z - We are the knights who say "ni"!
10120102T0600Z - We are the knights who say "ni"!
10120102T0700Z - We are the knights who say "it"!
10120102T0800Z - We are the knights who say "it"!
10120102T0900Z - We are the knights who say "it"!

Regarding communication with the HPC platform, the only cons compared with Autosubmit are that the platform configuration is not fully centralized, and that Cylc has no equivalent of Autosubmit's wrappers (so if you use Cylc on BSC MareNostrum5, you will probably face longer queueing times). On the other hand, you gain advanced cycling and fine-grained control of several settings (such as log retrieval, another thing you cannot turn off in Autosubmit).

$ sacct --starttime 2025-05-25
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
21445023     wipe_log.+        gpp      bsc32          2  COMPLETED      0:0 
21445023.ba+      batch                 bsc32          2  COMPLETED      0:0 
21445023.ex+     extern                 bsc32          2  COMPLETED      0:0 
21445050     announce.+        gpp      bsc32          2  COMPLETED      0:0 
21445050.ba+      batch                 bsc32          2  COMPLETED      0:0 
21445050.ex+     extern                 bsc32          2  COMPLETED      0:0 
21445069     announce.+        gpp      bsc32          2  COMPLETED      0:0 
21445069.ba+      batch                 bsc32          2  COMPLETED      0:0 
21445069.ex+     extern                 bsc32          2  COMPLETED      0:0 
21445083     announce.+        gpp      bsc32          2  COMPLETED      0:0 

(p.s.: remember to stop your workflow!)
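For instance (again substituting your own workflow ID for the placeholder “broadcast-tutorial”):

$ cylc stop broadcast-tutorial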

Categories: Blog

Tags: Opensource, Cylc, Workflows, Python, Programming