Pipeline Execution Scheme

At the top level, execution is controlled by standard unix make. The top-level workflow to run 100 simulations using 34-fold parallelism, reduce their output, and produce a summarizing webpage is as simple as:

(Create the input script, Scripts/example.json)
add-sims.sh -P 34 Scripts/example.json 100
make S=example reduce
make S=example html

If you later cannot remember whether the graphics are up to date (perhaps because you added more simulations in the meantime), you can just:

make S=example html

By doing this, the graphical plots and the html page containing them will be refreshed if new simulations were added since the last time the webpage was made; otherwise, make returns immediately. This is possible because the html target depends on the reduce target, which in turn depends on the sims/example/drm directory, where the simulation outputs are placed. Dependency management is a powerful feature of make.

This process works automatically whenever downstream data (the webpage) needs to change in response to new upstream data (simulations). However, if the underlying graphics code changes ("put the titles in 14 point"), you need to force the webpage refresh by supplying -B to make, which forces the rebuild:

make -B S=example html
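The dependency behavior described above can be demonstrated with a toy Makefile. The target names below mirror the real ones, but the rules are stand-ins for illustration only:

```shell
# Toy illustration of the html -> reduce -> drm dependency chain.
workdir=$(mktemp -d); cd "$workdir"
mkdir -p drm
printf 'html: reduce\n\t@echo building html; touch html\nreduce: drm\n\t@echo reducing; touch reduce\n' > Makefile
out1=$(make html)            # first build: reduces, then builds html
out2=$(make html)            # nothing new: make reports html is up to date
sleep 1; touch drm/new-sim   # a new simulation output arrives in drm/
out3=$(make html)            # drm changed, so reduce and html are redone
out4=$(make -B html)         # -B rebuilds even with no new simulations
```

The second invocation returns immediately because every target is newer than its prerequisites; touching a file inside drm/ updates the directory's timestamp and re-triggers the chain.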

Tower of abstraction

It's turtles (that is, driver scripts) all the way down. Make calls a shell-script driver to do data reductions and produce graphics. Typically the shell-script will enforce the naming conventions on inputs and outputs, and then call a matlab or python script to do the actual processing. So, there are three levels of abstraction: the make target, the shell driver, and the 'doing' routine, in matlab or python.
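The three-level pattern might be sketched as follows. All names here (reduce_drms, the directory layout, the inline python) are invented for illustration; the real drivers differ:

```shell
# Level 1 (make) would invoke a driver like:  reduce: ; ./reduce-drms.sh sims/$(S)
# Level 2 (shell driver): enforce the naming convention, then call python.
# Level 3 (python): do the actual processing.
cd "$(mktemp -d)"
reduce_drms() {
  sim_dir="$1"
  # the driver's job: check the expected layout before doing real work
  [ -d "$sim_dir/drm" ] || { echo "error: $sim_dir/drm not found" >&2; return 1; }
  python3 -c 'import sys, os; print("reducing", len(os.listdir(sys.argv[1])), "DRMs")' "$sim_dir/drm"
}
mkdir -p sims/example/drm && touch sims/example/drm/1234.pkl
reduce_drms sims/example
```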

The Makefile lists all the targets at the top of the file, with an explanation.

Adding simulations to the ensemble

Adding simulations is done outside make with the add-sims.sh shell script driver. Its basic usage is simply:

add-sims.sh SCRIPT N

where SCRIPT is the JSON script for Exosims, and N is the number of sims to add to the ensemble tied to SCRIPT. This pushes down to a call to our main driver script for Exosims, Local/ipcluster_ensemble_jpl_driver.py. The main function of this driver is to put result files in the proper place, and to perform logging. In particular, the observing sequence ("DRM") for each sim is named for the seed and goes in one directory, and the star-planet configuration for that sim ("SPC"), also named for the seed, goes in a separate directory.
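The resulting per-seed layout looks roughly like the sketch below (the file extensions are assumptions for illustration):

```shell
# Each sim's outputs are named for its seed: the DRM goes in one
# directory, the SPC in a sibling directory, keyed by the same seed.
cd "$(mktemp -d)"
mkdir -p sims/example/drm sims/example/spc
for seed in 101 202; do
  touch "sims/example/drm/$seed.pkl"   # observing sequence (DRM) for this seed
  touch "sims/example/spc/$seed.spc"   # star-planet configuration (SPC), same seed
done
ls sims/example/drm sims/example/spc
```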

Two options to add-sims are noteworthy:

-P PAR    => run without ipython parallel, using PAR independent jobs
-S SEEDS  => perform one sim for each integer seed, one per line,
           in the file SEEDS.  Implies -P.

The -P option uses the same underlying run code, but uses independent jobs (run in parallel at the shell level using xargs) rather than ipyparallel. We typically use -P because it is simpler, but ipython parallel can be good for cases where initialization of the simulator takes significant time.
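Conceptually, the -P mode's shell-level parallelism resembles the xargs pattern below, with echo standing in for the real per-sim Exosims driver invocation:

```shell
# Launch 8 independent "simulations" with 4-way parallelism via xargs.
seq 1 8 | xargs -P 4 -I SEED echo "running sim with seed SEED"
```

Each job is a fully independent process, which is why no ipython engine state needs to be tracked in this mode.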

The -S SEEDS option allows multiple ensembles to use the same set of seeds, so that yield variability due to parameter changes is isolated from that due to the simulated universe. One Exosims simulation is run per seed.
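The seed-sharing idea can be sketched like this (echo again stands in for the real driver; add-sims.sh -S does this for you):

```shell
# Two ensembles driven by the same seed file, so each pair of runs
# sees the same simulated universe and only the parameters differ.
cd "$(mktemp -d)"
printf '1001\n1002\n1003\n' > seeds.txt
xargs -P 2 -I SEED echo "scriptA: sim with seed SEED" < seeds.txt
xargs -P 2 -I SEED echo "scriptB: sim with seed SEED" < seeds.txt
```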

More options and further details are in the add-sims.sh header.

Generating result summaries

iPython parallel support

As noted, simulations can also be run using iPython parallel, rather than shell-level parallelism. This mode starts up a given number of python processes ("engines"), which are eventually given an Exosims function to run by add-sims.sh. This creates extra state (a Python interpreter is held by each "engine"), but avoids re-initialization of the Exosims object for each run. See also the SurveyEnsemble documentation within Exosims.

To support creation, startup, and shutdown of these engines, we added some iPython-parallel ("ipp") targets to the Makefile. These targets operate independently of the simulation/results infrastructure targets. The targets (invoked like make ipp-create) are:

  • ipp-create: create an ipython-parallel profile for this mission (use once per sandbox instance only). Copies several files into a per-user, per-machine ipyparallel directory. To undo, see ipp-nuke, below.

  • ipp-start: start the ipython-parallel controller and engines. Note: if EXOSIMS_ENGINE_N (an integer) is exported from the environment, that many engines will be started; otherwise, the system default is used. To use this, run, from the shell:

    $ EXOSIMS_ENGINE_N=8 make ipp-start

  • ipp-stop: stop the above processes. See also ipp-kill, below.

  • ipp-status: report status of the controller and engines. Note: this attempts to run trivial jobs on the remote engines, so it will not work if the engines are busy with another job. See ipp-ps, below.

  • ipp-ps: use the unix ps command to identify the number of running engines. This works when the engines are busy, but is not informative about whether engines are responding to Python commands.

  • ipp-kill: sometimes ipp-stop does not work and engines or controllers are orphaned. ipp-kill identifies these by process id, and kills them.

  • ipp-nuke: deletes your ipython-parallel profile. The inverse of ipp-create. (Note: it attempts to ipp-kill first, so as to not leave engines running.)

As evidenced by the various status and termination commands, sometimes using ipython parallel in this context can be annoying, because you have to remember the state of the worker engines. In particular, the engines will have to be restarted (ipp-stop followed by ipp-start) when the underlying Exosims code changes, because the already-running engines will hold a stale copy of the code.

Webserver support

Results are produced as graphic files and as a webpage that summarizes the graphics. You could view the html in several ways, but one easy way is to start a server within the simulations directory. A simple Python server (python -m http.server 8000) is too slow and does not support some video functions, so we use httpd (Apache).

To control this server, the Makefile has these targets:

  • html-start: start httpd server

  • html-stop: stop httpd server

  • html-status: show running servers, if any, by inspecting their log files.

Running make html-status tells you where to point your browser. The server typically can stay up for months without trouble.

Tuning experiment support

We had occasional need to run multiple linked ensembles to understand performance of schedulers in response to different tuning parameters. For instance, a scheduler might need to trade off slew time against Brown completeness ("should we spend time slewing to a far-off high-completeness target, or integrate longer on a nearby target with smaller completeness"). So we seek to compute and maximize yield across a selected set of tuning parameters.

This was handled by creating a series of Exosims input scripts, one for each parameter setting, running add-sims.sh for each, and performing data reduction as noted above. This amounts to an outer loop around the above process.
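That outer loop might look like the following sketch. The template file, the slewCoeff parameter, and the TUNE placeholder are all hypothetical; the add-sims.sh commands are echoed rather than executed:

```shell
# Generate one Exosims input script per tuning value, then queue an
# add-sims.sh run for each setting.
cd "$(mktemp -d)"
mkdir -p Scripts
printf '{"slewCoeff": TUNE}\n' > template.json   # hypothetical parameter
for tune in 0.1 0.2 0.4; do
  sed "s/TUNE/$tune/" template.json > "Scripts/exp_$tune.json"
  echo "add-sims.sh -P 34 Scripts/exp_$tune.json 100"
done
```

Data reduction and webpage generation then proceed per-ensemble, exactly as in the single-ensemble workflow.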

Regarding execution, we would generate a list of the input script filenames, and farm this list out to the machines available (aftac1, aftac2, aftac3). The detailed process is described in the file Experiments/run-experiment-howto.txt.