Pipeline Execution Scheme
At the top level, execution is controlled by standard unix make.
The top-level workflow to run 100 simulations using 34-fold parallelism, reduce their output, and produce a summarizing webpage is as simple as:

(create the input script, Scripts/example.json)
add-sims.sh -P 34 Scripts/example.json 100
make S=example reduce
make S=example html
If you go away and later forget whether the graphics are up to date (perhaps because more simulations were added), you can just rerun:

make S=example html

The graphical plots, and the html page containing them, will be refreshed if new simulations have been added since the last time the webpage was made; otherwise, the make returns immediately.
This is possible because the html target depends on the reduce target, which in turn depends on the sims/example/drm directory, where the simulation outputs are placed. Dependency management is a powerful feature of make.
This process works automatically whenever downstream data (the webpage) needs to change in response to new upstream data (simulations). However, if the underlying graphics code changes ("put the titles in 14 point"), you need to force the webpage refresh by supplying -B to make, which forces the rebuild:

make -B S=example html
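In make terms, the dependency chain behaves roughly like the sketch below. Only the sims/$(S)/drm directory is named above; the other target and driver names are hypothetical placeholders, and the real Makefile rules differ. The point is simply that each target is rebuilt only when its prerequisite is newer:

# hypothetical sketch of the dependency chain for ensemble S
sims/$(S)/html: sims/$(S)/reduced
	make-html.sh $(S)          # regenerate plots and webpage from reduced data
sims/$(S)/reduced: sims/$(S)/drm
	reduce-drms.sh $(S)        # re-reduce only if new DRMs have appeared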
Tower of abstraction
It's turtles (that is, driver scripts) all the way down. Make calls a shell-script driver to do data reductions and produce graphics. Typically the shell script will enforce the naming conventions on inputs and outputs, and then call a Matlab or Python script to do the actual processing. So there are three levels of abstraction: the make target, the shell driver, and the 'doing' routine, in Matlab or Python. The Makefile lists all the targets at the top of the file, with an explanation.
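As an illustration of the middle layer, a reduction driver in this style might look like the following sketch. The script and directory names here (reduce_drms.py, sims/$S/reduced) are hypothetical, not the actual drivers in the repository:

#!/bin/sh
# hypothetical shell driver: enforce naming conventions, then hand off
S=$1                         # ensemble name, e.g. "example"
in_dir=sims/$S/drm           # simulation outputs (DRMs) for this ensemble
out_dir=sims/$S/reduced      # reduced products (name is illustrative)
mkdir -p "$out_dir"
python reduce_drms.py --in "$in_dir" --out "$out_dir"   # the 'doing' routine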
Adding simulations to the ensemble
Adding simulations is done outside make, with the add-sims.sh shell-script driver. Its basic usage is simply:

add-sims.sh SCRIPT N

where SCRIPT is the JSON script for Exosims, and N is the number of sims to add to the ensemble tied to SCRIPT.
This pushes down to a call to our main driver script for Exosims, Local/ipcluster_ensemble_jpl_driver.py. The main function of this driver is to put result files in the proper place, and to perform logging. In particular, the observing sequence ("DRM") for each sim is named for the seed and goes in one directory, and the star-planet configuration for that sim ("SPC"), also named for the seed, goes in a separate directory.
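For orientation, the resulting layout is roughly as follows. The drm directory name matches the dependency noted earlier; the spc directory name and the file extensions shown are assumptions for illustration, not taken from the driver itself:

sims/example/drm/<seed>.pkl   # observing sequence (DRM) for one sim, named by seed
sims/example/spc/<seed>.spc   # star-planet configuration (SPC) for the same seed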
Two options to add-sims are noteworthy:
-P PAR => run without ipython parallel, using PAR independent jobs
-S SEEDS => perform one sim for each integer seed, one per line,
in the file SEEDS. Implies -P.
The -P option uses the same underlying run code, but uses independent jobs (run in parallel at the shell level using xargs) rather than ipyparallel. We typically use -P because it is simpler, but ipython parallel can be good for cases where initialization of the simulator takes significant time.
The -S SEEDS option allows multiple ensembles to use the same set of seeds, so that yield variability due to parameter changes is isolated from that due to the simulated universe. One Exosims simulation is run per seed. More options and further details are in the add-sims.sh header.
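For example, to run two linked ensembles on the same simulated universes, you might keep a file of integer seeds and pass it to both runs. The seed-file and script names, and the exact argument order, are illustrative; see the add-sims.sh header for the authoritative usage:

seq 101 200 > Scripts/seeds-100.txt                              # 100 integer seeds, one per line
add-sims.sh -S Scripts/seeds-100.txt Scripts/example.json        # baseline ensemble
add-sims.sh -S Scripts/seeds-100.txt Scripts/example-tuned.json  # same seeds, changed parameters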
Generating result summaries
Result summaries are generated with the reduce and html targets shown above; the subsections below describe supporting infrastructure for running simulations and viewing the results.
iPython parallel support
As noted, simulations can also be run using iPython parallel, rather than shell-level parallelism. This mode starts up a given number of python processes ("engines"), which are eventually given an Exosims function to run by add-sims.sh. This creates extra state (a Python interpreter is held by each "engine"), but avoids re-initialization of the Exosims object for each run. See also the SurveyEnsemble documentation within Exosims.
To support creation, startup, and shutdown of these engines, we added some iPython-parallel ("ipp") targets to the Makefile. These targets operate independently of the simulation/results infrastructure targets. The targets (invoked like make ipp-create) are:
- ipp-create: create an ipython-parallel profile for this mission (use once per sandbox instance only). Copies several files into a per-user, per-machine ipyparallel directory. To undo, see ipp-nuke, below.
- ipp-start: start the ipython-parallel controller and engines. Note: if EXOSIMS_ENGINE_N (an integer) is exported from the environment, this many engines will be started up; otherwise, the system default will be used. To use this, run, from the shell: $ EXOSIMS_ENGINE_N=8 make ipp-start
- ipp-stop: stop the above processes. See also ipp-kill, below.
- ipp-status: report status of the controller and engines. Note: this attempts to run trivial jobs on the remote engines, so it will not work if the engines are busy with another job. See ipp-ps, below.
- ipp-ps: use the unix ps command to identify the number of running engines. This works when the engines are busy, but is not informative about whether engines are responding to Python commands.
- ipp-kill: sometimes ipp-stop does not work and engines or controllers are orphaned. ipp-kill identifies these by process id, and kills them.
- ipp-nuke: deletes your ipython-parallel profile. The inverse of ipp-create. (Note: it attempts to ipp-kill first, so as to not leave engines running.)
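Putting these together, a typical ipyparallel session might look like the following. The engine count and script name are just examples; omitting -P is what makes add-sims.sh use the ipyparallel engines rather than shell-level jobs:

make ipp-create                        # once per sandbox instance
EXOSIMS_ENGINE_N=8 make ipp-start      # controller plus 8 engines
add-sims.sh Scripts/example.json 100   # runs are farmed out to the engines
make ipp-status                        # confirm the engines respond
make ipp-stop                          # shut everything down when finished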
As evidenced by the various status and termination commands, sometimes using ipython parallel in this context can be annoying, because you have to remember the state of the worker engines. In particular, the engines will have to be restarted (ipp-stop followed by ipp-start) when the underlying Exosims code changes, because the already-running engines will hold a stale copy of the code.
Webserver support
Results are produced as graphic files and as a webpage that summarizes the graphics. You could view the html in several ways, but one easy way is to start a server within the simulations directory. A simple Python server (python -m http.server 8000) is too slow and does not support some video functions, so we use httpd (Apache). To control this server, the Makefile has these targets:
- html-start: start the httpd server.
- html-stop: stop the httpd server.
- html-status: show running servers, if any, by inspecting their log files.
Using make html-status tells you where to point your browser. The server typically can stay up for months without trouble.
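So the usual pattern is just the following, using the targets listed above:

make html-start     # start the Apache httpd instance for this sandbox
make html-status    # prints where to point your browser
make html-stop      # only needed when you want to take the server down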
Tuning experiment support
We had occasional need to run multiple linked ensembles to understand performance of schedulers in response to different tuning parameters. For instance, a scheduler might need to trade off slew time against Brown completeness ("should we spend time slewing to a far-off high-completeness target, or integrate longer on a nearby target with smaller completeness"). So we seek to compute and maximize yield across a selected set of tuning parameters.
This was handled by creating a series of Exosims input scripts, one for each parameter setting, running add-sims.sh for each, and performing data reduction as noted above. This amounts to an outer loop around the above process.
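A sketch of that outer loop is below. The Scripts/exp-*.json naming is hypothetical, and the parallelism and ensemble size are just examples; the S=NAME / Scripts/NAME.json correspondence follows the example at the top of this page:

# run an ensemble for each tuning-parameter setting, then reduce each one
for script in Scripts/exp-*.json; do
    add-sims.sh -P 34 "$script" 100
done
for script in Scripts/exp-*.json; do
    make S="$(basename "$script" .json)" reduce
done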
Regarding execution, we would generate a list of the input script filenames, and farm this list out to the machines available (aftac1, aftac2, aftac3). The detailed process is described in the file Experiments/run-experiment-howto.txt.