EXOSIMS Sandbox
We developed an environment for running batches of Exosims simulations. There are four main functional pieces to this sandbox: simulation execution, data reduction, graphics generation, and webpage generation. Also, there is some ancillary support for ensembles of related simulations for parameter-tuning, iPython parallel engines, and a webserver for inspection of results. In general, the workflow is to create an Exosims script, run an ensemble of simulations that dump outputs to disk, reduce data from those simulations to a set of CSV files, and generate plots.
Most of the workflow execution is controlled by a Makefile, and
thus invoked by running make from the shell with an appropriate target.
The components making up this tower of abstraction are described elsewhere.
The make mechanism both feeds off of and imposes a file structure,
which we describe next.
Sandbox File Layout
We run simulations tailored to a couple of mission scenarios. We show HabEx here for example, but the Luvoir simulations have the same layout.
HabEx/
README
EXOSIMS/ -- symlink to allow "import EXOSIMS"
Makefile -- controls execution
Scripts/ -- JSON scripts for Exosims
.../HabEx_4m.json -- typical input script
sims/ -- Ensembles of DRM outputs
.../HabEx_4m/... -- DRMs and data from above script
add-sims.sh -- adds more Exosims runs to sims/
Experiments/ -- groups of ensembles
.../HabEx.json -- Lists selected scripts above
ipyparallel/ -- ipython parallel configuration, per-user
util/ -- mission-generic utility code
Local/ -- mission-specific code
The main conventions here are:
- The Exosims code applicable to the mission is symlinked so that
any module here can import EXOSIMS successfully.
- The Exosims input scripts are all under the Scripts directory.
All runs using that script (an ensemble) have output in the corresponding directory underneath sims.
This convention allows all utilities to place reduced data, graphics, and html index pages into standard locations.
- For things like parameter sweeps, several related ensembles will
be generated for a variety of parameters. We call this an experiment.
The Exosims input scripts pertaining to an experiment
are generated from templates that are filed under Experiments.
Their corresponding output files (graphics) are also in directories under sims,
but nested one level down. That is, scripts for an experiment named "Tune" will be in files
Scripts/Tune.exp/PARAM.json, and the simulation results will be in files like
sims/Tune.exp/PARAM/...
The Makefile has targets pertaining to such experiments.
- Most reduction and graphics code lives in util and Local;
the latter is intended to be mission-tailored, but the tailoring has turned out to be minimal.
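The script-to-output convention above can be sketched as a small path helper. This is only an illustration of the naming rule, not code from the sandbox: the function names sims_dir_for_script and is_experiment are hypothetical, and the sketch assumes that the output directory is obtained simply by dropping the .json suffix and swapping Scripts for sims.

```python
from pathlib import PurePosixPath


def sims_dir_for_script(script_path, sims_root="sims"):
    """Map an Exosims input script to its ensemble output directory.

    Hypothetical helper: assumes the rule "same relative path as under
    Scripts/, minus the .json suffix".
    """
    parts = PurePosixPath(script_path).parts
    # Everything after the "Scripts" component is kept as-is.
    idx = parts.index("Scripts")
    relative = PurePosixPath(*parts[idx + 1:]).with_suffix("")
    return str(PurePosixPath(sims_root) / relative)


def is_experiment(script_path):
    """True if the script sits inside an experiment directory (NAME.exp)."""
    return any(p.endswith(".exp") for p in PurePosixPath(script_path).parts[:-1])


print(sims_dir_for_script("Scripts/HabEx_4m.json"))
print(sims_dir_for_script("Scripts/Tune.exp/PARAM.json"))
```

The second call shows the "nested one level down" case for experiments: the experiment directory name is preserved, and each parameter setting gets its own ensemble directory beneath it.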
Simulation Output Files
The subdirectories of sims are of particular interest, because they store
Exosims outputs like DRMs, reduced data like CSV files, and graphical
outputs like plots and movies:
drm/ -- DRM outputs (time-ordered observation lists)
.../seed1.pkl -- a specific run as a python pickle
... -- (and a bunch more runs)
spc/ -- SPC (star-planet configuration) files
.../seed1.spc -- a specific SimulatedUniverse as a pickle
...
run/ -- run logs
.../outspec.json -- outspec from given runs
reduce-info.csv -- CSV files summarizing the ensemble
reduce-radlum.csv
reduce-times.csv
reduce-visits.csv, etc.
reduce-pool.json
gfx/ -- various plots as images
path/ -- single-DRM movies
.../seed1.mp4
path-ens/ --
Each Exosims run produces a single DRM, stored in drm/ according to the
random-number seed, and a corresponding star-planet configuration
(the SimulatedUniverse), stored in spc/.
Depending on how it was run, it may produce a text runlog in log/, and
a JSON outspec in run/.
This set of outputs (DRM, SPC, runlog, errorlog, outspec) is made by
a tailored SurveyEnsemble class
(Local/EXOSIMS_local/IPClusterEnsembleJPL2.py),
and a shell-level driver and "run_one" (Local/ipcluster_ensemble_jpl_driver.py).
These have been generalized and updated from the Exosims standard files,
which are documented in Exosims as subclasses following the
SurveyEnsemble prototype.
As mentioned, after an ensemble is run, a reduction script is typically invoked that generates the CSV output files, and subsequent graphics commands use the ensemble summaries, or in some cases just single DRMs.
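As a rough sketch of how downstream code can walk these outputs, the following pairs each DRM pickle with its SPC file by seed. The function name load_ensemble is hypothetical, and the sketch assumes the layout listed above (drm/SEED.pkl and spc/SEED.spc, both Python pickles). Pickles written under Python 2 may need encoding="latin1" passed to pickle.load, and unpickling real DRMs may require the packages whose objects they contain to be importable.

```python
import pickle
from pathlib import Path


def load_ensemble(sim_dir):
    """Yield (seed, drm, spc) for every run found under sim_dir/drm.

    Hypothetical helper. `spc` is None when no matching SPC file exists.
    """
    sim_dir = Path(sim_dir)
    for drm_file in sorted((sim_dir / "drm").glob("*.pkl")):
        seed = drm_file.stem  # file name without the .pkl suffix
        with open(drm_file, "rb") as f:
            drm = pickle.load(f)
        spc = None
        spc_file = sim_dir / "spc" / (seed + ".spc")
        if spc_file.exists():
            with open(spc_file, "rb") as f:
                spc = pickle.load(f)
        yield seed, drm, spc
```

Writing this as a generator means only one DRM needs to be in memory at a time, which can matter once an ensemble has many runs.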
Data Reduction
We separated data reduction from plotting so that reduction of
the ensemble could be done once, and then various plots could
be remade and tweaked quickly from the reduced data.
The python driver is reduce_drms.py.
It contains some re-usable code for loading the ensemble of DRMs
and for computing summaries (means, standard errors,
medians, quantiles, histograms).
With only one exception, reduce_drms produces averages over
the ensemble. Note that this does not limit us to mean values!
For example, a histogram of yields is also an average over the ensemble:
bin number "N" of a yield histogram
contains the average, across the ensemble,
of this 0/1 variable: "N" exo-earths were detected in the simulation.
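The point that a histogram is itself an ensemble average can be illustrated with a few lines of Python. This is not the code in reduce_drms.py; the function name yield_histogram and the yield values are made up for illustration.

```python
def yield_histogram(yields):
    """Return h, where h[n] is the ensemble mean of the 0/1 variable
    "exactly n exo-Earths were detected in this simulation"."""
    n_ens = len(yields)
    hist = []
    for n in range(max(yields) + 1):
        # One 0/1 indicator per ensemble member.
        indicators = [1 if y == n else 0 for y in yields]
        hist.append(sum(indicators) / n_ens)
    return hist


# Made-up exo-Earth yields, one per DRM in a six-member ensemble.
yields = [2, 3, 3, 1, 2, 3]
hist = yield_histogram(yields)
print(hist)
```

Because each simulation falls in exactly one bin, the bin values sum to one, so the averaged indicators form a normalized histogram.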
Data reduction happens at the ensemble level, by reading in the set of typically ~100 DRMs and corresponding SPCs, and putting out CSV files such as the following non-exclusive list:
reduce-info.csv -- metadata
reduce-times.csv -- temporal summaries like fuel use
reduce-radlum.csv -- histograms segmented by radius/luminosity
reduce-visits.csv -- histograms of revisits
reduce-earth.csv -- exo-Earth detection counts
Currently, there are 14 such files.
The CSV format is good at storing vectors, so the above CSV files are all vectorized along some index set: time, radius/luminosity, or revisit-count. This format enforces a certain discipline in how a summary should be done: within a bin defined by the index set ("month 15 of the simulation", "radius in range X, luminosity in range Y"), counts that fall in each bin are accumulated across DRMs. Then, means, standard errors, and quantiles can be computed from the counts. So, all CSVs have two columns indicating the lower and upper bin boundaries, and for each quantity of interest, there is one column each for the mean, standard error, each quantile needed, and the number of ensemble members comprising that mean.
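The accumulate-then-summarize discipline can be sketched as follows, using only the standard library. All names here (bin_counts, summarize_bins, the row keys) are hypothetical, and the choice of the standard error of the mean plus a median as the per-bin statistics is an assumption for the sake of the example; the actual reduction code may compute these differently.

```python
import bisect
import math
import statistics


def bin_counts(values, edges):
    """Count the values falling in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


def summarize_bins(per_drm_values, edges):
    """Return one row per bin with summaries taken across the ensemble."""
    per_drm = [bin_counts(v, edges) for v in per_drm_values]
    n_ens = len(per_drm)
    rows = []
    for i in range(len(edges) - 1):
        col = [counts[i] for counts in per_drm]  # this bin, across all DRMs
        sem = statistics.stdev(col) / math.sqrt(n_ens) if n_ens > 1 else 0.0
        rows.append({
            "lo": edges[i],
            "hi": edges[i + 1],
            "mean": statistics.fmean(col),
            "sem": sem,
            "q50": statistics.median(col),
            "nEns": n_ens,
        })
    return rows


# Made-up detection times (months) for a three-member ensemble.
drm_times = [[1, 5, 13], [2, 20, 23], [3]]
rows = summarize_bins(drm_times, edges=[0, 12, 24])
for row in rows:
    print(row)
```

Each returned row corresponds to one CSV line: the two bin boundaries first, then the summary columns for the quantity.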
As an example, the fields now stored in reduce-times.csv include:
h_det_time_lo, -- bin boundaries
h_det_time_hi,
h_det_time_all_mean, -- cumulative detections
h_det_time_all_std,
h_det_time_all_nEns,
h_det_time_unq_mean, -- unique detections only
h_det_time_unq_std,
h_det_time_unq_nEns,
h_det_time_rev_mean, -- revisits only
h_det_time_rev_std,
h_det_time_rev_nEns,
h_time_fuel_all_mean, -- fuel use
h_time_fuel_all_std,
h_time_fuel_all_nEns,
h_time_fuel_slew_mean, -- fuel used for slews
h_time_fuel_slew_std,
h_time_fuel_slew_nEns,
h_time_fuel_keep_mean, -- fuel for station-keeping
h_time_fuel_keep_std,
h_time_fuel_keep_nEns
There are 89 such fields in that file, and about 1100 summarized quantities all together.
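Reading such a file back for plotting takes only a few lines. The sketch below assumes the CSV has a single header row holding the field names and numeric entries elsewhere; the function name read_reduced_csv is hypothetical, and empty cells are mapped to NaN as a defensive choice.

```python
import csv


def read_reduced_csv(path):
    """Read a reduced-data CSV into a dict: column name -> list of floats."""
    columns = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            for name, text in row.items():
                if name is None:
                    continue  # skip any cells beyond the header width
                value = float(text) if text and text.strip() else float("nan")
                columns.setdefault(name.strip(), []).append(value)
    return columns
```

With the fields listed above, a detections-versus-time curve would come from columns["h_det_time_lo"] on the horizontal axis and columns["h_det_time_all_mean"] on the vertical axis.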
The one exception to this is the file reduce-earth-char-list.csv, which
simply contains a list of all attempted characterizations across the entire ensemble.
Graphical Output
Graphical outputs are generated by Matlab and by Python. In some ways, Python is preferable, being un-encumbered by licensing and more powerful. However, we are comfortable with Matlab plots, and the reduced data in CSV is easy to load into Matlab, so we adopted a combined approach.
Graphical Output Files: Ensemble From Matlab
To make the whole-ensemble plots from the CSV files above,
you run make S=(SCRIPT) graphics.
This runs a driver (in the shell) that invokes Matlab,
which reads the CSVs and invokes plotting m-files
(all in Local/Matlab/mfile) for each
plot flavor, including for example:
plot_drms_script.m -- driver script
plot_drm_det_times.m -- detections-vs-time
plot_drm_fuel_use.m -- fuel-vs-time
plot_drm_radlum.m -- radius/luminosity
plot_drm_signal_end.m -- signals success by writing a file
The resulting outputs are written to files like the following in
sims/<script>/gfx/:
gfx/det-detects.png
gfx/det-cume-detects.png
gfx/det-fuel.png
gfx/det-radlum-det.png
gfx/det-radlum-char.png
gfx/det-radlum-det-all.png
Currently, 90 files are generated, mostly PNG image files, but also some PDF files.
See the binned ensemble plot gallery.
Graphical Output Files: Ensemble From Python
Other plots show the tour of characterizations
made by starshade missions.
These are made by the python script util/ens-path-graphics.py,
which is invoked by util/ens-path-summary.sh.
The make target is path-ensemble.
The resulting outputs are written to files in
sims/<script>/path-ens:
path-map.png -- map-format plot of activity
path-adjacency-lon.png -- slews, targets ordered by lon
path-adjacency-lat.png -- slews, targets ordered by lat
path-visits.csv -- mean number of visits per star
path-slews.csv -- total slews between pairs of stars
See the ensemble path plot gallery.
The latter two CSV files are used by a javascript widget in the generated HTML page to show a zoomable plot of slews across the ensemble.
Graphical Output Files: Single DRM
A final set of plots shows the observations, keepout, or slews for a single
DRM.
They are useful in analyzing observation-scheduling behavior,
verifying that keepout constraints are honored, and related questions.
They are made by plot-keepout-and-obs.py, plot-timeline.py,
and keepout_path_graphics.py.
For a given DRM with a certain (integer) SEED,
these files are placed in the directory
sims/<script>/path/, and named like:
SEED-obs-timeline.png -- time-format plot of all observations
SEED-obs-keepout-all.png -- time-format plot of keepout (all obs.)
SEED-obs-keepout-char.png -- time-format plot of keepout (char only)
SEED-obs-timeline-info.csv
SEED.mp4 -- movie of all detection and char observations
SEED-final.png -- final frame of above movie
Some other plots and summaries are placed in the directory
sims/<script>/path/SEED-cume, including:
path-visits.csv
path-slews.csv
which are used by the same javascript widget described above, to show a zoomable plot of slews for just that DRM.
See the single-drm gallery.