G - Testing a Time-Dependent Model
Here, we set up a time-dependent model from its source code for an experiment.
TL;DR
In a terminal, navigate to floatcsep/tutorials/case_g and type:
$ floatcsep run config.yml
After the calculation is complete, the results will be summarized in results/report.md.
Experiment Components
Along with the already known components (configurations, catalog), the example folder contains a sub-folder with the source code of the model pymock. The components of the experiment (and model) are:
case_g
├── pymock                    (Model's source code)
│   ├── input                 (input interface to floatcsep)
│   │   ├── args.txt          (model arguments)
│   │   └── catalog.csv       (dynamically allocated catalog)
│   ├── pymock
│   │   ├── libs.py           (helper functions)
│   │   └── main.py           (main routines)
│   ├── forecasts             (output interface to floatcsep)
│   │   └── ...               (forecasts should be stored here when the model is run)
│   ├── run.py                (One of the possibilities to run the model)
│   ├── pyproject.toml        (Build instructions)
│   ├── setup.cfg             (Build instructions)
│   ├── setup.py              (Build instructions)
│   ├── requirements.txt      (Build instructions)
│   ├── Dockerfile            (Build instructions)
│   └── README.md             (Information)
├── catalog.csv
├── config.yml
├── models.yml
├── custom_plot_script.py
└── tests.yml
The model to be evaluated (pymock) is a source code that generates forecasts for multiple time windows. The testing catalog catalog.csv also serves as the input catalog. Before each model run, it is filtered up to the testing start_date and dynamically allocated in pymock/input.
Model
Going from a time-independent to a time-dependent experiment adds complexity mostly because we now need a model (source code) that generates a forecast that changes for every time window. The main components of the model are:
Input: The input consists of input data and arguments.
The input data is, at the very least, a catalog filtered until the forecast beginning. floatcsep automatically allocates this catalog in the {model}/input folder prior to each model run (i.e., a single forecast run). It is stored in the csep.ascii format for simplicity's sake (see Catalogs):
lon,lat,mag,time_string,depth,catalog_id,event_id
13.292,43.075,2.1,2005-04-17T05:06:52.380000,21.7,-1,1592369
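To illustrate what "a catalog filtered until the forecast beginning" means, here is a minimal sketch. It keeps only the events that occurred before a window's start_date and writes them to {model}/input/catalog.csv with the same csep.ascii columns. It assumes the header names shown in the example above. The helper allocate_input_catalog is hypothetical: floatcsep performs this step internally with its own implementation.

```python
import csv
from datetime import datetime
from pathlib import Path

# csep.ascii columns, assumed to match the example header above
FIELDS = ["lon", "lat", "mag", "time_string", "depth", "catalog_id", "event_id"]


def allocate_input_catalog(catalog_path, model_dir, start_date):
    """Hypothetical helper: write the events older than start_date to {model}/input/catalog.csv."""
    input_dir = Path(model_dir) / "input"
    input_dir.mkdir(parents=True, exist_ok=True)
    out_path = input_dir / "catalog.csv"
    n_kept = 0
    with open(catalog_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, skipinitialspace=True)
        writer = csv.DictWriter(dst, fieldnames=FIELDS)
        writer.writeheader()
        for row in reader:
            # time_string is ISO 8601, e.g. 2005-04-17T05:06:52.380000
            if datetime.fromisoformat(row["time_string"]) < start_date:
                writer.writerow({key: row[key] for key in FIELDS})
                n_kept += 1
    return out_path, n_kept
```

For the first time window of this tutorial, the call would be allocate_input_catalog("catalog.csv", "pymock", datetime(2012, 5, 23)).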
The input arguments control how the model's source code works. The minimum arguments to run a model are the forecast start_date and end_date, which are modified dynamically during an experiment with multiple time windows. Before the model is run, the experiment system accesses {model}/input/args.txt and changes the values of start_date = {datetime} and end_date = {datetime}. Additional arguments can be set for convenience, such as (but not limited to) catalog (the input catalog name), n_sims (number of synthetic catalogs) and a random seed for reproducibility.
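The update of start_date and end_date can be sketched as follows. The sketch assumes that args.txt stores one key = value pair per line (as in start_date = {datetime}) and that dates are written as %Y-%m-%dT%H:%M:%S. Both the date layout and the helper names set_time_window and read_args are assumptions. floatcsep makes the equivalent change itself before each run, so a model developer only needs the parsing half.

```python
from datetime import datetime
from pathlib import Path

TIME_FMT = "%Y-%m-%dT%H:%M:%S"  # assumed datetime layout inside args.txt


def set_time_window(args_path, start_date, end_date):
    """Hypothetical helper: rewrite start_date/end_date in args.txt, keeping every other line."""
    path = Path(args_path)
    new_values = {
        "start_date": start_date.strftime(TIME_FMT),
        "end_date": end_date.strftime(TIME_FMT),
    }
    lines = []
    for line in path.read_text().splitlines():
        key = line.split("=", 1)[0].strip()
        if "=" in line and key in new_values:
            lines.append(f"{key} = {new_values.pop(key)}")
        else:
            lines.append(line)
    # Append the dates if the file did not contain them yet
    lines.extend(f"{key} = {value}" for key, value in new_values.items())
    path.write_text("\n".join(lines) + "\n")


def read_args(args_path):
    """Hypothetical model-side parser: 'key = value' lines into a dict of strings."""
    args = {}
    for line in Path(args_path).read_text().splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            args[key.strip()] = value.strip()
    return args
```

Every value comes back as a string, so the model converts the ones it needs, e.g. int(args["n_sims"]).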
Output: The model's output consists of the synthetic catalogs. After each run, the source code should allocate them in {model}/forecasts/{filename}.csv. The format is identical to csep_ascii. Unlike in an input catalog, however, the catalog_id column must identify each synthetic catalog, starting from 0. The file name follows the convention {model_name}_{start}_{end}.csv, where start and end follow the %Y-%m-%dT%H:%M:%S.%f (ISO 8601) format.
Model build: There are multiple options to build the model from its source code. A standard Python setup.cfg is provided, which can be built inside a Python venv or a conda environment. floatCSEP creates this environment and builds the model automatically, as long as the model build instructions are correctly set up.
Model run: The model should be run with a simple command (e.g., an entrypoint), to which only arguments can be passed if desired. The pymock model contains multiple examples of entrypoints, but the modeler should use only one for clarity.
A Python call with arguments:
$ python run.py input/args.txt
A binary entrypoint with arguments (for instance, defined in the Python build instructions, pymock/setup.cfg:entry_point):
$ pymock input/args.txt
A single binary entrypoint without arguments:
$ pymock
In this case, the source code must internally read the input data and arguments from the input/catalog.csv and input/args.txt files, respectively.
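To make the input/output contract concrete, here is a toy model that follows it. It is not pymock: its "forecast" only copies past events to random times inside the window. It reads input/args.txt and input/catalog.csv and writes one file to forecasts/ per run, numbering the synthetic catalogs from 0 in the catalog_id column. Several details are assumptions: the argument names (start_date, end_date, n_sims, seed), their defaults, the toy event rule and the model name "toymodel".

```python
import csv
import random
from datetime import datetime, timedelta
from pathlib import Path

FIELDS = ["lon", "lat", "mag", "time_string", "depth", "catalog_id", "event_id"]
TIME_FMT = "%Y-%m-%dT%H:%M:%S.%f"  # used for event times and for the file name


def read_args(path):
    """Parse the 'key = value' lines of input/args.txt into a dict of strings."""
    args = {}
    for line in Path(path).read_text().splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            args[key.strip()] = value.strip()
    return args


def main(model_dir=".", model_name="toymodel"):
    """Toy model run: read input/ and write forecasts/{model_name}_{start}_{end}.csv."""
    model_dir = Path(model_dir)
    args = read_args(model_dir / "input" / "args.txt")
    start = datetime.fromisoformat(args["start_date"])
    end = datetime.fromisoformat(args["end_date"])
    n_sims = int(args.get("n_sims", 10))
    rng = random.Random(int(args.get("seed", 0)))  # seeded for reproducibility

    # Input catalog allocated by floatcsep (events before start_date)
    with open(model_dir / "input" / "catalog.csv", newline="") as f:
        past_events = list(csv.DictReader(f, skipinitialspace=True))

    out_dir = model_dir / "forecasts"
    out_dir.mkdir(exist_ok=True)
    out_path = out_dir / f"{model_name}_{start.strftime(TIME_FMT)}_{end.strftime(TIME_FMT)}.csv"

    window_seconds = (end - start).total_seconds()
    event_id = 0
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for catalog_id in range(n_sims):  # synthetic catalogs are numbered from 0
            # Toy rule: 0-3 events per catalog, copied from past events at random times
            n_events = rng.randint(0, 3) if past_events else 0
            for _ in range(n_events):
                template = rng.choice(past_events)
                time = start + timedelta(seconds=rng.uniform(0, window_seconds))
                writer.writerow({
                    "lon": template["lon"],
                    "lat": template["lat"],
                    "mag": template["mag"],
                    "time_string": time.strftime(TIME_FMT),
                    "depth": template["depth"],
                    "catalog_id": catalog_id,
                    "event_id": event_id,
                })
                event_id += 1
    return out_path
```

Calling main() from a run.py script, or exposing it as a console entrypoint in the build instructions, gives the argument-less command form.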
Important
The model should be conceptualized as a black box. Its only interface/interaction with the floatcsep system is to receive an input (i.e., input catalog and arguments) and generate an output (the forecasts).
Configuration
Time
The configuration is identical to that of time-independent models, except that a horizon (the forecast time-window length) can now be defined instead of intervals. The experiment's class should now be set explicitly with exp_class: td:
time_config:
  start_date: 2012-5-23T00:00:00
  end_date: 2012-6-23T00:00:00
  horizon: 7days
  exp_class: td
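The following sketch shows how start_date, end_date and horizon can translate into testing windows. It splits the period into consecutive windows, each one horizon long. It rests on two assumptions. First, horizon strings look like 7days and are parsed by the hypothetical parse_horizon helper. Second, a trailing window shorter than the horizon is dropped. floatcsep's own parser may accept other spellings and may treat the remainder differently.

```python
import re
from datetime import datetime, timedelta


def parse_horizon(horizon):
    """Hypothetical parser for strings such as '7days', '1day' or '2weeks'."""
    match = re.fullmatch(r"\s*(\d+)\s*(days?|weeks?)\s*", horizon)
    if match is None:
        raise ValueError(f"unsupported horizon: {horizon!r}")
    n, unit = int(match.group(1)), match.group(2)
    return timedelta(days=n * (7 if unit.startswith("week") else 1))


def time_windows(start_date, end_date, horizon):
    """Consecutive (start, end) windows of length `horizon` that fit inside the period."""
    step = parse_horizon(horizon)
    windows = []
    window_start = start_date
    while window_start + step <= end_date:
        windows.append((window_start, window_start + step))
        window_start += step
    return windows


for start, end in time_windows(datetime(2012, 5, 23), datetime(2012, 6, 23), "7days"):
    print(start.date(), "->", end.date())
```

With the configuration above, this prints four windows, from 2012-05-23 to 2012-06-20. The remaining three days do not fill a whole horizon and are dropped.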
Catalog
The catalog was obtained prior to the experiment using query_bsi. It was filtered from 2006 onwards so that it contains enough data for the model calibration.
Models
Time-dependent models require additional arguments compared to time-independent ones:
- pymock:
    path: pymock
    func: pymock
    func_kwargs:
      n_sims: 100
      mag_min: 3.5
    build: venv
Now, path points to the folder where the source code is installed. Therefore, the input and the forecasts should be allocated in {path}/input and {path}/forecasts, respectively.
The func option is the shell command with which the model is run. As seen in the Model section, this could be pymock, pymock input/args.txt or python run.py input/args.txt. We use the simplest option, pymock, but you are welcome to try different entrypoints.
Note
The func command will be run from the model's directory and inside the model's containerization (e.g., Dockerfile, conda).
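Here is what running the func command from the model's directory amounts to, sketched with the standard subprocess module. It deliberately leaves out the environment or container layer (venv, conda, Docker) that floatcsep places around the call. The function name run_model is hypothetical.

```python
import shlex
import subprocess
from pathlib import Path


def run_model(path, func):
    """Hypothetical runner: execute the `func` shell command with `path` as working directory."""
    model_dir = Path(path).resolve()
    command = shlex.split(func)  # e.g. "pymock" or "python run.py input/args.txt"
    # check=True raises CalledProcessError if the model exits with a non-zero status
    return subprocess.run(command, cwd=model_dir, check=True, capture_output=True, text=True)
```

Inside an environment where the pymock entrypoint is installed, run_model("pymock", "pymock") would execute the model once. Because the working directory is the model folder, relative paths such as input/args.txt resolve correctly.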
The func_kwargs are extra arguments that are written into the input/args.txt file every time the model is run, or passed as extra arguments to the func call (the two options are equivalent). This is useful to define sub-classes (or flavours) of models that use the same source code but a different instantiation.
The build option defines the style of container within which the model is placed. Currently, floatCSEP supports only the Python module venv, the package manager conda and the containerization manager Docker.
Important
For these tutorials, we use venv sub-environments, but we recommend Docker to set up real experiments.
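The func_kwargs mechanism described above can be pictured with a short sketch. It assumes the same key = value layout of args.txt and writes each func_kwargs entry into the file, replacing an existing key or appending a new one. The helper name annotate_kwargs is hypothetical. The alternative route, passing the values as extra command-line arguments to func, is not shown.

```python
from pathlib import Path


def annotate_kwargs(args_path, func_kwargs):
    """Hypothetical helper: write each func_kwargs entry as 'key = value' into args.txt."""
    path = Path(args_path)
    pending = {str(k): str(v) for k, v in func_kwargs.items()}
    lines = []
    for line in path.read_text().splitlines():
        key = line.split("=", 1)[0].strip()
        if "=" in line and key in pending:
            lines.append(f"{key} = {pending.pop(key)}")  # override an existing value
        else:
            lines.append(line)
    lines.extend(f"{k} = {v}" for k, v in pending.items())  # append keys not yet present
    path.write_text("\n".join(lines) + "\n")
```

With the models.yml above, the call would be annotate_kwargs("pymock/input/args.txt", {"n_sims": 100, "mag_min": 3.5}). A second flavour of pymock would differ only in the dictionary passed.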
Tests
With time-dependent models, the catalog evaluations found in csep.core.catalog_evaluations can now be used.
- Catalog_N-test:
    func: catalog_evaluations.number_test
    plot_func:
      - plot_number_test:
          plot_args:
            title: Test distribution
      - plot_consistency_test:
          plot_kwargs:
            one_sided_lower: True
Note
A test can have two plotting functions assigned to it. Each function's plot_args and plot_kwargs are placed indented beneath it.
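The catalog-based N-test compares the number of observed events in a window with the distribution of event counts across the synthetic catalogs. The sketch below computes the two one-sided quantile scores from plain lists of counts. It only illustrates the idea and is not pycsep's catalog_evaluations.number_test, which works on forecast and catalog objects and returns an evaluation result. The counts in the example are made up.

```python
def number_test_quantiles(simulated_counts, observed_count):
    """Return (delta_1, delta_2) for a catalog-based N-test.

    delta_1: fraction of synthetic catalogs with at least as many events as observed.
    delta_2: fraction of synthetic catalogs with at most as many events as observed.
    """
    if not simulated_counts:
        raise ValueError("need at least one synthetic catalog")
    n = len(simulated_counts)
    delta_1 = sum(c >= observed_count for c in simulated_counts) / n
    delta_2 = sum(c <= observed_count for c in simulated_counts) / n
    return delta_1, delta_2


# Event counts of 10 synthetic catalogs (made-up numbers) versus 4 observed events
print(number_test_quantiles([2, 3, 3, 4, 5, 5, 6, 6, 7, 9], 4))
```

This prints (0.7, 0.4). A very small delta_1 would suggest the forecast underpredicts the number of events, and a very small delta_2 that it overpredicts them.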
Custom Post-Process
In addition to the default plot_results(), plot_catalogs() and plot_forecasts() functions, custom plotting functions can be set within the postprocess configuration:
postprocess:
  plot_custom: custom_plot_script.py:main
This option provides a hook for a Python script and a function within it, as:
{python_script}:{function_name}
The script must be located in the same directory as the configuration file, and the function must receive a floatcsep.experiment.Experiment as argument:
def main(experiment):
    """
    Example custom plot function (Observed vs. forecast rates in time)
    Args:
        experiment: an instance of floatcsep.experiment.Experiment
    """
In this way, the plot function can use all the Experiment attributes/methods to access catalogs, forecasts and test results. The script tutorials/case_g/custom_plot_script.py can also be viewed directly on GitHub. It shows how to access the experiment data at runtime.
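A hook of the form {python_script}:{function_name} can be resolved with nothing more than importlib. The sketch below is a hypothetical loader, not floatcsep's code. It splits the string at the last colon, loads the script from the configuration file's directory, and calls the function with the experiment object.

```python
import importlib.util
from pathlib import Path


def call_custom_plot(hook, config_dir, experiment):
    """Hypothetical loader for a '{python_script}:{function_name}' post-process hook."""
    script_name, func_name = hook.rsplit(":", 1)
    script_path = Path(config_dir) / script_name  # script sits next to the config file
    spec = importlib.util.spec_from_file_location(script_path.stem, script_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the script's top-level code
    plot_function = getattr(module, func_name)
    return plot_function(experiment)
```

For this tutorial the call would be call_custom_plot("custom_plot_script.py:main", "tutorials/case_g", experiment). Any top-level code in the script therefore runs once, when the hook is loaded.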
Running the experiment
The experiment can be run by simply navigating to the tutorials/case_g folder in the terminal and typing:
$ floatcsep run config.yml
This will automatically set all the calculation paths (testing catalogs, evaluation results, figures) and create a summarized report in results/report.md.