TOMLParams is a new open-source library that helps Python developers to externalize parameters in TOML files. This post will explain why storing parameters in non-code files is beneficial (including for reproducibility), why TOML was chosen, and some of the useful features of the library, which include structured sets of parameters using TOML tables, hierarchical inclusion with overriding, default values, parameter (key) checking, optional type checking, and features to help use across programs, including built-in support for setting parameters using environment variables.
The Benefits of Externalizing Parameters
Almost all software can do more than one thing, and has various parameters that are used to control exactly what it does. Some of these parameters are set once and never changed (typically configuration parameters, that each user or installation chooses), while others may be changed more often, perhaps from run to run. Command-line tools often accept some parameters on the command line itself, most obviously input and output files and core parameters such as search terms for search commands. On Unix and Linux systems, it's also common to use command line "switches" (also called flags or options) to refine behaviour. So for example, the Unix/Linux grep tool might be used in any of the following ways:
grep time # find all lines including 'time' on stdin grep time p*.txt # ... on .txt files starting with 'p' grep -i time p*.txt # ... ignoring capitalization (-i switch)
-i are examples of command-line parameters.
Many tools also use configuration files to control the behaviour of
the software. On Unix and Linux, these are typically stored in the user's
home directory, often in 'dot' files,
~/.bashrc for the Bash shell,
~/.emacs for the Emacs editor and
~/.zshrc for the Z Shell.
These files are sometimes in propriery formats, but increasly
often are in "standard" formats such as
.ini (MS-DOS initialization files),
YAML (YAML Ain't Markup Language),
or TOML (Tom's Obvious Minimal Language).
When writing code, it's often tempting just to set parameters in code.
The quickest, dirtiest practice is just to have parameter values "hard-wired" as literals in code where they are used, possibly with the same values being repeated in different places. This is generally frowned upon for a few reasons:
- Hard-wired parameters are hard to find, and therefore hard to update and inspect;
- There may be repetition, violating the don't repeat yourself (DRY) principle, and leading to possible inconsistency (though see also the counterveiling rule of three (ROT) and the write everything in threes (WET) principle.
- If you want to change parameters, you have to go to many places;
- There may be no name associated with the parameter, making it hard to know what it means;
So implementing an energy calculation using Einstein's formula E = mc², we would probably prefer either:
def energy(mass): """Return energy in joules from mass in kilograms""" return mass * constants.speed_of_light * constants.speed_of_light
def energy(m): """Return energy in joules from mass m in kilograms""" c = constants.c # the speed of light return m * c ** 2
(for a physicist) to
def energy(m): return m * 3e8 * 3e8
def energy(m): return m * 9e16
- For something like the speed of light (which is after all, a universal
constant) in code is probably appropriate, so we might have a file
constants.pycontaining something like:
# constants.py: physical constants speed_of_light = 299_792_458 # metres per second, in vacuuo. (c. 300 000 km/s) h_cross = 1.054_571_817e-34 # reduced Planck constant in joule seconds (= h/2π)
- For parameters that change in different situations, or for different users,
it's usually much better practice to store the parameters in a separate file,
ideally in a common format for which is good library support.
Some of the advantages of this include:
- Changing parameters then does not require changing code. Code may be unavilable to the user—e.g. write locked, compiled or running on a remote server through an API. Moreover, editing code often requires more expertise and confidence than editing a parameter file.
- The same code can be run multiple times using different parameters at the same time; this can be more challenging if the parameters are "baked into" the code.
- It becomes easy to maintain multiple sets of parameters for different situations, allowing a user simply to choose a named set on each occasion (usually through the parameter file's name). Of course, the code itself can have different named sets of parameters, but this is less common, and usually less transparent to the user, and harder to maintain.
There is nothing particularly special about TOML, but it is well suited as a format for parameter files, with a sane and pragmatic set of features, good library support in most modern programming languages and rapidly increasing adoption.
Here's a simple example of a TOML file:
# A TOML file (This is a comment) start_date = 2024-01-01 # This will produce a datetime.date in Python run_days = 366 # an integer (int) tolerance = 0.0001 # a floating-point vlaue (float) locale = "en_GB" # a string value title = "2024" # also a string valuue critical_event_time = 2024-07-31T03:22:22+03:00 # a datetime.datetime fermi_approximate = true # a boolean value (bool) [logging] # A TOML "table", which acts like a section, or group format = ".csv" # Yet another string events = [ # An array (list) "financial", "telecoms", ] title = "2024 events" # The same key as before, but this one is # in the logging table, so does not conflict # with the previous title
Strengths of TOML as a parameter/configuration format
- Ease of reading, writing and editing ("obviousness").
- Lack of surprises: there are few if any weirdnesses in TOML (which is less true of the other main competing formats).
- Good coverage of the main expected types: TOML supports (and
differentiates between) booleans,
integers, floating point values, strings,
dates and timestamps (with and without timezones,
and mapping to
datetime.datetimein Python), arrays (lists), key-value pairs (dictionaries), and hierarchical sections.
- It supports comments (which begin with a hash mark
- Flexible quoting with support for multiline strings, and no use of bare (unquoted) strings (avoiding ambiguity)
- Well-defined semantics without becoming fussy and awkward to edit (e.g., being unfussy about trailing commas in arrays).
- TOML is specifically designed to be a configuration file format. Quoting the front page of the website
TOML aims to be a minimal configuration file format that: is easy to read due to obvious semantics maps unambiguously to a hash table is easy to parse into data structures in a wide variety of languages"
- In the context of Python, TOML is already being adopted widely,
most notably in the increasingly ubiquitous
pyproject.tomlfiles. There are good libraries available for reading (
tomli) and writing (
tomli_w) TOML files, although these are not part of the standard library. (It appears that Python 3.11 does include
tomllib, for reading TOML files, in the standard library.)
Weaknesses of TOML as a parameter/configuration format
We don't think TOML has any major weaknesses for this purpose, but points that might count against it for some include:
- TOML does not have a null (NULL/nil/None/undefined) value.
TOMLParamscould address this, but has no current plans to do so.)
- Hierarchical sections ('tables') in TOML are not nested. So if you want
So if you want a section/table called
behavioursand subsections/subtables called
business, in TOML this might be represented by something like the excerpt below (possibly with a
[behaviours]table with its own parameters as well).
Some people, however, really don't like TOML.
[behaviours.personal] frequency = 3 [behaviours.business] frequency = 3
Why not JSON, YAML, .ini...
We chose TOML because we think, overall, it is better than each of the obvious competing formats, but the purpose of this post isn't to do down those formats, which all have their established places. But we will comment briefly on the most obvious alternatives.
JSON, while good as a transfer format and very popular in web services, is tricky for humans to write correctly, even with editor support, because it requires correct nesting and refuses to accept trailing commas in lists and dictionaries. The lack of support for dates and timestamps is also a frequent source of frustration, with quoted strings typically being used instead, with all the concomitant problems of that approach.
At first glance, YAML appears more suitable as a
configuration/parameter-file format, but is the opposite of obvious
and often produces unexpected results (in practice). Sources of frustration
include no requirement to quote strings, "magic values" like
(which maps to
no which maps to
false (much to the
annoyance of Norwegians and anyone using the NO country code),
inadvent coercison of numbers with leading zeroes to octal (in YAML 1.1),
whitespace sensitivity, and issues around multi-line strings.1
2 3 4
.ini files look a lot like TOML (I would guess they a major inspiration
for it) but are much simpler and less rich, have less well-defined syntax,
have fewer types and don't require quoting of strings, leading to ambiguity.
What are the Key Features of TOMLParams?
Simple externalization of parameters in one or more TOML files
You write your parameters in a TOML file, pass the name of the
TOML file an instance of the
and it reads the parameters from the file
and makes them available as attributes on the object, but also
makes them available using dictionary-style look-up, i.e. if you
TOMLParams instance is
p and you have a parameter
you can access it as
p['startdate'] (and, more pertinently,
k is set to
If you use tables, you can "dot" you way throught to the parameters
p.behaviours.personal.frequency) or use repeated dictionary lookups
Loading, saving, default values, parameter name checking and optional parameter type checking
The parameters that exist are defined by default value with
You can either store you defaults in a TOML file (e.g.
or pass them to the
TOMParams initializer as a dictionary.
If you choose a different TOML file, all the parameter values are first set to their default values, and then any parameters set in the file you specify override those. New parameters (i.e. any not listed in defaults) raise an error.
If you wish turn on type checking, the library will check that the all parameter values provided match those of the defaults, and you choose whether these cases result in a warning (which doesn't raise an exception) or an error (which does).
Hierarchical file inclusion with overriding
Perhaps the most powerful feature of TOMLParams is the ability for one
parameters file to 'include' one or more others. If you use TOMLParams
specifying the parameters file name as
base, and the first line of
include = 'uk'
then parameters from
uk.toml will be read and processed before
base.toml. So all parameters will first be set to
default values, then anything in
uk.toml will override those values,
and finally any values in
base.toml will override those.
The include statement can also be a list
include = ['uk', 'metric', '2023']
Inclusions are processed left to right (i.e.
uk.toml is processed,
metric.toml, followed by
2023.toml), followed by the parameters
in the including file itself. So in this case, if defaults are in
defaults.toml and the TOML file specified is
base.toml, the full
Of course, the included files can themselves include others, but the library keeps track of this and each file is only processed once (the first time it is encountered), which prevents infinite recursion and repeated setting.
This makes it very easy to maintain different kinds and groups of
parameters in different files, and to create a variation of a set
of parameters simply by making the first line
TOML file specifies the parameters you want to start from, and then
override the specific parameter or parameters you want to be different
in your new file.
Support for writing consolidated parameters as TOML after hierarchical inclusion and resolution
The library can also write the final parameters used out as a single, consolidated TOML file, which is useful when hierarchical inclusion and overriding are used, and keeps a record of the final values of all parameters. This helps with reproducibility and logging.
Support for using environment variables to select parameter set (as well as API)
You can choose how to specify the name of the parameters file to be used,
and the name two which it should default. If you create a
params = TOMLParams( defaults='defaults', name='newparams', standard_params_dir='.' )
default parameters will be read from
./newparams.toml will be processed, overriding default values.
If you want to specify the name of the parameters file to use on the command line, the usual pattern would be something like:
import sys from tomlparams import TOMLParams class Simulate: def __init__(params, ...) if __name__ == '__main__: name = sys.argv if len(sys.argv) > 1 else 'base' params = TOMLParams('defaults.toml', name) sim = Simulate(params, ...)
Sometimes, however, it's convenient to use an environment variable
to set the name of the parameters file, particularly if you want
to use the same parameters in multiple programs, or run from a
shell script or a
Makefile. You can specify an environment variable
to use for this and
TOMLParams will inspect that environment variable
if no name is passed. If you choose
SIMPARAMS for this and say
params = TOMLParams('defaults.toml', env_var='SIMPARAMS')
it will look for a name in
SIMPARAMS in the environment which you can
SIMPARAMS="foo" python pythonfile.py
export SIMPARAMS="foo" python pythonfile.py
If it's not set, it will use
base.toml as the file name, or something
else you choose with the
base_params_stem argument to
Check it out
You can install
TOMLParams from PyPI in the usual way, e.g.
python -m pip -U tomlparams
The source is available from Github (under an MIT license), at github.com/smartdatafoundry/tomlparams
README.md and documentation is available on ReAd the Docs at
You can get help within Python on the
TOMLParams class with
>>> import tomlparams >>> help(tomlparams)
After installing tomlparams, you will find you have a
which you can use to copy example code from the
$ tomlparams examples Examples copied to ./tomlparams_examples.
You can also get help from with
$ tomlparams help TOMLParams USAGE: tomlparams help — show this message tomlparams version — report version number tomlparams examples — copy the examples to ./tomlparams_examples tomlparams test — run the tomlparams tests Documentation: https://tomlparams.readthedocs.io/ Source code: https://github.com/smartdatafoundry.com/tomlparams Website: https://tomlparams.com Installation: python -m pip install -U tomlparams