Skip to main content
ODE Simulation and Inference

ODE Simulation and Inference

ODE Simulation and Inference

The purpose of the functions in this module are to:
  1. Take a set of parameters and times as input, and predict the number of each of the three proteins there will be at every timepoint (tested in test_repressilator_simulation_class()).
  2. Take protein number traces as input, and determine by optimisation the governing parameters that generated them.

Getting started

Navigate to Repressilator_tests/tests and verify that the test fails:
python -m pytest ode_inference_test.py
The inference code can take quite a long time to run, so you may just want to start by testing test_repressilator_simulation_class(). It's worth making a branch if you're changing the codebase:
git checkout -b ode_fix
Now we can make changes to the tests and repressilator_analysis module, and can choose to merge them back later. It's often worth being able to print test outputs, so you may like to add test_repressilator_simulation_class() to start with. You can then see the same test output by running:
python ode_simulation_and_inference.py
The repressilator paper can be found in the docs/ folder. You should read box 1 on page 3, which talks about the ordinary differential equation model. As you read, note the form of the differential equations for mRNA and protein and which parameters are defined (α, α₀, β, n). If you have not encoutered it before, you may be interested in the Hill equation which is used in the ODE model to describe binding kinetics.

The Test Functions

test_repressilator_simulation_class()

Tests a function simulate() that is a member of a class ode_inference.RepressilatorModel. Even though the model simulates the number of all three proteins in the repressilator circuit, only two protein number values are assessed.
  • Arguments:
    • A list of parameters, provided in the order defined by RepressilatorModel._param_names
    • The times of the experiment in seconds
  • Returns: A numpy.ndarray of predicted protein numbers, with column 0 being the nuclear numbers and column 1 the cytosolic.

test_infer_parameters()

Tests a single function: ode_inference.infer_parameters()
  • Arguments:
    • A list of times in seconds (one-dimensional numpy.ndarray)
    • A two-dimensional numpy.ndarray of shape (n_times, 2) in the same format as returned by ode_inference.RepressilatorModel.simulate()
  • Returns: A list of parameters in the same order as RepressilatorModel._param_names

test_repressilator_simulation_class()

AssertionError assert RMSE(values[:,0], test_nuclear_protein)<36

This indicates that the root mean squared error difference between the predicted and actual recorded protein numbers is greater than 36.

What is the code doing before the error?

First let's modify the test code to plot the protein numbers from simulate() vs. the protein numbers stored as test data
Look at the actual simulate() function and understand what it does.
Compare the mRNA and protein equations from the code with the equations in box 1. Are they the same equation? Which terms differ?
Do the parameters used in the model (for example, those used in _param_names) agree with the description of box 1 in the paper?

How can we resolve the error?

Let's change the simulation code to match the equations from the paper
If you inspect the parameters in the params_003.json file
# after `params` is loaded for key in params.keys(): print("{0}:{1}".format(key, params[key]))
you will notice that there are more parameter values than included in the _param_names list. Update the code to incorporate these values, according to the description of their usage in box 1 in the paper. initial_ values refer to the initial values of m1-3 and p1-3 (where n_x_1 is x1, n_x_2 is x_2 and c_x is x3, where x is either m or p), and T_e refers to the translation efficiency of an mRNA molecule. You can assume that all the transcription factors act as monomers (which is relevant in the context of the K_m parameter). Furthermore, the differential equation model is only valid after using the non-dimensionalisation procedure outlined in the Elowitz paper. Non-dimensionalisation rescales the variables so that the equations have no physical units; it simplifies the maths and means the parameters (such as α) end up as dimnesionless values rather than quantities with awkward units. The paper describes how to do this; the key effect on the code is that time must be rescaled and α must be converted before being passed to the solver.
The rapid oscillation is because the time units still haven't been non-dimensionalised
In the previous solution block we're out by a constant factor; the differential equation model expresses proteins in units of K_m, but the test data is just raw protein count

test_infer_parameters()

KeyError or AssertionError assert RMSE(sim[:,0], true_array[:,1])<36

If you've been following along with the previous solution block, you'll get a KeyError, as the _param_names variable in RepressilatorModel is now incompatible - if you run the LLM code as-is, you'll get an assertion error. In the following solution blocks, I'm going to totally rewrite the code to make use of the PINTS (parameter inference for noisy timeseries) library and the CMA-ES algorithm, which is a robust gradient-free optimiser, because that's what I'm used to (and it can handle multiple columns). For any optimisation function, you will need to
  1. Specify parameters, and the order with which they are fed into simulate
  2. Specify parameter bounds (i.e. the parameter space in which the optimiser will search)
  3. Specify the initial search point.
  4. Specify optimiser options
  5. Find some way of dealing with multiple output columns (as simulate returns the cytosolic and nuclear protein numbers)
I also usually include the following
  1. Random initialisation
  2. Multiple independent optimisation runs

Rewriting the inference code

The arguments to the function are as in the LLM implementation (although without optimiser choice)
def infer_parameters( times: np.ndarray, observations: np.ndarray, ):
and we setup the model in the same way
model = RepressilatorModel(times)
The next step is to define boundaries for each parameter. The actual simulation values are centred around the values given in the paper in the docs folder
We then need to set up the optimiser

Wrapping up

Once you're satisfied that all tests pass:
python -m pytest ode_inference_test.py
Remove any diagnostic code you added to ode_inference_test.py, then merge your changes back into master:
git checkout master git merge ode_fix