Stochastic model emulation

Overview

Effective Bayesian model calibration algorithms require forward simulation from the model (the "simulator") to be fast, because the calibration scheme may need many thousands or millions of forward simulations. Unfortunately, many models of interest for complex biological processes do not meet this speed requirement. If forward simulation from the simulator is not fast enough, one option is to replace the simulator by an approximate model that is fast to simulate from. Another approach (popular in the analysis of deterministic computer models) is to emulate the simulator. An emulator is a statistical/stochastic model that accurately approximates the simulator, is fast to forward simulate and is, potentially, analytically tractable. The benefit of using an emulator is that it generally leads to a much faster calibration of the biological model (using Bayesian model calibration techniques) than can be achieved with the slower simulator. For this approach to work, however, the emulator needs to be an accurate surrogate for the (exact) simulator.

Fitting an emulator

In essence, an emulator is obtained by fitting an appropriate statistical model to output from the simulator. This output is generated by running the simulator at a range of values for its inputs. These inputs are the unknown model quantities (such as rate constants and initial conditions), which we denote by U, and other variables under our control (such as the times at which time-course output is required), which we denote by X. We are often in a position to decide at which values of the inputs (U,X) to run the simulator. The combinations of (U,X) at which the simulator is run are called the design points, and choosing "good" designs for computer experiments is an active area of research. Since we would like the emulator to be a good approximation to the simulator over a wide range of values of the inputs, it is common to use space-filling designs such as those based on Latin hypercubes.
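
As an illustration of a space-filling design, the following is a minimal sketch of Latin hypercube sampling in plain Python. The function name latin_hypercube and the input ranges (a rate constant U and an output time X) are hypothetical and chosen only for the example. Each input range is divided into as many equal-width strata as there are design points, one value is drawn in each stratum, and the strata are paired at random across inputs.

```python
import random

def latin_hypercube(n_points, bounds, seed=None):
    """Return n_points design points; bounds is a list of (low, high) per input."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One draw inside each of n_points equal-width strata, then shuffle
        # so that strata are paired at random across input dimensions.
        strata = [(i + rng.random()) / n_points for i in range(n_points)]
        rng.shuffle(strata)
        columns.append([low + (high - low) * u for u in strata])
    # Transpose: one tuple of input values per design point.
    return list(zip(*columns))

# Hypothetical inputs: a rate constant U in [0.1, 2.0] and a time X in [0, 10].
design = latin_hypercube(8, [(0.1, 2.0), (0.0, 10.0)], seed=1)
```

Every stratum of every input is used exactly once, so no region of either input range is left unexplored, which is the property that makes such designs attractive for fitting emulators.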

Once a design is chosen and the simulator has been run to produce output, the next stage is to fit a statistical model to this output. We adopt a Bayesian approach to the fitting of the emulator, in keeping with our Bayesian approach to model calibration. The final stage in the process is to validate that the emulator is a good model for the simulator. One of the simplest ways to validate an emulator is to compare simulated values from the emulator to simulated values from the simulator.
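
One simple way to make such a comparison concrete is to run the simulator at inputs that were not used for fitting and to express each discrepancy in units of the emulator's predictive standard deviation. The sketch below assumes an emulator that returns a predictive mean and standard deviation; both simulator and emulator here are hypothetical toy stand-ins, not fitted models.

```python
import math

def simulator(u):
    # Stand-in for an expensive simulator (hypothetical toy model).
    return math.exp(-u)

def emulator(u):
    # Stand-in for a fitted emulator: returns a predictive mean and
    # standard deviation (hypothetical; here a truncated Taylor series).
    mean = 1.0 - u + u**2 / 2.0
    sd = 0.05
    return mean, sd

def standardised_errors(validation_inputs):
    """Simulator output minus emulator mean, in predictive sd units."""
    errors = []
    for u in validation_inputs:
        mean, sd = emulator(u)
        errors.append((simulator(u) - mean) / sd)
    return errors

validation_inputs = [0.1, 0.5, 1.2]
errors = standardised_errors(validation_inputs)
# Flag inputs where the simulator output lies more than two predictive
# standard deviations from the emulator mean.
flagged = [u for u, e in zip(validation_inputs, errors) if abs(e) > 2.0]
```

Inputs that are flagged suggest regions where the emulator is a poor surrogate, and where additional design points or a different emulator form may be needed.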

Emulator-based calibration

Suppose that the underlying state of the system (which we denote by Y) depends on the values of the unknown quantities, U, and is described by a probability (density) function p(Y|U). We will denote the emulator by a probability (density) function p*(Y|U). Once fitted, the emulator can be used as a surrogate for the simulator in the MCMC-based model calibration schemes. Put simply, we replace forward simulations from the simulator with forward simulations from the (faster and hopefully reasonably accurate) emulator. If the emulator has a suitably tractable mathematical form, then forward simulation can be avoided and more sophisticated (but less generic) MCMC algorithms can be used for calibration. (Recall from the page on Bayesian model calibration that simulation from the model is used only because the model is mathematically intractable.) The form of the emulator to use depends on whether the simulator we wish to model is deterministic or stochastic.
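
To illustrate the tractable case, the following sketch assumes that p*(Y|U) can be evaluated directly and uses it as the likelihood in a plain random-walk Metropolis sampler. This is one generic possibility and not necessarily the scheme described on the Bayesian model calibration page. The normal form of the emulator, the uniform prior, the observed value and the tuning constants are all hypothetical.

```python
import math
import random

def emulator_log_density(y, u):
    # Hypothetical tractable emulator p*(y | u): normal with a mean that
    # depends on u and a fixed standard deviation.
    mean, sd = math.exp(-u), 0.05
    return -0.5 * ((y - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def log_prior(u):
    # Assumed prior: uniform on (0, 3).
    return 0.0 if 0.0 < u < 3.0 else -math.inf

def metropolis(y_obs, n_iter=5000, step=0.2, seed=0):
    """Random-walk Metropolis targeting prior(u) * p*(y_obs | u)."""
    rng = random.Random(seed)
    u = 1.0
    log_post = log_prior(u) + emulator_log_density(y_obs, u)
    samples = []
    for _ in range(n_iter):
        proposal = u + rng.gauss(0.0, step)
        log_post_prop = log_prior(proposal)
        if log_post_prop > -math.inf:
            log_post_prop += emulator_log_density(y_obs, proposal)
        # Accept with probability min(1, posterior ratio).
        log_ratio = log_post_prop - log_post
        if rng.random() < math.exp(min(0.0, log_ratio)):
            u, log_post = proposal, log_post_prop
        samples.append(u)
    return samples

samples = metropolis(y_obs=0.5)
kept = samples[1000:]  # discard the first 1000 iterations as burn-in
posterior_mean = sum(kept) / len(kept)
```

No call to the slow simulator appears inside the sampling loop; every iteration costs only one evaluation of the emulator density.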

Emulation of deterministic simulators

When the simulator is deterministic, we can treat it as an unknown function. We can then express our uncertainty about the values of the unknown function through a stochastic process. The desirable features of an emulator of a deterministic simulator are as follows:

1. It should pass exactly through the output generated by the simulator.

2. It should be smooth; the predicted values of the output for points close in input space should be similar.

3. It should give an indication of the uncertainty in a prediction.

4. It should be mathematically tractable and/or fast to simulate.

One class of stochastic processes that possesses all of these desirable properties is the class of Gaussian processes, and these are used extensively as emulators for deterministic simulators. These processes have the property that the values at any finite set of points follow a multivariate normal distribution. The following figure illustrates the use of a Gaussian process for expressing uncertainty about the values of an unknown function in a simple one-dimensional setting.

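[Figure GP3.png: Gaussian process emulation of a deterministic simulator in one dimension. Left panel: prediction and uncertainty; right panel: predictive distribution at x=4.5.]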

In the panel on the left, the red dots represent observed values of the output of the simulator, Y, when the simulator is run at three different values of the inputs (denoted here by x). The blue line is the Gaussian-process-based prediction of the output of the simulator at every possible value of the input. This gives a smooth representation of what we would expect the simulator output to be at input values where we have not yet run the simulator. The light blue shading represents the uncertainty in the prediction at each value of the input. Notice that there is no uncertainty at inputs corresponding to the observed data points, and that the uncertainty increases as we move further from the input values used to run the simulator. Suppose we are interested in the output of the simulator at a new input value, say x=4.5; this is indicated by the vertical red line. The right panel shows the predictive distribution of the output of the simulator at this point. Uncertainty about its value is described by a normal distribution (because we are working with a Gaussian process). The red dot on this panel is the actual output from running the simulator with an input of x=4.5, and it is consistent with the predictive distribution determined via the emulator.
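
The calculation behind such a picture can be sketched in a few lines. The code below assumes a zero-mean Gaussian process with a squared-exponential covariance function and fixed, hypothetical hyperparameters; the three training points are invented and are not the values plotted in the figure. It is written in plain Python to stay self-contained, so the small linear solver stands in for what a numerical library would normally provide.

```python
import math

def sq_exp_kernel(a, b, variance=1.0, length_scale=1.5):
    # Squared-exponential covariance; hyperparameter values are assumptions.
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def gp_predict(x_train, y_train, x_new):
    """Posterior mean and sd at x_new for a zero-mean GP, noise-free data."""
    K = [[sq_exp_kernel(a, b) for b in x_train] for a in x_train]
    k_star = [sq_exp_kernel(a, x_new) for a in x_train]
    mean = sum(k * w for k, w in zip(k_star, solve(K, y_train)))
    var = sq_exp_kernel(x_new, x_new) - sum(
        k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))

# Hypothetical simulator runs at three inputs (not the values in the figure).
x_train = [1.0, 3.0, 6.0]
y_train = [0.5, -0.3, 0.8]
mean, sd = gp_predict(x_train, y_train, 4.5)
mean_at_data, sd_at_data = gp_predict(x_train, y_train, 3.0)
```

At a training input the predictive mean reproduces the simulator output and the predictive standard deviation is zero up to rounding error, whereas at x=4.5 the standard deviation is strictly positive. This matches the behaviour described for the left and right panels.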

Emulation of stochastic simulators

A stochastic simulator gives a different realisation of the output each time it is run. Therefore, to emulate a stochastic simulator we need to find a surrogate model which accurately represents the unknown distributions of simulator output at all (reasonable) values of the inputs. In general, this is a very difficult task. Emulation of stochastic simulators has received much less attention than that of deterministic simulators, and there is no universally accepted method analogous to using Gaussian processes. An approach that we have found to work well in a number of situations is to model the distribution of output at a particular value of the inputs by a probability distribution of some standard form (e.g. normal or binomial) and then to model the parameters of this distribution as smoothly varying unknown deterministic functions of the inputs. These unknown functions can in turn be modelled using Gaussian process priors in precisely the same way as above (for deterministic simulators). A natural extension of this methodology is to move away from the restrictions of a parametric model for the output towards more nonparametric alternatives (such as mixtures of distributions). Emulator fitting and validation proceed in the usual way. However, the experimental design phase can also be extended to include a choice of the number of replications to obtain at each set of inputs, as this might help in understanding the stochastic variation in the simulator output.
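
The role of replication can be illustrated with a minimal sketch of the first step only: estimating the parameters of an assumed normal output distribution from repeated runs at each design point. The toy stochastic simulator, the design and the number of replications are hypothetical. Modelling the resulting means and standard deviations as smooth functions of the input (for example with Gaussian process priors) is not shown.

```python
import math
import random
import statistics

def stochastic_simulator(u, rng):
    # Hypothetical stochastic simulator: each call gives a new realisation,
    # with both the mean and the spread of the output depending on u.
    return rng.gauss(math.exp(-u), 0.1 + 0.05 * u)

def summarise_replicates(design, n_reps, seed=0):
    """Estimate the normal mean and sd of the output at each design point."""
    rng = random.Random(seed)
    summaries = []
    for u in design:
        runs = [stochastic_simulator(u, rng) for _ in range(n_reps)]
        summaries.append((u, statistics.mean(runs), statistics.stdev(runs)))
    return summaries

design = [0.5, 1.0, 1.5, 2.0]
summaries = summarise_replicates(design, n_reps=200)
```

With only one run per design point the standard deviation could not be estimated at all, which is why the number of replications becomes part of the design problem for stochastic simulators.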

CalibayesWiki: stochastic_model_emulation (last edited 2009-04-28 09:53:41 by localhost)