Gaussian process modeling with inequality constraints
Annales de la Faculté des sciences de Toulouse : Mathématiques, Série 6, Tome 21 (2012) no. 3, pp. 529-555.

Gaussian process modeling is one of the most popular approaches for building a metamodel of an expensive numerical simulator. The code outputs often correspond to physical quantities whose behavior is known a priori: chemical concentrations lie between 0 and 1, the output increases with respect to some parameter, etc. Several approaches have been proposed to take such information into account. In this paper, we introduce a new framework for incorporating constraints in Gaussian process modeling, including bound, monotonicity and convexity constraints. We also extend this framework to any type of linear constraint. This new methodology relies mainly on conditional expectations of the truncated multinormal distribution. We propose several approximations based on correlation-free assumptions, numerical integration tools and sampling techniques. From a practical point of view, we illustrate how the accuracy of Gaussian process predictions can be enhanced by such constraint knowledge. Finally, we compare all approximate predictors on bound, monotonicity and convexity examples.
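
To give a feel for the kind of quantity the abstract refers to, the sketch below computes the mean of a univariate normal distribution truncated to an interval, using the standard closed-form expression. It is only a one-dimensional illustration under stated assumptions, not the predictor developed in the paper: it treats the unconstrained Gaussian process prediction at a single input as a normal variable in isolation, and the function names and numerical values are hypothetical.

```python
import math


def normal_pdf(z):
    """Density of the standard normal distribution at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_normal_mean(mu, sigma, lower, upper):
    """Mean of N(mu, sigma^2) conditioned on lower <= X <= upper."""
    alpha = (lower - mu) / sigma
    beta = (upper - mu) / sigma
    mass = normal_cdf(beta) - normal_cdf(alpha)
    if mass <= 0.0:
        raise ValueError("no probability mass inside the bounds")
    return mu + sigma * (normal_pdf(alpha) - normal_pdf(beta)) / mass


# Hypothetical unconstrained prediction at one input: mean 0.95, std 0.10.
# The output is assumed to be a concentration, hence bounded in [0, 1].
constrained = truncated_normal_mean(0.95, 0.10, 0.0, 1.0)
print(constrained)
```

Because part of the unconstrained distribution lies above 1, conditioning on the bounds pulls the predicted mean below 0.95. In the multivariate setting the constraints at different inputs interact through the covariance, which is why numerical integration or sampling is needed instead of a formula of this kind.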

Received: 2012-02-26
Accepted: 2012-04-15
Published: 2012-10-23
DOI: https://doi.org/10.5802/afst.1344
@article{AFST_2012_6_21_3_529_0,
     author = {S\'ebastien Da Veiga and Amandine Marrel},
     title = {Gaussian process modeling with inequality constraints},
     journal = {Annales de la Facult\'e des sciences de Toulouse : Math\'ematiques},
     pages = {529--555},
     publisher = {Universit\'e Paul Sabatier, Toulouse},
     volume = {Ser. 6, 21},
     number = {3},
     year = {2012},
     doi = {10.5802/afst.1344},
     zbl = {1279.60047},
     mrnumber = {3076411},
     language = {en},
     url = {https://afst.centre-mersenne.org/item/AFST_2012_6_21_3_529_0/}
}
Sébastien Da Veiga; Amandine Marrel. Gaussian process modeling with inequality constraints. Annales de la Faculté des sciences de Toulouse : Mathématiques, Série 6, Tome 21 (2012) no. 3, pp. 529-555. doi: 10.5802/afst.1344. https://afst.centre-mersenne.org/item/AFST_2012_6_21_3_529_0/
