About Bayes Comp

Bayes Comp is a biennial conference sponsored by the ISBA section of the same name. The conference and the section both aim to promote original research into computational methods for inference and decision making; to encourage the use of frontier computational tools among practitioners; to encourage the development of adapted software, languages, platforms, and dedicated machines; and to translate and disseminate methods developed in other disciplines among statisticians.

Bayes Comp is the current incarnation of the popular MCMSki series of conferences, and Bayes Comp 2020 is the second edition of this new conference series. The first edition was Bayes Comp 2018, which was held in Barcelona in March of 2018.

Where & When

Bayes Comp 2020 will take place in the Reitz Union at the University of Florida. It will start in the afternoon on Tuesday, January 7, 2020, and finish in the afternoon on Friday, January 10.


  • Deadline for submission of poster proposals: December 15, 2019.

  • Provide the name and affiliation of the speaker, as well as a title and an abstract for the poster. If the poster is associated with a technical report or publication, please also provide that information. Acceptance is conditional on registration, and decisions will be made on a rolling basis, usually within a week of submission. Email your proposal to Christian Robert.

  • Deadline for applications for travel support: September 20, 2019. (See Travel Support below for details.)


Fees (in US$)

Category | Early (through Aug 14) | Regular (Aug 15 - Oct 14) | Late (starting Oct 15)
Student Member of ISBA | | |
Student Non-member of ISBA | | |
Regular Member of ISBA | | |
Regular Non-Member of ISBA | | |

Please note that:

  • The fees are structured so that a non-member of ISBA will save money by joining ISBA before registering.
  • The registration fee does not include the conference dinner. (The cost of the conference dinner is $50.)

Travel Support

Funds are available for junior travel support. These funds are earmarked for people who are either currently enrolled in a PhD program or have earned a PhD within the last three years (no earlier than January 1, 2017). To be eligible for funding, you must be presenting a talk or poster and be registered for the conference.

Applicants should email the following two items to Jim Hobert: (1) an up-to-date CV, and (2) proof of current enrollment in a PhD program (in the form of a short letter from your PhD advisor) or a PhD certificate showing the graduation date. The application deadline is September 20, 2019.


Blocks of rooms have been reserved at three different hotels:


Plenary Speakers

Invited Sessions

  • Theory & practice of HMC (and its variants) for Bayesian hierarchical models: Tamara Broderick (MIT), George Deligiannidis (U of Oxford), Aaron Smith (U of Ottawa).
    • Tamara Broderick: The kernel interaction trick: Fast Bayesian discovery of multi-way interactions in high dimensions

      Abstract: TBA

    • George Deligiannidis: The bouncy particle sampler and randomized Hamiltonian Monte Carlo

      Abstract: TBA

    • Aaron Smith: Free lunches and subsampling Monte Carlo

      Abstract: TBA

  • Scalable methods for high-dimensional problems: Akihiko Nishimura (UCLA), Anirban Bhattacharya (Texas A&M), Sara Wade (U of Edinburgh).
    • Akihiko Nishimura: Scalable Bayesian sparse generalized linear models and survival analysis via curvature-adaptive Hamiltonian Monte Carlo for high-dimensional log-concave distributions

      Abstract: Bayesian sparse regression based on shrinkage priors possesses many desirable theoretical properties and yields posterior distributions whose conditionals mostly admit straightforward Gibbs updates. Sampling high-dimensional regression coefficients from their conditional distribution, however, presents a major scalability issue in posterior computation. The conditional distribution generally does not belong to a parametric family and the existing sampling approaches are hopelessly inefficient in high-dimensional settings. Inspired by recent advances in understanding the performance of Hamiltonian Monte Carlo (HMC) on log-concave target distributions, we develop *curvature-adaptive HMC* for scalable posterior inference under sparse regression models with log-concave likelihoods. As is well known, HMC's performance critically depends on the integrator stepsize and mass matrix. These tuning parameters are typically adjusted over many HMC iterations by collecting statistics on the target distribution --- an impractical approach when employing HMC within a Gibbs sampler since the conditional distribution changes as the other parameters are updated. Instead, we achieve on-the-fly calibration of the key HMC tuning parameters through 1) the recently developed theory of *prior-preconditioning* for sparse regression and 2) a rapid estimation of the curvature of a given log-concave target via *iterative methods* from numerical linear algebra. We demonstrate the scalability of our method on a clinically relevant large-scale observational study involving n >= 80,000 patients and p >= 10,000 predictors, designed to assess the relative efficacy of two alternative hypertension treatments.

    • Anirban Bhattacharya: Approximate MCMC for high-dimensional estimation

      Abstract: We discuss a number of applications of approximate MCMC to complex high-dimensional structured estimation problems. A unified theoretical treatment is provided to understand the impact of introducing approximations to the exact MCMC transition kernel.

    • Sara Wade: Posterior inference for sparse hierarchical non-stationary models

      Abstract: Gaussian processes are valuable tools for non-parametric modelling, where typically an assumption of stationarity is employed. While removing this assumption can improve prediction, fitting such models is challenging. In this work, hierarchical models are constructed based on Gaussian Markov random fields with stochastic spatially varying parameters. Importantly, this allows for non-stationarity while also addressing the computational burden through a sparse representation of the precision matrix. The prior field is chosen to be Matérn, and two hyperpriors, for the spatially varying parameters, are considered. One hyperprior is Ornstein-Uhlenbeck, formulated through an autoregressive process. The other corresponds to the widely used squared exponential. In this setting, efficient Markov chain Monte Carlo (MCMC) sampling is challenging due to the strong coupling a posteriori of the parameters and hyperparameters. We develop and compare three MCMC schemes, which are adaptive and therefore free of parameter tuning. Furthermore, a novel extension to higher-dimensional settings is proposed through an additive structure that retains the flexibility and scalability of the model, while also inheriting interpretability from the additive approach. A thorough assessment of the ability of the methods to efficiently explore the posterior distribution and to account for non-stationarity is presented, in both simulated experiments and a real-world computer emulation problem. https://arxiv.org/abs/1804.01431

  • MCMC and scalable Bayesian computations: Philippe Gagnon (U of Oxford), Florian Maire (U de Montréal), Giacomo Zanella (Bocconi U).
    • Philippe Gagnon: Nonreversible jump algorithms for nested models

      Abstract: TBA

    • Florian Maire: Can we improve convergence of MCMC methods by aggregating Markov kernels in a locally informed way?

      Abstract: TBA

    • Giacomo Zanella: On the robustness of gradient-based sampling algorithms

      Abstract: TBA

  • Scalable methods for posterior inference from big data: Subharup Guha (U of Florida), Zhenyu Zhang (UCLA), David Dahl (Brigham Young U).
    • Subharup Guha: Fast MCMC techniques for fitting Bayesian mixture models to massive multiple-platform cancer data

      Abstract: Recent advances in array-based and next-generation sequencing technologies have revolutionized biomedical research, especially in cancer. Bayesian mixture models, such as finite mixtures, hidden Markov models, and Dirichlet processes, offer elegant frameworks for inference, especially because they are flexible, avoid making unrealistic assumptions about the data features and the nature of the interactions, and permit nonlinear dependencies. However, existing inference procedures for these models do not scale to multiple-platform Big Data and often stretch computational resources past their limits. An investigation of the theoretical properties of these models offers insight into asymptotics that form the basis of broadly applicable, cost-effective MCMC strategies for large datasets. These MCMC techniques have the advantage of providing inferences from the posterior of interest, rather than an approximation, and are applicable to different Bayesian mixture models. Furthermore, they can be applied to develop massively parallel MCMC algorithms for these data. The versatility and impressive gains of the methodology are demonstrated by simulation studies and by a semiparametric integrative analysis that detects shared biological mechanisms in heterogeneous multi-platform cancer datasets. (Joint with Dongyan Yan and Veera Baladandayuthapani.)

    • Zhenyu Zhang: Bayesian inference for large-scale phylogenetic multivariate probit models

      Abstract: Inferring correlation among biological features is an important yet challenging problem in evolutionary biology. In addition to adjusting for correlations induced by an uncertain evolutionary history, we also have to deal with features measured on different scales: continuous and binary. We jointly model the two feature types by introducing latent continuous parameters for binary features, giving rise to a phylogenetic multivariate probit model. Posterior computation under this model remains problematic with increasing sample size, requiring repeated sampling from a high-dimensional truncated Gaussian distribution. The best current approaches scale quadratically in sample size and suffer from slow mixing. We develop a new computational approach that exploits 1) the state-of-the-art bouncy particle sampler based on piecewise deterministic Markov processes and 2) a novel dynamic programming approach that reduces the cost of likelihood and gradient evaluations to linear in sample size. In an application, we successfully handle a 14,980-dimensional truncated Gaussian, making it possible to estimate correlations among 28 HIV virulence and immunological epitope features across 535 viruses. The proposed approach is of independent interest, being applicable to a broader class of covariance structures beyond comparative biology. (Joint with Akihiko Nishimura, Philippe Lemey, and Marc A. Suchard.)

    • David Dahl: Summarizing distributions of latent structure

      Abstract: In a typical Bayesian analysis, considerable effort is placed on "fitting the model" (e.g., obtaining samples from the posterior distribution) but this is only half of the inference problem. Meaningful inference usually requires summarizing the posterior distribution of the parameters of interest. Posterior summaries can be especially important in communicating the results and conclusions from a Bayesian analysis to a diverse audience. If the parameters of interest live in R^n, common posterior summaries are means, medians, and modes. Summarizing posterior distributions of parameters with complicated structure is a more difficult problem. For example, the "average" network in the posterior distribution on a network is not easily defined. This paper reviews methods for summarizing distributions of latent structure and then proposes a novel search algorithm for posterior summaries. We apply our method to distributions on variable selection indicators, partitions, feature allocations, and networks. We illustrate our approach in a variety of models for both simulated and real datasets. (Joint with Peter Müller.)

  • Efficient computing strategies for high-dimensional problems: Gareth Roberts (U of Warwick), Veronika Rockova (U of Chicago), Gregor Kastner (Vienna U of Economics and Business).
    • Gareth Roberts: TBA

      Abstract: TBA

    • Veronika Rockova: Fast posterior sampling for the spike-and-slab lasso

      Abstract: TBA

    • Gregor Kastner: Efficient Bayesian computing in many dimensions - applications in economics and finance

      Abstract: TBA

  • MCMC methods in high dimension, theory and applications: Christophe Andrieu (U of Bristol), Gabriel Stoltz (Ecole des Ponts ParisTech), Umut Simsekli (Télécom ParisTech), Gersende Fort (CNRS, Institut de Mathématiques de Toulouse).
    • Christophe Andrieu: TBA

      Abstract: TBA

    • Gabriel Stoltz: TBA

      Abstract: TBA

    • Umut Simsekli: TBA

      Abstract: TBA

    • Gersende Fort: TBA

      Abstract: TBA

  • Computational advancements in entity resolution: Brenda Betancourt (U of Florida), Andee Kaplan (Duke U), Rebecca Steorts (Duke U).
    • Brenda Betancourt: Generalized flexible microclustering models for entity resolution

      Abstract: Classical clustering tasks accomplished with Bayesian random partition models seek to divide a given population or data set into a relatively small number of clusters whose size grows with the number of data points. For other clustering applications, such as entity resolution, this assumption is inappropriate. Entity resolution (record linkage or de-duplication) is the process of removing duplicate records from noisy databases, often in the absence of a unique identifier. One natural approach to entity resolution is as a clustering problem, where each entity is implicitly associated with one or more records and the inference goal is to recover the latent entities (clusters) that correspond to the observed records (data points). In most entity resolution tasks, the clusters are very small and remain small as the number of records increases. This framework requires models that yield clusters whose sizes grow sublinearly with the total number of data points. We introduce a general class of microclustering models suitable for the 'microclustering' problem, and fully characterize its theoretical properties and asymptotic behavior. We also present a partially-collapsed MCMC sampler that, compared to common sampling schemes found in the literature, achieves significantly better mixing by overcoming strong dependencies between some of the parameters in the model. To improve scalability, we combine the sampling algorithm with a common record linkage blocking technique that allows for parallel programming. (Joint with Giacomo Zanella and Rebecca Steorts.)

    • Andee Kaplan: Life after record linkage: Tackling the downstream task with error propagation

      Abstract: Record linkage (entity resolution or de-duplication) is the process of merging noisy databases to remove duplicate entities that often lack a unique identifier. Linking data from multiple databases increases both the size and scope of a dataset, enabling post-processing tasks such as linear regression or capture-recapture to be performed. Any inferential or predictive task performed after linkage can be considered as the "downstream task." While recent advances have been made to improve flexibility and accuracy of record linkage, there are limitations in the downstream task due to the passage of errors through this two-step process. In this talk, I present a generalized framework for creating a representative dataset post-record linkage for the downstream task, called prototyping. Given the information about the representative records, I explore two downstream tasks—linear regression and binary classification via logistic regression. In addition, I discuss how error propagation occurs in both of these settings. I provide thorough empirical studies for the proposed methodology, and conclude with a discussion of practical insights into my work. (Joint with Brenda Betancourt and Rebecca Steorts.)

    • Rebecca Steorts: Scalable end-to-end Bayesian entity resolution

      Abstract: Very often information about social entities is scattered across multiple databases. Combining that information into one database can result in enormous benefits for analysis, resulting in richer and more reliable conclusions. In most practical applications, however, analysts cannot simply link records across databases based on unique identifiers, such as social security numbers, either because they are not a part of some databases or are not available due to privacy concerns. In such cases, analysts need to use methods from statistical and computational science known as entity resolution (record linkage or de-duplication) to proceed with analysis. Entity resolution is not only a crucial task for social science and industrial applications, but is a challenging statistical and computational problem itself. One recent development in entity resolution methodology has been the application of Bayesian generative models. These models offer several advantages over conventional methods, namely: (i) they do not require labeled training data; (ii) they treat linkage as a clustering problem which preserves transitivity; (iii) they propagate uncertainty; and (iv) they allow for flexible modeling assumptions. However, due to difficulties in scaling, these models have so far been limited to small data sets of around 1000 records. In this talk, I propose the first scalable Bayesian models for entity resolution. This extension brings together several key ideas, including probabilistic blocking, indexing, and efficient sampling algorithms. The proposed methodology is illustrated on both synthetic and real data. (Joint with Neil Marchant, Benjamin Rubinstein, Andee Kaplan, and Daniel Elazar.)

  • ABC: Ruth Baker (U of Oxford), David Frazier (Monash U), Umberto Picchini (Chalmers U of Tech & U of Gothenburg).
    • Ruth Baker: Multifidelity approximate Bayesian computation

      Abstract: A vital stage in the mathematical modelling of real-world systems is to calibrate a model's parameters to observed data. Likelihood-free parameter inference methods, such as Approximate Bayesian Computation, build Monte Carlo samples of the uncertain parameter distribution by comparing the data with large numbers of model simulations. However, the computational expense of generating these simulations forms a significant bottleneck in the practical application of such methods. We identify how simulations of cheap, low-fidelity models have been used separately in two complementary ways to reduce the computational expense of building these samples, at the cost of introducing additional variance to the resulting parameter estimates. We explore how these approaches can be unified so that cost and benefit are optimally balanced, and we characterise the optimal choice of how often to simulate from cheap, low-fidelity models in place of expensive, high-fidelity models in Monte Carlo ABC algorithms. The resulting early accept/reject multifidelity ABC algorithm that we propose is shown to give improved performance over existing multifidelity and high-fidelity approaches.

    • David Frazier: Robust approximate Bayesian inference with synthetic likelihood

      Abstract: Bayesian synthetic likelihood (BSL) is now a well-established method for conducting approximate Bayesian inference in complex models where exact Bayesian approaches are either infeasible, or computationally demanding, due to the intractability of the likelihood function. Similar to other approximate Bayesian methods, such as the method of approximate Bayesian computation, implicit in the application of BSL is the maintained assumption that the data generating process can generate simulated summary statistics that mimic the behaviour of the observed summary statistics. This notion of model compatibility with the observed summaries is critical for the performance of BSL and its variants. We demonstrate theoretically, and through several examples, that if the assumed data generating process (DGP) differs from the true DGP, model compatibility may no longer be satisfied and BSL can give unreliable inferences. To circumvent the issue of incompatibility between the observed and simulated summary statistics, we propose two robust versions of BSL that can deliver reliable performance regardless of whether or not the observed and simulated summaries are compatible. Simulation results and two empirical examples demonstrate the good performance of this robust approach to BSL, and its superiority over standard BSL when model compatibility is not in evidence.

    • Umberto Picchini: Variance reduction for fast ABC using resampling

      Abstract: Approximate Bayesian computation (ABC) is the state-of-the-art methodology for likelihood-free Bayesian inference. Its main feature is the ability to bypass the explicit calculation of the likelihood function, by only requiring access to a model simulator to generate many artificial datasets. In the context of pseudo-marginal ABC-MCMC (Bornn, Pillai, Smith and Woodard, 2017), generating $M>1$ datasets for each MCMC iteration allows one to construct a kernel-smoothed ABC likelihood with lower variance, which benefits the mixing of the ABC-MCMC chain compared to the typical ABC setup, which sets $M=1$. However, setting $M>1$ implies a computational bottleneck, and Bornn, Pillai, Smith and Woodard (2017) found that the benefits of using $M>1$ are not worth the increased computational effort. Everitt (2017) showed that, when the intractable likelihood is replaced by a *synthetic likelihood* (SL; Wood, 2010), it is possible to use $M=1$ and resample many times from this single simulated dataset, to construct computationally fast SL inference that artificially emulates the case $M>1$. Unfortunately, this approach was found to be ineffective within ABC, as the resampling generates inflated ABC posteriors. In this talk we show how to couple *stratified sampling* with the resampling idea of Everitt (2017). We construct an ABC-MCMC algorithm that uses a small number of model simulations ($M=1$ or 2) for each MCMC iteration, while substantially reducing the additional variance in the approximate posterior distribution induced by resampling. We therefore enjoy the computational speedup from resampling approaches, and show that our stratified sampling procedure allows us to use a larger than usual ABC threshold, while still obtaining accurate inference. (Joint with Richard Everitt.)

  • Continuous-time and reversible Monte Carlo methods: Yian Ma (U of California, Berkeley), Manon Michel (U Clermont-Auvergne), Daniel Paulin (U of Oxford).
    • Yian Ma: Bridging MCMC and Optimization

      Abstract: Rapid growth in data size and model complexity has raised questions about how computational tools can scale with problem and data complexity. Optimization algorithms have had tremendous success for convex problems in this regard. MCMC algorithms for mean estimates, on the other hand, are slower than the optimization algorithms in convex unconstrained scenarios. It has even become folklore that MCMC algorithms are in general computationally more intractable than optimization algorithms. In this talk, I will examine a class of non-convex objective functions arising from mixture models. For that class of objective functions, I discover that the computational complexity of MCMC algorithms scales linearly with the model dimension, while optimization problems are NP-hard. I will then study MCMC algorithms as optimization over the KL-divergence in the space of measures. By incorporating a momentum variable, I will discuss an algorithm which performs accelerated gradient descent over the KL-divergence. Using optimization-like ideas, a suitable Lyapunov function is constructed to prove that an accelerated convergence rate is obtained.

    • Manon Michel: Accelerations of MCMC methods by non-reversibility and factorization

      Abstract: During this talk, I will present the historical development of non-reversible Markov-chain Monte Carlo methods, based on piecewise deterministic Markov processes (PDMP). First developed for multiparticle systems, the goal was to emulate the successes of cluster algorithms for spin systems and was achieved through the replacement of the time reversibility by symmetries of the sampled probability distribution itself. These methods have been shown to bring clear accelerations and are now competing with molecular dynamics methods in chemical physics or state-of-the-art sampling schemes, e.g. Hamiltonian Monte Carlo, in statistical inference. I will discuss their successes as well as the remaining open questions. Finally, I will explain how the factorization of the distribution can lead to computational complexity reduction.

    • Daniel Paulin: Connections between PDMPs and Hamiltonian Monte Carlo

      Abstract: In this talk we are going to explore some connections between Piecewise Deterministic Markov Processes and Hamiltonian Monte Carlo in high dimensions.

  • Markov chain convergence analysis and Wasserstein distance: Alain Durmus (ENS Paris-Saclay), Jonathan Mattingly (Duke U), Qian Qin (U of Florida).
    • Alain Durmus: TBA

      Abstract: TBA

    • Jonathan Mattingly: TBA

      Abstract: TBA

    • Qian Qin: TBA

      Abstract: TBA

  • Young researchers' contributions to Bayesian computation: Tommaso Rigon (Bocconi U), Michael Jauch (Duke U), Nicholas Tawn (U of Warwick).
    • Tommaso Rigon: Bayesian inference for finite-dimensional discrete priors

      Abstract: Discrete random probability measures are the main ingredient for addressing Bayesian clustering. The investigation in this area has been very lively, with strong emphasis on nonparametric procedures based either on the Dirichlet process or on more flexible generalizations, such as the Pitman-Yor (PY) process or the normalized random measures with independent increments (NRMI). The literature on finite-dimensional discrete priors, beyond the classic Dirichlet-multinomial model, is much more limited. We aim at filling this gap by introducing novel classes of priors closely related to the PY process and NRMIs, which are recovered as limiting cases. Prior and posterior distributional properties are extensively studied. Specifically, we identify the induced random partitions and determine explicit expressions of the associated urn schemes and of the posterior distributions. A detailed comparison with the (infinite-dimensional) PY and NRMIs is provided. Finally, we employ our proposal for mixture modeling, and we assess its performance over existing methods in the analysis of a real dataset.

    • Michael Jauch: Bayesian analysis with orthogonal matrix parameters

      Abstract: Statistical models for multivariate data are often parametrized by a set of orthogonal matrices. Bayesian analyses of models with orthogonal matrix parameters present two major challenges: posterior simulation on the constrained parameter space and incorporation of prior information such as sparsity or row dependence. We propose methodology to address both of these challenges. To simulate from posterior distributions defined on a set of orthogonal matrices, we propose polar parameter expansion, a parameter expanded Markov chain Monte Carlo approach suitable for routine and flexible posterior inference in standard simulation software. To incorporate prior information, we introduce prior distributions for orthogonal matrix parameters constructed via the polar decomposition of an unconstrained random matrix. Prior distributions constructed in this way satisfy a number of appealing properties and posterior inference can again be carried out in standard simulation software. We illustrate these techniques by fitting Bayesian models for a protein interaction network and gene expression data.

    • Nicholas Tawn: The Annealed Leap Point Sampler (ALPS) for multimodal target distributions

      Abstract: This talk introduces a novel algorithm, ALPS, that is designed to provide a scalable approach to sampling from multimodal target distributions. The ALPS algorithm concatenates a number of the strengths of the current gold standard approaches for multimodality. It is strongly based around the well known parallel tempering procedure but rather than using “hot state” tempering levels the ALPS algorithm instead appeals to annealing. In annealed temperature levels the modes become even more isolated with the effects of modal skew less pronounced. Indeed the more annealed the temperature the more accurately the local mode is approximated by a Laplace approximation. The idea is to exploit this by utilizing a powerful Gaussian mixture independence sampler at the annealed temperature levels allowing rapid mixing between modes. This mixing information is then filtered back to the target of interest using a parallel tempering-like procedure with carefully designed marginal distributions.

  • Approximate Bayesian nonparametrics: Peter Müller (U of Texas), Debdeep Pati (Texas A&M), Bernardo Nipoti (Trinity College Dublin).
    • Peter Müller: Approximate inference for matrix factorization in disease discovery with EHR data

      Abstract: TBA

    • Debdeep Pati: TBA

      Abstract: TBA

    • Bernardo Nipoti: TBA

      Abstract: TBA

Contributed Sessions

  • Novel mixture-based computational approaches to Bayesian learning: Michele Guindani (U of California, Irvine), Antonietta Mira (U della Svizzera Italiana & U of Insubria), Sonia Petrone (Bocconi U).
    • Michele Guindani: Modeling human microbiome data via latent nested nonparametric priors

      Abstract: The study of the human microbiome has gained substantial attention in recent years due to its relationship with the regulation of the autoimmune system. During the data-preprocessing pipeline, microbes characterized by similar genomes are grouped together in Operational Taxonomic Units (OTUs). Since OTU abundances vary widely across individuals within a population, it is of interest to characterize the diversity of the microbiome to study the association between asymmetries in the human microbiota and various diseases. Here, we propose a Bayesian Nonparametric approach to model abundance tables in the presence of multiple populations: a common set of parameters (atoms at the observational level) is used to construct, at a higher level, a set of atoms on a distributional space. Using a common set of atoms at the lower level yields an important advantage: our model does not degenerate to the full exchangeable case when there are ties across samples, thus overcoming the crucial problem of the traditional Nested Dirichlet process outlined by Camerlenghi et al. (2018). To perform posterior inference, we propose a novel Nested independent slice-efficient algorithm. Since OTU tables consist of frequency counts and are known to be sparse, we express the likelihood as a Rounded Mixture of Gaussian Kernels. Simulation studies confirm that our model no longer suffers from the nDPMM drawback, and first applications to the microbiomes of Bangladeshi babies have shown promising results.

    • Antonietta Mira: Adaptive incremental mixture Markov chain Monte Carlo

      Abstract: We propose Adaptive Incremental Mixture Markov chain Monte Carlo (AIMM), a novel approach to sample from challenging probability distributions defined on a general state-space. While adaptive MCMC methods usually update a parametric proposal kernel with a global rule, AIMM locally adapts a semiparametric kernel. AIMM is based on an independent Metropolis-Hastings proposal distribution which takes the form of a finite mixture of Gaussian distributions. Central to this approach is the idea that the proposal distribution adapts to the target by locally adding a mixture component when the discrepancy between the proposal mixture and the target is deemed to be too large. As a result, the number of components in the mixture proposal is not fixed in advance. Theoretically, we prove that there exists a process that can be made arbitrarily close to AIMM and that converges to the correct target distribution. We also illustrate that it performs well in practice in a variety of challenging situations, including high-dimensional and multimodal target distributions.

    • Sonia Petrone: Quasi-Bayes properties of a procedure for sequential learning in mixture models

      Abstract: Bayesian methods are often optimal, yet nowadays pressure for fast computations, especially with streaming data and online learning, brings renewed interest in faster, although possibly sub-optimal, solutions. To what extent these algorithms approximate a Bayesian solution is a problem of interest, not always solved. Against this background, we revisit a sequential procedure proposed by Smith and Makov (1978) for unsupervised learning in finite mixtures, and developed by Newton and Zhang (1999) for nonparametric mixtures. The so-called Newton's algorithm is simple and fast, and theoretically intriguing. Although originally proposed as an approximation of the Bayesian solution, its quasi-Bayes properties remain unclear. We propose a novel methodological approach. We regard the algorithm as a probabilistic learning rule that implicitly defines an underlying probabilistic model, and we find such a model. We can then prove that it is, asymptotically, a Bayesian, exchangeable mixture model. Moreover, while the algorithm only offers a point estimate, we can obtain the asymptotic posterior distribution and asymptotic credible intervals for the mixing distribution. We also provide hints for tuning the algorithm and obtaining desirable properties, as we illustrate in a simulation study. Beyond mixture models, our study suggests a theoretical framework of interest for recursive quasi-Bayes methods in other settings.

  • Using Bayesian methods to uncover the latent structures in real datasets: Louis Raynal (U of Montpellier & Harvard U), Francesco Denti (U of Milan – Bicocca & U della Svizzera Italiana), Alex Rodriguez (International Center for Theoretical Physics).
    • Louis Raynal: Reconstructing the evolutionary history of the desert locust by means of ABC random forest

      Abstract: The Approximate Bayesian Computation - Random Forest (ABC-RF) methodology was recently developed to perform model choice (Pudlo et al., 2016; Estoup et al., 2018) and parameter inference (Raynal et al., 2019). It has proved to achieve good performance, is mostly insensitive to noise variables, and requires very little calibration. In this presentation we describe recent improvements, with a focus on the computation of error measures with random forests for parameter inference. As a case study, we are interested in the desert locust species Schistocerca gregaria, which is divided into two distinct regions along the north-south axis of Africa. Using ABC-RF on microsatellite data, we reconstruct the evolutionary processes explaining the present geographical distribution and estimate parameters such as the divergence time between the north and south sub-species.

    • Francesco Denti: Bayesian nonparametric dimensionality reduction via estimation of data intrinsic dimensions

      Abstract: Even if they are defined on a space with a large dimension, data points usually lie on hypersurfaces with a much smaller intrinsic dimension (ID). The recent Hidalgo method (Allegra et al., 2019), a Bayesian extension of the TWO-NN model (Facco et al., 2017, Scientific Reports), allows estimating the ID when all points lie on multiple latent manifolds. We consider the data points as a configuration of a Poisson Process (PP) with an intensity proportional to the true density. Hidalgo makes only two weak assumptions: (i) locally, on the scale of the second nearest neighbor, the original PP can be well approximated by a homogeneous one and (ii) points close to each other are more likely to belong to the same manifold. Under (i), the ratio of the distances of a point from its first and second neighbor follows a Pareto distribution that depends parametrically only on the ID. We extend Hidalgo to the nonparametric case, allowing the estimation of the number of latent manifolds via a Dirichlet Process Mixture Model and inducing a clustering among observations characterized by similar ID. We further derive the distributions of the ratios of subsequent distances between neighbors and we prove their independence. This enables us to extract more information from the data without compromising the scalability of our method. While the idea behind the extension is simple, a non-trivial Bayesian scheme is required for estimating the model and assigning each point to the correct manifold. Since the posterior distribution has no closed form, to sample from it we rely on the Slice Sampler algorithm. From preliminary analyses performed on simulated data, the model provides promising results. Moreover, we were able to uncover a surprising ID variability in several real-world datasets.
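
      As a rough illustration of assumption (i) above, the sketch below computes a maximum-likelihood estimate of a single intrinsic dimension from the second-to-first nearest-neighbor distance ratios, treating them as Pareto distributed with unit scale and shape equal to the ID. It is not the Hidalgo method (no mixture of manifolds, no Bayesian scheme); the function name `two_nn_id`, the embedding, and the sample size are assumptions made for this example only.

```python
import math
import random


def two_nn_id(points):
    """MLE of the intrinsic dimension from ratios r2/r1 of nearest-neighbor distances.

    Assumes the ratios follow a Pareto law with unit scale and shape d,
    so the MLE is d_hat = N / sum(log(r2 / r1)).
    """
    log_ratios = []
    for i, p in enumerate(points):
        r1 = r2 = math.inf  # smallest and second-smallest distance from p
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if d < r1:
                r1, r2 = d, r1
            elif d < r2:
                r2 = d
        log_ratios.append(math.log(r2 / r1))
    return len(points) / sum(log_ratios)


def embed(u, v):
    # Hypothetical linear embedding of a 2-D square into 5-D space.
    return (u, v, u + v, u - v, 2.0 * u)


rng = random.Random(0)
points = [embed(rng.random(), rng.random()) for _ in range(400)]
id_hat = two_nn_id(points)
print(f"estimated intrinsic dimension: {id_hat:.2f}")
```

      Although the points live in five coordinates, the estimate should come out near 2, the dimension of the embedded plane. The brute-force neighbor search is quadratic in the number of points and is used here only to keep the sketch dependency-free.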

    • Alex Rodriguez: Mapping the topography of complex datasets

      Abstract: Data sets can be considered an ensemble of realizations drawn from a density distribution. Obtaining a synthetic description of this distribution allows rationalizing the underlying generating process and building human-readable models. In simple cases, visualizing the distribution in a suitable low-dimensional projection is enough to capture its main features but real world data sets are often embedded in a high-dimensional space. Therefore, I present a procedure that allows obtaining such a synthetic description in an automatic way with the only information of pairwise data distances (or similarities). This methodology is based on a reliable estimation of the intrinsic dimension of the dataset (Facco, et al., 2017) and the probability density function (Rodriguez, et al., 2018) coupled with a modified Density Peaks clustering algorithm (Rodriguez and Laio, 2014). The final outcome of all this machinery working together is a hierarchical tree that summarizes the main features of the data set and a classification of the data that maps to which of these features they belong to (d'Errico, et al., 2018).

  • MCMC-based Bayesian inference on Hilbert spaces: Nawaf Bou-Rabee (Rutgers U), Nathan Glatt-Holtz (Tulane U), Daniel Sanz-Alonso (U of Chicago)
    • Nawaf Bou-Rabee: TBA

      Abstract: TBA

    • Nathan Glatt-Holtz: TBA

      Abstract: TBA

    • Daniel Sanz-Alonso: TBA

      Abstract: TBA

  • Advances in multiple importance sampling: Art Owen (Stanford U), Victor Elvira (IMT Lille Douai), Felipe Medina Aguayo (U of Reading).
    • Art Owen: Robust deterministic weighting of estimates from adaptive importance sampling

      Abstract: This talk presents a simple robust way to weight a sequence of estimates generated by adaptive importance sampling. Importance sampling is a useful method for estimating rare event probabilities and for sampling posterior distributions. It often generates data that can be used to find an improved sampler leading to methods of adaptive importance sampling (AIS). Under ideal conditions, AIS can approach a perfect sampler and the mean squared error (MSE) vanishes exponentially fast. Under less ideal conditions, including all nontrivial uses of self-normalized importance sampling, the MSE is bounded below by a positive multiple of $1/n$. That rules out exponential convergence but still allows for steady improvements. If we model steady improvement as yielding a sequence of unbiased and uncorrelated estimates with variance proportional to $k^{-y}$ for $1 \le k \le K < \infty$ and $0 \le y \le 1$, then a simple model weighting the $k$th iterate proportionally to $k^{1/2}$ is nearly optimal. It never raises variance by more than 9/8 over an oracle’s variance even though the resulting convergence rate varies with $y$. Numerical investigation shows that these weights are also robust under additional models of gradual improvement. (This is joint work with Yi Zhou.)
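
      The 9/8 bound quoted above can be checked numerically under the stated model. The sketch below assumes unbiased, uncorrelated estimates with variance proportional to $k^{-y}$, and compares the variance of the $k^{1/2}$-weighted average with that of the oracle (inverse-variance) weighting, whose weights are proportional to $k^{y}$. The function name `variance_ratio` and the grid of $K$ and $y$ values are choices made for this illustration.

```python
def variance_ratio(K, y):
    """Variance of the sqrt(k)-weighted average divided by the oracle variance.

    Model: estimate k has variance k**(-y) (up to a common constant).
    A weighted average with weights w_k has variance
        sum(w_k**2 * k**(-y)) / sum(w_k)**2.
    """
    ks = range(1, K + 1)
    # Weights w_k = k**0.5, so w_k**2 * k**(-y) = k**(1 - y).
    var_sqrt = sum(k ** (1.0 - y) for k in ks) / sum(k ** 0.5 for k in ks) ** 2
    # Oracle weights w_k = k**y (inverse variance) give variance 1 / sum(k**y).
    var_oracle = 1.0 / sum(k ** y for k in ks)
    return var_sqrt / var_oracle


grid = [(K, y) for K in (1, 2, 10, 100, 1000) for y in (0.0, 0.25, 0.5, 0.75, 1.0)]
worst = max(variance_ratio(K, y) for K, y in grid)
print(f"largest ratio on the grid: {worst:.4f} (bound 9/8 = {9 / 8})")
```

      On this grid the ratio equals 1 when $y = 1/2$ (the weights then coincide with the oracle's) and is largest at the endpoints $y = 0$ and $y = 1$, where it approaches 9/8 from below as $K$ grows.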

    • Victor Elvira: Multiple importance sampling for rare events estimation with an application in communication systems

      Abstract: Digital communications are based on the transmission of symbols that belong to a finite alphabet, each of them carrying one or several bits of information. The receiver estimates the symbol that was transmitted, and in the case of perfect communication without errors, the original sequence of bits is reconstructed. However, real-world communication systems (e.g., in wireless communications) introduce random distortions in the symbols, including additive Gaussian noise, provoking errors in the detected symbols at the receiver. The characterization of the symbol error rate (SER) of the system is of major interest in communications engineering. However, in many systems of interest, the integrals required to evaluate the SER in the presence of Gaussian noise are impossible to compute in closed form, and therefore Monte Carlo simulation is typically used to estimate the SER. Naive Monte Carlo simulation has been traditionally used in the communications literature, even if it can be very inefficient and require very long simulation runs, especially in high signal-to-noise-ratio (SNR) scenarios. At high SNR, the variance of the additive Gaussian noise is small, and hence the rate of errors is very low, which renders raw Monte Carlo impracticable for this rare event estimation problem. In this talk, we start by describing (for non-experts) the problem of SER estimation in communication systems. Then, we adapt a recently proposed multiple importance sampling (MIS) technique, called ALOE (for "At Least One rare Event"), to this problem. Conditional on a transmitted symbol, an error (or rare event) occurs when the observation falls in a union of half-spaces or, equivalently, outside a given polytope. The proposal distribution for ALOE samples the system conditionally on an error taking place, which makes it more efficient than other importance sampling techniques. ALOE provides unbiased SER estimates with simulation times orders of magnitude shorter than conventional Monte Carlo. Then, we discuss the challenges of SER estimation in multiple-input multiple-output (MIMO) communications, where the rare-event estimation problem requires solving a large number of integrals in a higher-dimensional space. We propose a novel MIS-based approach exploiting the strengths of the ALOE estimator.
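
      To make the inefficiency of naive Monte Carlo for rare events concrete, the toy sketch below estimates a Gaussian tail probability in one dimension, first by naive simulation and then by importance sampling with a mean-shifted Gaussian proposal. This is a textbook single-proposal scheme, not ALOE and not a communication-system model; the threshold, sample size, and function names are assumptions made for this example.

```python
import math
import random


def naive_mc(threshold, n, rng):
    """Fraction of standard normal draws that exceed the threshold."""
    return sum(rng.gauss(0.0, 1.0) > threshold for _ in range(n)) / n


def shifted_is(threshold, n, rng):
    """Importance sampling with proposal N(threshold, 1).

    The weight is the density ratio phi(x) / phi(x - a) = exp(-a*x + a**2 / 2),
    where a is the threshold.
    """
    a = threshold
    total = 0.0
    for _ in range(n):
        x = rng.gauss(a, 1.0)
        if x > a:
            total += math.exp(-a * x + 0.5 * a * a)
    return total / n


threshold, n = 4.0, 20_000
p_true = 0.5 * math.erfc(threshold / math.sqrt(2.0))  # exact tail probability
p_naive = naive_mc(threshold, n, random.Random(0))
p_is = shifted_is(threshold, n, random.Random(0))
print(f"exact {p_true:.3e}, naive {p_naive:.3e}, importance sampling {p_is:.3e}")
```

      With a tail probability of roughly $3 \times 10^{-5}$, twenty thousand naive draws are expected to produce fewer than one exceedance, so the naive estimate is usually zero or badly off, while the shifted proposal places about half of its draws in the rare region and reweights them.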

    • Felipe Medina Aguayo: Revisiting balance heuristic with intractable proposals

      Abstract: Among the different flavours of multiple importance sampling, the celebrated balance heuristic (BH) from Veach and Guibas still remains a popular choice for estimating integrals. The basic ingredients in BH are: a set of proposals $q_l$ , indexed by some discrete label $l$, and a deterministic set of weights for these labels. However, in some scenarios sampling from $q_l$ is only achieved by sampling jointly with the label $l$; this commonly leads to a joint density whose conditionals and marginals are unavailable or expensive to compute. Despite BH being valid even if the labels are sampled randomly, the intractability of the joint proposal can be problematic, especially when the number of discrete labels is much larger than the number of permitted importance points. In this talk, we first revisit balance heuristic from an extended-space angle, which allows the introduction of intermediate distributions as in annealing importance sampling for variance reduction. We then look at estimating integrals when the proposal is only available in a joint form via a combination of correlated estimators. This idea also fits into the extended-space representation which will, in turn, provide other interesting solutions. (This is joint work with Richard Everitt, U of Reading.)

  • Simulation in path space: Moritz Schauer (Leiden U), Frank van der Meulen (TU Delft), Andrew Duncan (Imperial College London).
    • Moritz Schauer: Sampling conditional jump diffusions

      Abstract: TBA

    • Frank van der Meulen: Diffusion bridge simulation in geometric statistics

      Abstract: TBA

    • Andrew Duncan: Infinite dimensional piecewise deterministic Monte Carlo

      Abstract: TBA

  • Sequential Monte Carlo: Recent advances in theory and practice: Richard Everitt (U of Reading), Liangliang Wang (Simon Fraser U), Anthony Lee (U of Bristol).
    • Richard Everitt: Evolution with recombination using state-of-the-art computational methods

      Abstract: Recombination is a critical process in evolutionary inference, particularly when analysing within-species variation. In bacteria, despite being organisms that reproduce clonally, recombination commonly occurs when a donor cell contributes a small segment of its DNA. This process is typically modelled using an ancestral recombination graph (ARG), which is a generalisation of the coalescent. The ClonalOrigin model (Didelot et al., 2010) can be regarded as a good approximation of the aforementioned process, in which recombination events are modelled independently given the clonal genealogy. Inference in the ClonalOrigin model is performed via a reversible-jump MCMC (rjMCMC) algorithm, which attempts to jointly explore: the recombination rate, the number of recombination events, the departure and arrival points on the clonal genealogy for each recombination event, and the sites delimiting the start and end of each recombination event on the genome. However, as known by computational statisticians, the rjMCMC algorithm usually performs poorly due to the difficulty of proposing “good” trans-dimensional moves. Recent developments in Bayesian computation methodology provide ways of improving existing methods and code, but are not well-known outside the statistics community. We present a couple of ideas based on sequential Monte Carlo (SMC) methodology that can lead to faster inference when using the ClonalOrigin model. (This is joint work with Felipe Medina Aguayo and Xavier Didelot.)

    • Liangliang Wang: Sequential Monte Carlo methods for Bayesian phylogenetics

      Abstract: Phylogenetic trees, playing a central role in biology, model evolutionary histories of taxa that range from genes to genomes. The goal of Bayesian phylogenetics is to approximate a posterior distribution of phylogenetic trees based on biological data. Standard Bayesian estimation of phylogenetic trees can handle rich evolutionary models but requires expensive Markov chain Monte Carlo (MCMC) simulations. Our previous work has shown that sequential Monte Carlo (SMC) methods can serve as a good alternative to MCMC in posterior inference over phylogenetic trees. In this talk, I will present our recent work on SMC methods for Bayesian Phylogenetics. We illustrate our methods using simulation studies and real data analysis.

    • Anthony Lee: Latent variable models: statistical and computational efficiency for simple likelihood approximations

      Abstract: A popular statistical modelling technique is to model data as a partial observation of a random process. This allows, in principle, one to fit sophisticated domain-specific models with easily interpretable parameters. However, the likelihood function in such models is typically intractable, and so likelihood-based inference techniques must deal with this intractability in some way. I will briefly talk about two likelihood-based methodologies, pseudo-marginal Markov chain Monte Carlo and simulated maximum likelihood, and discuss statistical and computational scalability in some example settings. The results are also relevant to the use of sequential Monte Carlo algorithms in high-dimensional general state-space hidden Markov models.

  • Advances in MCMC for high dimensional and functional spaces: Galin Jones (U of Minnesota), Vivekananda Roy (Iowa State U), Radu Herbei (The Ohio State U)
    • Galin Jones: TBA

      Abstract: TBA

    • Vivekananda Roy: TBA

      Abstract: TBA

    • Radu Herbei: TBA

      Abstract: TBA

  • Recent advances in Gaussian process computations and theory: Yun Yang (U of Illinois), Joseph Futoma (Harvard U), Michael Zhang (Princeton U).
    • Yun Yang: Frequentist coverage and sup-norm convergence rate in Gaussian process regression

      Abstract: GP regression is a powerful interpolation technique due to its flexibility in capturing non-linearity. In this talk, we provide a general framework for understanding the frequentist coverage of point-wise and simultaneous Bayesian credible sets in random design GP regression. Identifying both the mean and covariance function of the posterior distribution of the Gaussian process as regularized M-estimators, we show that the sampling distribution of the posterior mean function and the centered posterior distribution can be respectively approximated by two population level GPs. By developing a comparison inequality between two GPs, we provide exact characterization of frequentist coverage probabilities of Bayesian pointwise credible intervals and simultaneous credible bands of the regression function. Our results show that inference based on GP regression tends to be conservative; when the prior is under-smoothed, the resulting credible intervals and bands have minimax-optimal sizes, with their frequentist coverage converging to a non-degenerate value between their nominal level and one. As a byproduct of our theory, we show that GP regression also yields minimax-optimal posterior contraction rate relative to the supremum norm, which provides positive evidence to the long-standing problem on optimal supremum norm contraction rate in GP regression.

    • Joseph Futoma: Learning to Detect Sepsis with a Multi-output Gaussian Process RNN Classifier (in the Real World!)

      Abstract: Sepsis is a poorly understood and potentially life-threatening complication that can occur as a result of infection. Early detection and treatment improve patient outcomes, and as such it poses an important challenge in medicine. In this work, we develop a flexible classifier that leverages streaming lab results, vitals, and medications to predict sepsis before it occurs. We model patient clinical time series with multi-output Gaussian processes, maintaining uncertainty about the physiological state of a patient while also imputing missing values. Latent function values from the Gaussian process are then fed into a deep recurrent neural network to classify patient encounters as septic or not, and the overall model is trained end-to-end using back-propagation. We train and validate our model on a large retrospective dataset of 18 months of heterogeneous inpatient stays from the Duke University Health System, and develop a new “real-time” validation scheme for simulating the performance of our model as it will actually be used. We conclude by showing how this model is saving lives as a part of SepsisWatch, an application currently being used at Duke Hospital to screen, monitor, and coordinate treatment of septic patients.

    • Michael Zhang: Embarrassingly parallel inference for Gaussian processes

      Abstract: Gaussian process-based models typically involve an $O(N^3)$ computational bottleneck due to inverting the covariance matrix. Popular methods for overcoming this matrix inversion problem cannot adequately model all types of latent functions and are often not parallelizable. However, judicious choice of model structure can ameliorate this problem. A mixture-of-experts model that uses a mixture of $K$ Gaussian processes offers modeling flexibility and opportunities for scalable inference. Our embarrassingly parallel algorithm combines low-dimensional matrix inversions with importance sampling to yield a flexible, scalable mixture-of-experts model that offers comparable performance to Gaussian process regression at a much lower computational cost.

  • Posterior inference with misspecified models: Judith Rousseau (U of Oxford), Ryan Martin (North Carolina State U), Jonathan Huggins (Harvard U)
    • Judith Rousseau: TBA

      Abstract: TBA

    • Ryan Martin: TBA

      Abstract: TBA

    • Jonathan Huggins: TBA

      Abstract: TBA

  • Convergence of MCMC in theory and in practice: Christina Knudson (U of St. Thomas, MN), Rui Jin (U of Iowa), Xin Wang (Miami U, OH)
    • Christina Knudson: Revisiting the Gelman-Rubin Diagnostic

      Abstract: Gelman and Rubin's (1992) convergence diagnostic is one of the most popular methods for terminating a Markov chain Monte Carlo (MCMC) sampler. Since the seminal paper, researchers have developed sophisticated methods of variance estimation for Monte Carlo averages. We show that this class of estimators finds immediate use in the Gelman-Rubin statistic, a connection not established in the literature before. We incorporate these estimators to upgrade both the univariate and multivariate Gelman-Rubin statistics, leading to increased stability in MCMC termination time. An immediate advantage is that our new Gelman-Rubin statistic can be calculated for a single chain. In addition, we establish a relationship between the Gelman-Rubin statistic and effective sample size. Leveraging this relationship, we develop a principled cutoff criterion for the Gelman-Rubin statistic. Finally, we demonstrate the utility of our improved diagnostic via an example.
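
      For readers unfamiliar with the diagnostic, the sketch below computes the classic multi-chain form of the statistic for a scalar quantity, comparing a pooled variance estimate with the average within-chain variance. It is the textbook version without the degrees-of-freedom correction, not the improved statistic described in the talk; the function name `gelman_rubin` and the synthetic chains are assumptions made for this example.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length scalar chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean(statistics.variance(chain) for chain in chains)
    # B / n: sample variance of the chain means.
    between_over_n = statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)


rng = random.Random(1)
# Four independent "chains" drawn from the same distribution (well mixed).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Four "chains" stuck around different locations (not mixed).
stuck = [[rng.gauss(3.0 * j, 1.0) for _ in range(1000)] for j in range(4)]

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
print(f"well mixed: {r_mixed:.3f}, stuck: {r_stuck:.3f}")
```

      Values close to 1 indicate that the chains agree with one another, while values well above 1 signal that they are exploring different regions. The synthetic draws here are independent rather than Markov, which is the simplification that keeps the example short.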

    • Rui Jin: Fast MCMC for high dimensional Bayesian regression models with shrinkage priors

      Abstract: In the past decade, many Bayesian shrinkage models have been developed for linear regression problems where the number of covariates, p, is large. Computation for the intractable posterior is often done with three-block Gibbs samplers (3BG), based on representing the shrinkage priors as scale mixtures of Normal distributions. An alternative computing tool is a state-of-the-art Hamiltonian Monte Carlo (HMC) method, which can be easily implemented in the Stan software. However, we found both existing methods to be inefficient and often impractical for large p problems. Following the general idea of Rajaratnam et al. (2018), we propose two-block Gibbs samplers (2BG) for three commonly used shrinkage models, namely, the Bayesian group lasso, the Bayesian sparse group lasso and the Bayesian fused lasso models. We demonstrate with simulated and real data examples that the Markov chains underlying 2BG's converge much faster than those of 3BG's, and no worse than those of HMC. At the same time, the computing costs of 2BG's per iteration are as low as those of 3BG's, and can be several orders of magnitude lower than those of HMC. As a result, the newly proposed 2BG is the only practical computing solution to do Bayesian shrinkage analysis for datasets with large p. Further, we provide theoretical justifications for the superior performance of 2BG's. First, we establish geometric ergodicity (GE) of Markov chains associated with the 2BG for each of the three Bayesian shrinkage models, and derive quantitative upper bounds for their geometric convergence rates. Secondly, we show that the Markov operators corresponding to the 2BG of the Bayesian group lasso and the Bayesian sparse group lasso are trace class, whereas those of the corresponding 3BG are not even Hilbert-Schmidt.

    • Xin Wang: Geometric ergodicity of Polya-Gamma Gibbs sampler for Bayesian logistic regression with a flat prior

      Abstract: The logistic regression model is the most popular model for analyzing binary data. In the absence of any prior information, an improper flat prior is often used for the regression coefficients in Bayesian logistic regression models. The resulting intractable posterior density can be explored by running Polson, Scott and Windle’s (2013) data augmentation (DA) algorithm. In this paper, we establish that the Markov chain underlying Polson, Scott and Windle’s (2013) DA algorithm is geometrically ergodic. Proving this theoretical result is practically important as it ensures the existence of central limit theorems (CLTs) for sample averages under a finite second moment condition. The CLT in turn allows users of the DA algorithm to calculate standard errors for posterior estimates.

  • Robust Markov chain Monte Carlo methods: Kengo Kamatani (Osaka U), Krzysztof Łatuszynski (Warwick U), Björn Sprungk (Göttingen U)
    • Kengo Kamatani: Robust Markov chain Monte Carlo methodologies with respect to tail properties

      Abstract: In this talk, we will discuss Markov chain Monte Carlo (MCMC) methods with heavy-tailed invariant probability distributions. When the invariant distribution is heavy-tailed the algorithm has difficulty reaching the tail area. We study the ergodic properties of some MCMC methods with position dependent proposal kernels and apply them to heavy-tailed target distributions.

    • Krzysztof Łatuszynski: A framework for adaptive MCMC targeting multimodal distributions

      Abstract: We propose a new Monte Carlo method for sampling from multimodal distributions. The idea of this technique is based on splitting the task into two: finding the modes of a target distribution and sampling, given the knowledge of the locations of the modes. The sampling algorithm relies on steps of two types: local ones, preserving the mode; and jumps to regions associated with different modes. Besides, the method learns the optimal parameters of the algorithm while it runs, without requiring user intervention. Our technique should be considered as a flexible framework, in which the design of moves can follow various strategies known from the broad MCMC literature. In order to control the jumps, we introduce an auxiliary variable representing each mode and we define a new target distribution on an augmented state space. As the adaptive algorithm runs and updates its parameters, the target distribution also keeps being modified. This motivates a new class of algorithms, Auxiliary Variable Adaptive MCMC. We provide general ergodic results for the whole class before specialising to the case of our algorithm. The performance of the algorithm is illustrated with several multimodal examples. (This is joint work with Chris Holmes and Emilia Pompe.)

    • Björn Sprungk: Noise level-robust Metropolis-Hastings algorithms for Bayesian inference with concentrated posteriors

      Abstract: We consider Metropolis-Hastings algorithms for Markov chain Monte Carlo integration w.r.t. a concentrated posterior measure which results from Bayesian inference with a small additive observational noise. Proposal kernels based only on prior information show a deteriorating efficiency for a decaying noise. We propose to use informed proposal kernels, i.e., random walk proposals with a covariance close to the posterior covariance. Here, we use the a-priori computable covariance of the Laplace approximation of the posterior. Besides some numerical evidence we prove that the resulting informed Metropolis-Hastings shows a non-degenerating mean acceptance rate and lag-one autocorrelation as the noise decays. Thus, it performs robustly w.r.t. a small noise-level in the Bayesian inference problem. The theoretical results are based on the recently established convergence of the Laplace approximation to the posterior measure in total variation norm.

  • Approximate Markov chain Monte Carlo methods: Bamdad Hosseini (California Institute of Technology), James Johndrow (Stanford U), Daniel Rudolf (Göttingen U)
    • Bamdad Hosseini: Perturbation theory for a function space MCMC algorithm with non-Gaussian priors

      Abstract: In recent years a number of function space MCMC algorithms have been introduced in the literature. The goal here is to design an algorithm that is well-defined on an infinite-dimensional Banach space with the hope that it will be discretization invariant and overcome some issues that are encountered by standard MCMC algorithms in high dimensions. However, most of the focus in the literature has been on algorithms that rely on the assumption that the prior measure is Gaussian or at least absolutely continuous with respect to a Gaussian measure. In this talk we introduce a new class of prior-aware Metropolis-Hastings algorithms for non-Gaussian priors and discuss their convergence and perturbation properties such as dimension-independent spectral gaps and various types of approximations beyond standard approximation by discretization or projections.

    • James Johndrow: Metropolizing approximate Gibbs samplers

      Abstract: There has been much recent work on “approximate” MCMC algorithms, such as Metropolis-Hastings algorithms that rely on minibatches of data, resulting in bias in the invariant measure. Less studied are the various ways in which approximate Gibbs samplers can be designed. We describe a general strategy for using approximate Gibbs samplers as Metropolis-Hastings proposals. Because it is typically less costly to compute the unnormalized posterior density than to take one step of exact Gibbs, and because the Hastings ratio in these algorithms requires only computation of the approximating kernel at pairs of points, one can often achieve reductions in computational complexity per step with no bias in the invariant measure by using approximate Gibbs as a Metropolis-Hastings proposal. We demonstrate the approach with an application to high-dimensional regression.

    • Daniel Rudolf: Time-inhomogeneous approximate Markov chain Monte Carlo

      Abstract: We discuss the approximation of a time-homogeneous Markov chain by a time-inhomogeneous one. An upper bound of the expected absolute difference of the stationary mean, w.r.t. the Markov chain of interest, and the ergodic average based on the approximating Markov chain will be presented. In addition to that we provide explicit estimates of the Wasserstein distance of the difference of the distributions of the Markov chains after n steps.

  • Sampling Techniques for High-Dimensional Bayesian Inverse Problems: Qiang Liu (U of Texas), Tan Bui-Thanh (U of Texas), Alex Thiery (National U of Singapore)
    • Qiang Liu: Stein Variational Gradient Descent: Algorithm, Theory, Applications

      Abstract: Approximate probabilistic inference is a key computational task in modern machine learning, which allows us to reason with complex, structured, hierarchical (deep) probabilistic models to extract information and quantify uncertainty. Traditionally, approximate inference is often performed by either Markov chain Monte Carlo (MCMC) or variational inference (VI), both of which, however, have their own critical weaknesses: MCMC is accurate and asymptotically consistent but suffers from slow convergence; VI is typically faster by formulating the inference problem as gradient-based optimization, but introduces deterministic errors and lacks theoretical guarantees. Stein variational gradient descent (SVGD) is a new tool for approximate inference that combines the accuracy and flexibility of MCMC and the practical speed of VI and gradient-based optimization. The key idea of SVGD is to directly optimize a non-parametric particle-based representation to fit intractable distributions with fast deterministic gradient-based updates, which is made possible by integrating and generalizing key mathematical tools from Stein's method, optimal transport, and interacting particle systems. SVGD has been found to be a powerful tool in various challenging settings, including Bayesian deep learning and deep generative models, reinforcement learning, and meta learning. This talk will introduce the basic ideas and theories of SVGD, and cover some example applications.

    • Tan Bui-Thanh: A data-consistent approach to statistical inverse problems

      Abstract: Given a hierarchy of reduced-order models to solve the inverse problems for quantities of interest, each model with varying levels of fidelity and computational cost, a machine learning framework is proposed to improve the models by learning the errors between successive levels. Each reduced-order model is a statistical model generating rapid and reasonably accurate solutions to new parameters, and is typically formed using expensive forward solves to find the reduced subspace. These approximate reduced-order models speed up computational time but they introduce additional uncertainty to the solution. By statistically modeling errors of reduced-order models and using training data involving forward solves of the reduced-order models and the higher fidelity model, we train a deep neural network to learn the error between successive levels of the hierarchy of reduced-order models, thereby improving their error bounds. The training of the deep neural network occurs during the offline phase and the error bounds can be improved online as new training data is observed. Once the deep-learning-enhanced reduced model is constructed, it is amenable to any sampling method as its cost is a fraction of the cost of the original model.

    • Alex Thiery: Exploiting geometry for walking larger steps in Bayesian inverse problems

      Abstract: Consider the observation $y = F(x) + \xi$ of a quantity of interest $x$, where the random variable $\xi \sim \mathcal{N}(0, \sigma^2 I)$ is a vector of additive noise in the observation. In Bayesian inverse problems, the vector $x$ typically represents the high-dimensional discretization of a continuous and unobserved field, while the evaluations of the forward operator $F(\cdot)$ involve solving a system of partial differential equations. In the low-noise regime, i.e. $\sigma \to 0$, the posterior distribution concentrates in the neighbourhood of a nonlinear manifold. As a result, the efficiency of standard MCMC algorithms deteriorates due to the need to take increasingly smaller steps. In this work, we present a constrained HMC algorithm that is robust to small $\sigma$ values, i.e. low noise. Taking the observations generated by the model to be constraints on the prior, we define a manifold on which the constrained HMC algorithm generates samples. By exploiting the geometry of the manifold, our algorithm is able to take larger step sizes than more standard MCMC methods, resulting in a more efficient sampler. If time permits, we will describe how similar ideas can be leveraged within other non-reversible samplers.

Short Courses/Tutorials/Practice Labs

    The conference will begin with some combination of Short Courses, Tutorials, and Practice Labs on Tuesday afternoon (January 7, 2020). Proposals are welcome: send a title and brief description to David Rossell.

Code of Conduct

ISBA takes very seriously any form of misconduct, including but not limited to sexual harassment and bullying. All meeting participants are expected to adhere strictly to the official ISBA Code of Conduct. Following the safeISBA motto, we want ISBA meetings to be safe and to be fun. We encourage participants to report any concerns or perceived misconduct to the meeting organizers, Jim Hobert and Christian Robert. Further suggestions can be sent to safeisba@bayesian.org.