Bayes Comp 2020

	Early (through Aug 14)	Regular (Aug 15 - Oct 14)	Late (starting Oct 15)
Student Member of ISBA	125	150	175
Student Non-member of ISBA	165	190	215
Regular Member of ISBA	250	300	350
Regular Non-Member of ISBA	350	400	450

Here is the program

Plenary Speakers

David Blei (Columbia University): Scaling and generalizing approximate Bayesian inference.

Paul Fearnhead (U of Lancaster): Continuous-time MCMC

Emily Fox (U of Washington): Computational approaches for large-scale time series analysis.

Invited Sessions

Theory & practice of HMC (and its variants) for Bayesian hierarchical models: Tamara Broderick (MIT), George Deligiannidis (U of Oxford), Aaron Smith (U of Ottawa).

Tamara Broderick: The kernel interaction trick: Fast Bayesian discovery of multi-way interactions in high dimensions

Abstract: Discovering interaction effects on a response of interest is a fundamental problem faced in biology, medicine, economics, and many other scientific disciplines. In theory, Bayesian methods for discovering pairwise interactions enjoy many benefits such as coherent uncertainty quantification, the ability to incorporate background knowledge, and desirable shrinkage properties. In practice, however, Bayesian methods are often computationally intractable for even moderate-dimensional problems. Our key insight is that many hierarchical models of practical interest admit a particular Gaussian process (GP) representation; the GP allows us to capture the posterior with a vector of $O(p)$ kernel hyper-parameters rather than $O(p^2)$ interactions and main effects. With the implicit representation, we can run Markov chain Monte Carlo (MCMC) over model hyper-parameters in time and memory linear in p per iteration. We focus on sparsity-inducing models and show on datasets with a variety of covariate behaviors that our method: (1) reduces runtime by orders of magnitude over naive applications of MCMC, (2) provides lower Type I and Type II error relative to state-of-the-art LASSO-based approaches, and (3) offers improved computational scaling in high dimensions relative to existing Bayesian and LASSO-based approaches.

George Deligiannidis: Randomized Hamiltonian Monte Carlo as scaling limit of the bouncy particle sampler and dimension-free convergence rates

Abstract: The Bouncy Particle Sampler is a Markov chain Monte Carlo method based on a nonreversible piecewise deterministic Markov process. In this scheme, a particle explores the state space of interest by evolving according to linear dynamics, which are altered by bouncing on the hyperplane tangent to the gradient of the negative log-target density at the arrival times of an inhomogeneous Poisson process (PP) and by randomly perturbing its velocity at the arrival times of a homogeneous PP. Under regularity conditions, we show here that the process corresponding to the first component of the particle and its corresponding velocity converges weakly towards a Randomized Hamiltonian Monte Carlo (RHMC) process as the dimension of the ambient space goes to infinity. RHMC is another piecewise deterministic non-reversible Markov process, in which Hamiltonian dynamics are altered at the arrival times of a homogeneous PP by randomly perturbing the momentum component. We then establish dimension-free convergence rates for RHMC for strongly log-concave targets with bounded Hessians using coupling ideas and hypocoercivity techniques.
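As a rough illustration of the dynamics described in this abstract, the following Python sketch simulates a bouncy particle sampler for a standard Gaussian target. That target is chosen here only because its bounce times have a closed form. The function names, the target, and the choice to record positions on a fixed time grid are assumptions made for this sketch and are not taken from the talk.

```python
import math
import random


def next_bounce_time(x, v, e):
    """Time to the next bounce for the target N(0, I), given e ~ Exp(1).

    With U(x) = |x|^2 / 2, the bounce rate along the ray x + t*v is
    max(0, a + b*t) with a = <x, v> and b = <v, v>. Setting the integrated
    rate equal to e and solving for t gives the closed form below.
    """
    a = sum(xi * vi for xi, vi in zip(x, v))
    b = sum(vi * vi for vi in v)
    return (-a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * b * e)) / b


def bounce(v, grad):
    """Reflect the velocity v in the hyperplane orthogonal to grad."""
    dot = sum(vi * gi for vi, gi in zip(v, grad))
    norm2 = sum(gi * gi for gi in grad)
    return [vi - 2.0 * dot / norm2 * gi for vi, gi in zip(v, grad)]


def bps_gaussian(dim, total_time, refresh_rate, dt, seed=0):
    """Simulate the process and record the position every dt time units."""
    rng = random.Random(seed)
    x = [0.0] * dim
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    samples, t, next_record = [], 0.0, dt
    while t < total_time:
        # Inhomogeneous PP (bounces) and homogeneous PP (velocity refreshes).
        t_bounce = next_bounce_time(x, v, rng.expovariate(1.0))
        t_refresh = rng.expovariate(refresh_rate)
        tau = min(t_bounce, t_refresh)
        # Record grid points crossed while moving in a straight line.
        while next_record <= t + tau and next_record <= total_time:
            s = next_record - t
            samples.append([xi + s * vi for xi, vi in zip(x, v)])
            next_record += dt
        x = [xi + tau * vi for xi, vi in zip(x, v)]
        t += tau
        if t_bounce < t_refresh:
            v = bounce(v, x)  # grad U(x) = x for the standard Gaussian
        else:
            v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return samples
```

For example, `bps_gaussian(2, 50.0, 1.0, 0.5)` returns positions recorded every 0.5 time units along one trajectory. Averages over such a grid approximate expectations under the target.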

Aaron Smith: Free lunches and subsampling Monte Carlo

Abstract: It is widely known that the performance of MCMC algorithms can degrade quite quickly when targeting computationally expensive posterior distributions, including the posteriors associated with any large dataset. This has motivated the search for MCMC variants that scale well for large datasets. One general approach, taken by several research groups, has been to look at only a subsample of the data at every step. In this talk, we focus on simple "no-free-lunch" results that provide some basic limits on the performance of many such algorithms. We apply these generic results to realistic statistical problems and proposed algorithms, and also discuss some special examples that can avoid our generic results and provide a free (or at least cheap) lunch. (Joint with Patrick Conrad, Andrew Davis, James Johndrow, Youssef Marzouk, Natesh Pillai, and Pengfei Wang.)

Scalable methods for high-dimensional problems: Akihiko Nishimura (UCLA), Anirban Bhattacharya (Texas A&M), Lassi Roininen (LUT U).

Akihiko Nishimura: Scalable Bayesian sparse generalized linear models and survival analysis via curvature-adaptive Hamiltonian Monte Carlo for high-dimensional log-concave distributions

Abstract: Bayesian sparse regression based on shrinkage priors possesses many desirable theoretical properties and yields posterior distributions whose conditionals mostly admit straightforward Gibbs updates. Sampling high-dimensional regression coefficients from their conditional distribution, however, presents a major scalability issue in posterior computation. The conditional distribution generally does not belong to a parametric family and the existing sampling approaches are hopelessly inefficient in high-dimensional settings. Inspired by recent advances in understanding the performance of Hamiltonian Monte Carlo (HMC) on log-concave target distributions, we develop *curvature-adaptive HMC* for scalable posterior inference under sparse regression models with log-concave likelihoods. As is well known, HMC's performance critically depends on the integrator stepsize and mass matrix. These tuning parameters are typically adjusted over many HMC iterations by collecting statistics on the target distribution, an impractical approach when employing HMC within a Gibbs sampler since the conditional distribution changes as the other parameters are updated. Instead, we achieve on-the-fly calibration of the key HMC tuning parameters through 1) the recently developed theory of *prior-preconditioning* for sparse regression and 2) a rapid estimation of the curvature of a given log-concave target via *iterative methods* from numerical linear algebra. We demonstrate the scalability of our method on a clinically relevant large-scale observational study involving n >= 80,000 patients and p >= 10,000 predictors, designed to assess the relative efficacy of two alternative hypertension treatments.

Anirban Bhattacharya: Approximate MCMC for high-dimensional estimation

Abstract: We discuss a number of applications of approximate MCMC to complex high-dimensional structured estimation problems. A unified theoretical treatment is provided to understand the impact of introducing approximations to the exact MCMC transition kernel.

Lassi Roininen: Posterior inference for sparse hierarchical non-stationary models

Abstract: Gaussian processes are valuable tools for non-parametric modelling, where typically an assumption of stationarity is employed. While removing this assumption can improve prediction, fitting such models is challenging. In this work, hierarchical models are constructed based on Gaussian Markov random fields with stochastic spatially varying parameters. Importantly, this allows for non-stationarity while also addressing the computational burden through a sparse representation of the precision matrix. The prior field is chosen to be Matérn, and two hyperpriors, for the spatially varying parameters, are considered. One hyperprior is Ornstein-Uhlenbeck, formulated through an autoregressive process. The other corresponds to the widely used squared exponential. In this setting, efficient Markov chain Monte Carlo (MCMC) sampling is challenging due to the strong coupling a posteriori of the parameters and hyperparameters. We develop and compare three MCMC schemes, which are adaptive and therefore free of parameter tuning. Furthermore, a novel extension to higher-dimensional settings is proposed through an additive structure that retains the flexibility and scalability of the model, while also inheriting interpretability from the additive approach. A thorough assessment of the ability of the methods to efficiently explore the posterior distribution and to account for non-stationarity is presented, in both simulated experiments and a real-world computer emulation problem. https://arxiv.org/abs/1804.01431

MCMC and scalable Bayesian computations: Philippe Gagnon (U of Oxford), Florian Maire (U de Montréal), Giacomo Zanella (Bocconi U).

Philippe Gagnon: Nonreversible jump algorithms for nested models

Abstract: It is now well known that nonreversible Markov chain Monte Carlo methods often outperform their reversible counterparts. Lifting the state space (Chen et al. (1999)) has proved to be a successful technique for constructing such samplers relying on nonreversible Markov chains. The idea is to see the random variables that we wish to generate as position variables to which we associate velocity (or direction) variables, doubling the size of the state space. At each iteration of such samplers, the positions evolve deterministically as a function of the directions, and this is followed by a possible update of the latter. This direction-assisted scheme may induce persistent movements that allow the chain to traverse the state space more quickly than traditional methods, which produce chains with diffusive patterns. This explains the gain in efficiency. Because directions play a central role, the technique can only be employed to explore state spaces for which this concept is well defined. In this paper, we introduce samplers that we call nonreversible jump algorithms that can be applied to simultaneously achieve model selection and parameter estimation, in situations where the family of models considered forms a sequence of nested models; there thus exists a natural order among the models, and therefore, directions. These samplers are constructed by modifying reversible jump algorithms after having lifted the part of the state space associated with the model indicator. We demonstrate their correctness and show that they compare favourably to their reversible counterpart using both theoretical arguments and numerical experiments. We address implementation challenges, facilitating application by users.
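The lifting idea in this abstract can be sketched on a toy ordered state space. The Python code below is a minimal guided-walk example on the integers 0, ..., K with a direction variable. It is not the nonreversible jump algorithm of the talk, and the names `lifted_step` and `lifted_walk` and the unnormalized `weights` target are assumptions made for illustration.

```python
import random


def lifted_step(k, direction, weights, rng):
    """One step of a lifted (guided) walk on states 0..len(weights)-1.

    weights[k] is an unnormalized target probability. The position tries to
    move one step in the current direction and is accepted with the usual
    Metropolis ratio. On rejection the position stays put and the direction
    flips, which is what produces persistent rather than diffusive motion.
    """
    proposal = k + direction
    inside = 0 <= proposal < len(weights)
    ratio = weights[proposal] / weights[k] if inside else 0.0
    if rng.random() < min(1.0, ratio):
        return proposal, direction
    return k, -direction


def lifted_walk(weights, n_steps, seed=0):
    """Run the walk from state 0 heading upwards; return visited positions."""
    rng = random.Random(seed)
    k, direction = 0, 1
    path = []
    for _ in range(n_steps):
        k, direction = lifted_step(k, direction, weights, rng)
        path.append(k)
    return path
```

With a flat target such as `weights = [1.0] * 5`, every interior move is accepted, so the walk sweeps from one end of the state space to the other before turning around. In the nested-model setting of the abstract, the ordered index would play the role of the model indicator.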

Florian Maire: Can we improve convergence of MCMC methods by aggregating Markov kernels in a locally informed way?

Abstract: For a given probability distribution $\pi$, there is virtually an infinite number of Markov kernels capable of generating useful Markov chains to infer $\pi$. Hybrid methods refer to algorithms where several Markov kernels are mixed with a fixed probability distribution $\omega$. In this talk, we introduce a dependence between $\omega$ and the current state of the Markov chain, a strategy that we refer to as Locally Informed Hybrid Markov chain, since $\omega$ can be specified so as to reflect the local topology of the state-space. The analysis of this intuitive construction reveals a number of surprises that question some of the usual Markov chain comparison tools, from a statistical learning viewpoint. These include tools based on the spectral analysis of the underlying Markov operator as well as Peskun ordering that give typically pessimistic results for metastable Markov chains, a framework which Locally Informed Hybrid Markov chains fall into. Finally, situations where the statistical efficiency of estimators based on Locally Informed Hybrid Markov chains is superior to that of traditional Hybrid algorithms are discussed.

Giacomo Zanella: On the robustness of gradient-based sampling algorithms

Abstract: We analyze the tension between robustness and efficiency for Markov chain Monte Carlo (MCMC) sampling algorithms. In particular, we focus on the robustness of MCMC algorithms with respect to heterogeneity in the target, an issue of great practical relevance but still understudied theoretically. We show that the spectral gap of the Markov chains induced by classical gradient-based MCMC schemes (e.g. Langevin and Hamiltonian Monte Carlo) decays exponentially fast in the degree of mismatch between the scales of the proposal and target, while for the random walk Metropolis (RWM) the decay is linear. This result provides theoretical support to the notion that gradient-based MCMC schemes are less robust to heterogeneity and more sensitive to tuning. Motivated by these considerations, we propose a novel and simple-to-implement gradient-based MCMC algorithm, inspired by the classical Barker accept-reject rule, with improved robustness properties. Extensive theoretical results, dealing with robustness to heterogeneity, geometric ergodicity and scaling with dimensionality, show that the novel scheme combines the robustness of RWM with the efficiency of classical gradient-based schemes. The theoretical results are illustrated with simulation studies. (Joint work with Samuel Livingstone.)
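The abstract does not spell out the new algorithm. The Python sketch below follows our understanding of the Barker proposal as published by Livingstone and Zanella, and it should be read as an assumption-laden illustration rather than the authors' implementation. Each coordinate draws a Gaussian increment and keeps or flips its sign with a probability that depends on the gradient, and a Metropolis-Hastings correction then accounts for the skewed proposal. All function names are hypothetical.

```python
import math
import random


def log1pexp(a):
    """Numerically stable log(1 + exp(a))."""
    if a > 0:
        return a + math.log1p(math.exp(-a))
    return math.log1p(math.exp(a))


def barker_propose(x, grad_x, sigma, rng):
    """Propose y by keeping or flipping each Gaussian increment."""
    y = []
    for xi, gi in zip(x, grad_x):
        z = rng.gauss(0.0, sigma)
        # Keep z with probability 1 / (1 + exp(-z * gi)); otherwise use -z.
        p_keep = math.exp(-log1pexp(-z * gi))
        y.append(xi + z if rng.random() < p_keep else xi - z)
    return y


def log_accept_ratio(x, y, log_pi, grad_log_pi):
    """Log Metropolis-Hastings ratio for the proposal above."""
    gx, gy = grad_log_pi(x), grad_log_pi(y)
    out = log_pi(y) - log_pi(x)
    for xi, yi, gxi, gyi in zip(x, y, gx, gy):
        out += log1pexp((xi - yi) * gxi) - log1pexp((yi - xi) * gyi)
    return out


def barker_step(x, log_pi, grad_log_pi, sigma, rng):
    """One accept/reject step; returns the new state."""
    y = barker_propose(x, grad_log_pi(x), sigma, rng)
    log_u = math.log(1.0 - rng.random())  # log of a draw from (0, 1]
    if log_u < log_accept_ratio(x, y, log_pi, grad_log_pi):
        return y
    return x
```

Because the gradient only enters through a bounded probability, a badly scaled gradient can at worst fix the sign of each increment and cannot push the proposal arbitrarily far. This is one informal way to read the robustness claim in the abstract.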

Scalable methods for posterior inference from big data: Subharup Guha (U of Florida), Zhenyu Zhang (UCLA), David Dahl (Brigham Young U).

Subharup Guha: Fast MCMC techniques for fitting Bayesian mixture models to massive multiple-platform cancer data

Abstract: Recent advances in array-based and next-generation sequencing technologies have revolutionized biomedical research, especially in cancer. Bayesian mixture models, such as finite mixtures, hidden Markov models, and Dirichlet processes, offer elegant frameworks for inference, especially because they are flexible, avoid making unrealistic assumptions about the data features and the nature of the interactions, and permit nonlinear dependencies. However, existing inference procedures for these models do not scale to multiple-platform Big Data and often stretch computational resources past their limits. An investigation of the theoretical properties of these models offers insight into asymptotics that form the basis of broadly applicable, cost-effective MCMC strategies for large datasets. These MCMC techniques have the advantage of providing inferences from the posterior of interest, rather than an approximation, and are applicable to different Bayesian mixture models. Furthermore, they can be applied to develop massively parallel MCMC algorithms for these data. The versatility and impressive gains of the methodology are demonstrated by simulation studies and by a semiparametric integrative analysis that detects shared biological mechanisms in heterogeneous multi-platform cancer datasets. (Joint with Dongyan Yan and Veera Baladandayuthapani.)

Zhenyu Zhang: Bayesian inference for large-scale phylogenetic multivariate probit models

Abstract: Inferring correlation among biological features is an important yet challenging problem in evolutionary biology. In addition to adjusting for correlations induced from an uncertain evolutionary history, we also have to deal with features measured in different scales: continuous and binary. We jointly model the two feature types by introducing latent continuous parameters for binary features, giving rise to a phylogenetic multivariate probit model. Posterior computation under this model remains problematic with increasing sample size, requiring repeatedly sampling from a high-dimensional truncated Gaussian distribution. Best current approaches scale quadratically in sample size and suffer from slow-mixing. We develop a new computation approach that exploits 1) the state-of-the-art bouncy particle sampler based on piece-wise deterministic Markov process and 2) a novel dynamic programming approach that reduces the cost of likelihood and gradient evaluations to linear in sample size. In an application, we successfully handle a 14,980-dimensional truncated Gaussian, making it possible to estimate correlations among 28 HIV virulence and immunological epitope features across 535 viruses. The proposed approach is of independent interest, being applicable to a broader class of covariance structures beyond comparative biology. (Joint with Akihiko Nishimura, Philippe Lemey, and Marc A. Suchard.)

David Dahl: Summarizing distributions of latent structure

Abstract: In a typical Bayesian analysis, considerable effort is placed on "fitting the model" (e.g., obtaining samples from the posterior distribution) but this is only half of the inference problem. Meaningful inference usually requires summarizing the posterior distribution of the parameters of interest. Posterior summaries can be especially important in communicating the results and conclusions from a Bayesian analysis to a diverse audience. If the parameters of interest live in R^n, common posterior summaries are means, medians, and modes. Summarizing posterior distributions of parameters with complicated structure is a more difficult problem. For example, the "average" network in the posterior distribution on a network is not easily defined. This paper reviews methods for summarizing distributions of latent structure and then proposes a novel search algorithm for posterior summaries. We apply our method to distributions on variable selection indicators, partitions, feature allocations, and networks. We illustrate our approach in a variety of models for both simulated and real datasets. (Joint with Peter Müller.)

Efficient computing strategies for high-dimensional problems: Gareth Roberts (U of Warwick), Veronika Rockova (U of Chicago), Gregor Kastner (Vienna U of Economics and Business).

Gareth Roberts: Bayesian fusion

Abstract: Suppose we can readily access samples from $\pi_i(x)$, $1\le i\le n$, but we wish to obtain samples from $\pi (x) = \prod_ {i=1}^n \pi_i (x) $. The so-called Bayesian Fusion problem comes up within various areas of modern Bayesian analysis, for example in the context of big data or privacy constraints, as well as more traditional areas such as meta-analysis. Many approximate solutions to this problem have been proposed. However this talk will present an exact solution based on rejection sampling in an extended state space, where the accept/reject decision is carried out by simulating the skeleton of a suitably constructed auxiliary collection of Brownian bridges. (This is joint work with Hongsheng Dai and Murray Pollock.)
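As a simple worked case of the fusion target (an illustration added here, not part of the talk), suppose each $\pi_i$ is a univariate Gaussian density with mean $\mu_i$ and variance $\sigma_i^2$. The product is then Gaussian up to a normalizing constant, with precisions adding:

$$
\prod_{i=1}^n \pi_i(x) \;\propto\; \exp\Big(-\tfrac{1}{2}\sum_{i=1}^n \frac{(x-\mu_i)^2}{\sigma_i^2}\Big) \;\propto\; \exp\Big(-\frac{(x-\mu)^2}{2\sigma^2}\Big),
\qquad
\frac{1}{\sigma^2} = \sum_{i=1}^n \frac{1}{\sigma_i^2},
\quad
\mu = \sigma^2 \sum_{i=1}^n \frac{\mu_i}{\sigma_i^2}.
$$

In this special case the fused distribution is available in closed form. The difficulty addressed in the talk arises when the $\pi_i$ are only accessible through samples.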

Veronika Rockova: Variable Selection with ABC Bayesian Forests

Abstract: Few problems in statistics are as perplexing as variable selection in the presence of very many redundant covariates. The variable selection problem is most familiar in parametric environments such as the linear model or additive variants thereof. In this work, we abandon the linear model framework, which can be quite detrimental when the covariates impact the outcome in a non-linear way, and turn to tree-based methods for variable selection. Such variable screening is traditionally done by pruning down large trees or by ranking variables based on some importance measure. Despite being heavily used in practice, these ad-hoc selection rules are not yet well understood from a theoretical point of view. In this work, we devise a Bayesian tree-based probabilistic method and show that it is consistent for variable selection when the regression surface is a smooth mix of p>n covariates. These results are the first model selection consistency results for Bayesian forest priors. Probabilistic assessment of variable importance is made feasible by a spike-and-slab wrapper around sum-of-trees priors. Sampling from posterior distributions over trees is inherently very difficult. As an alternative to MCMC, we propose ABC Bayesian Forests, a new ABC sampling method based on data-splitting that achieves a higher ABC acceptance rate. We show that the method is robust and successful at finding variables with high marginal inclusion probabilities. Our ABC algorithm provides a new avenue towards approximating the median probability model in non-parametric setups where the marginal likelihood is intractable. (Joint with Yi Liu and Yuexi Wang.)

Gregor Kastner: Efficient Bayesian computing in many dimensions - applications in economics and finance

Abstract: Statistical inference for dynamic models in high dimensions often comes with a huge number of parameters that need to be estimated. Thus, to handle the curse of dimensionality, suitable regularization methods are of prime importance, and efficient computational tools are required to make practical estimation feasible. In this talk, we exemplify how these two principles can be implemented for models of importance in macroeconomics and finance. First, we discuss a Bayesian vector autoregressive (VAR) model with time-varying contemporaneous correlations that is capable of handling vast dimensional information sets. Second, we propose a straightforward algorithm to carry out inference in large dynamic regression settings with mixture innovation components for each coefficient in the system.

MCMC methods in high dimension, theory and applications: Christophe Andrieu (U of Bristol), Gabriel Stoltz (Ecole des Ponts ParisTech), Umut Simsekli (Télécom ParisTech).

Christophe Andrieu: All about the Metropolis-Hastings-Green update

Abstract: TBA

Gabriel Stoltz: Removing the mini-batching error in large scale Bayesian sampling

Abstract: The cost of performing one step of a sampling method such as Langevin dynamics scales linearly with the number of data points in Bayesian inference. To alleviate this issue, mini-batching was put forward by Welling and Teh. However, mini-batching leads to some bias on the a posteriori distribution of parameters. Adaptive Langevin dynamics were devised to remove this bias. The idea is to consider an inertial Langevin dynamics where the friction is a dynamical variable, updated according to some Nosé-Hoover feedback (inspired by techniques from molecular dynamics). We show here using techniques from hypocoercivity that the law of Adaptive Langevin dynamics converges exponentially fast to equilibrium, with a rate which can be quantified in terms of the key parameters of the dynamics (mass of the extra variable and magnitude of the fluctuation in the Langevin dynamics). This allows us in particular to obtain a Central Limit Theorem on time averages along realizations of the dynamics. Currently, however, this method is limited to unknown diffusion matrices which do not depend on the parameters (additive noise). I will mention extensions to the case of multiplicative noise.
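To make the mini-batching step concrete, here is a minimal Python sketch of a stochastic gradient Langevin update in the style of Welling and Teh. The toy model (Gaussian observations with unit variance and a Gaussian prior on the mean) and all names are assumptions chosen for illustration. The Adaptive Langevin dynamics discussed in the talk are not implemented here.

```python
import math
import random


def full_gradient(theta, data, prior_var=10.0):
    """Gradient of the log posterior: cost is linear in len(data)."""
    return -theta / prior_var + sum(x - theta for x in data)


def minibatch_gradient(theta, batch, n_total, prior_var=10.0):
    """Unbiased estimate of full_gradient from a subsample of the data.

    The likelihood term is rescaled by n_total / len(batch) so that its
    expectation over uniformly drawn batches matches the full sum.
    """
    scale = n_total / len(batch)
    return -theta / prior_var + scale * sum(x - theta for x in batch)


def sgld_step(theta, data, batch_size, step, rng):
    """One Langevin step driven by a mini-batch gradient."""
    batch = rng.sample(data, batch_size)
    grad = minibatch_gradient(theta, batch, len(data))
    noise = rng.gauss(0.0, math.sqrt(step))
    return theta + 0.5 * step * grad + noise
```

The gradient estimate is unbiased, but its extra variance acts as additional, unaccounted-for noise in the dynamics. At a fixed step size this is the source of the bias in the sampled distribution that the abstract refers to.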

Umut Simsekli: Nonparametric generative modeling via optimal transport and diffusions with provable guarantees

Abstract: By building upon the recent theory that established the connection between implicit generative modeling (IGM) and optimal transport, in this study, we propose a novel parameter-free algorithm for learning the underlying distributions of complicated datasets and sampling from them. The proposed algorithm is based on a functional optimization problem, which aims at finding a measure that is as close as possible to the data distribution and also expressive enough for generative modeling purposes. We formulate the problem as a gradient flow in the space of probability measures. The connections between gradient flows and stochastic differential equations let us develop a computationally efficient algorithm for solving the optimization problem. We provide formal theoretical analysis where we prove finite-time error guarantees for the proposed algorithm. Our experimental results support our theory and show that our algorithm is able to successfully capture the structure of different types of data distributions.

Computational advancements in entity resolution: Brenda Betancourt (U of Florida), Andee Kaplan (Duke U), Rebecca Steorts (Duke U).

Brenda Betancourt: Generalized flexible microclustering models for entity resolution

Abstract: Classical clustering tasks accomplished with Bayesian random partition models seek to divide a given population or data set into a relatively small number of clusters whose size grows with the number of data points. For other clustering applications, such as entity resolution, this assumption is inappropriate. Entity resolution (record linkage or de-duplication) is the process of removing duplicate records from noisy databases often in the absence of a unique identifier. One natural approach to entity resolution is as a clustering problem, where each entity is implicitly associated with one or more records and the inference goal is to recover the latent entities (clusters) that correspond to the observed records (data points). In most entity resolution tasks, the clusters are very small and remain small as the number of records increases. This framework requires models that yield clusters whose sizes grow sublinearly with the total number of data points. We introduce a general class of microclustering models suitable for the 'microclustering' problem, and fully characterize its theoretical properties and asymptotic behavior. We also present a partially-collapsed MCMC sampler that, compared to common sampling schemes found in the literature, achieves significantly better mixing by overcoming strong dependencies between some of the parameters in the model. To improve scalability, we combine the sampling algorithm with a common record linkage blocking technique that allows for parallel programming. (Joint with Giacomo Zanella and Rebecca Steorts.)

Andee Kaplan: Life after record linkage: Tackling the downstream task with error propagation

Abstract: Record linkage (entity resolution or de-duplication) is the process of merging noisy databases to remove duplicate entities that often lack a unique identifier. Linking data from multiple databases increases both the size and scope of a dataset, enabling post-processing tasks such as linear regression or capture-recapture to be performed. Any inferential or predictive task performed after linkage can be considered as the "downstream task." While recent advances have been made to improve flexibility and accuracy of record linkage, there are limitations in the downstream task due to the passage of errors through this two-step process. In this talk, I present a generalized framework for creating a representative dataset post-record linkage for the downstream task, called prototyping. Given the information about the representative records, I explore two downstream tasks—linear regression and binary classification via logistic regression. In addition, I discuss how error propagation occurs in both of these settings. I provide thorough empirical studies for the proposed methodology, and conclude with a discussion of practical insights into my work. (Joint with Brenda Betancourt and Rebecca Steorts.)

Rebecca Steorts: Scalable end-to-end Bayesian entity resolution

Abstract: Very often information about social entities is scattered across multiple databases. Combining that information into one database can result in enormous benefits for analysis, resulting in richer and more reliable conclusions. In most practical applications, however, analysts cannot simply link records across databases based on unique identifiers, such as social security numbers, either because they are not a part of some databases or are not available due to privacy concerns. In such cases, analysts need to use methods from statistical and computational science known as entity resolution (record linkage or de-duplication) to proceed with analysis. Entity resolution is not only a crucial task for social science and industrial applications, but is a challenging statistical and computational problem itself. One recent development in entity resolution methodology has been the application of Bayesian generative models. These models offer several advantages over conventional methods, namely: (i) they do not require labeled training data; (ii) they treat linkage as a clustering problem which preserves transitivity; (iii) they propagate uncertainty; and (iv) they allow for flexible modeling assumptions. However, due to difficulties in scaling, these models have so far been limited to small data sets of around 1000 records. In this talk, I propose the first scalable Bayesian models for entity resolution. This extension brings together several key ideas, including probabilistic blocking, indexing, and efficient sampling algorithms. The proposed methodology is illustrated on both synthetic and real data. (Joint with Neil Marchant, Benjamin Rubinstein, Andee Kaplan, and Daniel Elazar.)

ABC: Ruth Baker (U of Oxford), David Frazier (Monash U), Umberto Picchini (Chalmers U of Tech & U of Gothenburg).

Ruth Baker: Multifidelity approximate Bayesian computation

Abstract: A vital stage in the mathematical modelling of real-world systems is to calibrate a model's parameters to observed data. Likelihood-free parameter inference methods, such as Approximate Bayesian Computation, build Monte Carlo samples of the uncertain parameter distribution by comparing the data with large numbers of model simulations. However, the computational expense of generating these simulations forms a significant bottleneck in the practical application of such methods. We identify how simulations of cheap, low-fidelity models have been used separately in two complementary ways to reduce the computational expense of building these samples, at the cost of introducing additional variance to the resulting parameter estimates. We explore how these approaches can be unified so that cost and benefit are optimally balanced, and we characterise the optimal choice of how often to simulate from cheap, low-fidelity models in place of expensive, high-fidelity models in Monte Carlo ABC algorithms. The resulting early accept/reject multifidelity ABC algorithm that we propose is shown to give improved performance over existing multifidelity and high-fidelity approaches.
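For readers new to the area, the baseline that multifidelity schemes aim to speed up can be sketched as a plain rejection ABC sampler. The Python code below is a minimal illustration on an assumed toy problem (Gaussian data with unknown mean, a uniform prior, and the sample mean as summary statistic). The function name, prior range, and tolerance are hypothetical, and the early accept/reject multifidelity step of the talk is not implemented.

```python
import random


def abc_rejection(observed_summary, n_obs, n_draws, eps, seed=0):
    """Rejection ABC for the mean of a unit-variance Gaussian.

    Each iteration draws a parameter from the prior, runs the simulator,
    and keeps the parameter only if the simulated summary lands within
    eps of the observed one. No likelihood is ever evaluated.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # prior draw
        simulated = [rng.gauss(theta, 1.0) for _ in range(n_obs)]  # simulator
        summary = sum(simulated) / n_obs
        if abs(summary - observed_summary) <= eps:
            accepted.append(theta)
    return accepted
```

Almost all of the run time goes into the simulator calls, most of which end in rejection. That is the bottleneck the abstract describes, and it is the reason a cheap low-fidelity simulator that screens out poor parameter draws early can pay off.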

David Frazier: Robust approximate Bayesian inference with synthetic likelihood

Abstract: Bayesian synthetic likelihood (BSL) is now a well-established method for conducting approximate Bayesian inference in complex models where exact Bayesian approaches are either infeasible, or computationally demanding, due to the intractability of the likelihood function. Similar to other approximate Bayesian methods, such as the method of approximate Bayesian computation, implicit in the application of BSL is the maintained assumption that the data generating process can generate simulated summary statistics that mimic the behaviour of the observed summary statistics. This notion of model compatibility with the observed summaries is critical for the performance of BSL and its variants. We demonstrate theoretically, and through several examples, that if the assumed data generating process (DGP) differs from the true DGP, model compatibility may no longer be satisfied and BSL can give unreliable inferences. To circumvent the issue of incompatibility between the observed and simulated summary statistics, we propose two robust versions of BSL that can deliver reliable performance regardless of whether or not the observed and simulated summaries are compatible. Simulation results and two empirical examples demonstrate the good performance of this robust approach to BSL, and its superiority over standard BSL when model compatibility is not in evidence.

Umberto Picchini: Variance reduction for fast ABC using resampling

Abstract: Approximate Bayesian computation (ABC) is the state-of-the-art methodology for likelihood-free Bayesian inference. Its main feature is the ability to bypass the explicit calculation of the likelihood function, by only requiring access to a model simulator to generate many artificial datasets. In the context of pseudo-marginal ABC-MCMC (Bornn, Pillai, Smith and Woodard, 2017), generating $M>1$ datasets for each MCMC iteration allows one to construct a kernel-smoothed ABC likelihood with lower variance, which benefits the mixing of the ABC-MCMC chain compared to the typical ABC setup, which sets $M=1$. However, setting $M>1$ implies a computational bottleneck, and in Bornn, Pillai, Smith and Woodard (2017) it was found that the benefits of using $M>1$ are not worth the increased computational effort. In Everitt (2017) it was shown that, when the intractable likelihood is replaced by a *synthetic likelihood* (SL, Wood, 2010), it is possible to use $M=1$ and resample many times from this single simulated dataset, to construct computationally fast SL inference that artificially emulates the case $M>1$. Unfortunately, this approach was found to be ineffective within ABC, as the resampling generates inflated ABC posteriors. In this talk we show how to couple *stratified sampling* with the resampling idea of Everitt (2017). We construct an ABC-MCMC algorithm that uses a small number of model simulations ($M=1$ or 2) for each MCMC iteration, while substantially reducing the additional variance in the approximate posterior distribution induced by resampling. We therefore enjoy the computational speedup from resampling approaches, and show that our stratified sampling procedure allows us to use a larger than usual ABC threshold, while still obtaining accurate inference. (Joint with Richard Everitt.)

Continuous-time and non-reversible Monte Carlo methods: Yian Ma (U of California, Berkeley), Manon Michel (U Clermont-Auvergne).

Yian Ma: Bridging MCMC and Optimization

Abstract: Rapid growth in data size and model complexity has raised questions about how computational tools can scale with the problem and data complexity. Optimization algorithms have had tremendous success for convex problems in this regard. MCMC algorithms for mean estimates, on the other hand, are slower than the optimization algorithms in convex unconstrained scenarios. It has even become folklore that the MCMC algorithms are in general computationally more intractable than optimization algorithms. In this talk, I will examine a class of non-convex objective functions arising from mixture models. For that class of objective functions, I discover that the computational complexity of MCMC algorithms scales linearly with the model dimension, while optimization problems are NP-hard. I will then study MCMC algorithms as optimization over the KL-divergence in the space of measures. By incorporating a momentum variable, I will discuss an algorithm which performs accelerated gradient descent over the KL-divergence. Using optimization-like ideas, a suitable Lyapunov function is constructed to prove that an accelerated convergence rate is obtained.

Manon Michel: Accelerations of MCMC methods by non-reversibility and factorization

Abstract: During this talk, I will present the historical development of non-reversible Markov-chain Monte Carlo methods, based on piecewise deterministic Markov processes (PDMP). First developed for multiparticle systems, the goal was to emulate the successes of cluster algorithms for spin systems and was achieved through the replacement of the time reversibility by symmetries of the sampled probability distribution itself. These methods have been shown to bring clear accelerations and are now competing with molecular dynamics methods in chemical physics or state-of-the-art sampling schemes, e.g. Hamiltonian Monte Carlo, in statistical inference. I will discuss their successes as well as the remaining open questions. Finally, I will explain how the factorization of the distribution can lead to computational complexity reduction.

Markov chain convergence analysis and Wasserstein distance: Alain Durmus (ENS Paris-Saclay), Jonathan Mattingly (Duke U), Qian Qin (U of Minnesota).

Alain Durmus: TBA

Abstract: TBA

Jonathan Mattingly: TBA

Abstract: TBA

Qian Qin: Geometric convergence bounds for Markov chains in Wasserstein distance based on generalized drift and contraction conditions

Abstract: Quantitative bounds on the convergence rate of a Markov chain with respect to some Wasserstein distance can be derived using a set of drift and contraction conditions. Previous studies focus on the case where the parameters in this type of condition are constant. We propose a method for constructing convergence bounds based on generalized drift and contraction conditions whose parameters may vary across the state space. This can lead to significantly improved bounds. Our result also extends existing bounds in the literature to the case where the Wasserstein distance is unbounded.

Young researchers' contributions to Bayesian computation: Tommaso Rigon (Bocconi U), Michael Jauch (Duke U), Nicholas Tawn (U of Warwick).

Tommaso Rigon: Bayesian inference for finite-dimensional discrete priors

Abstract: Discrete random probability measures are the main ingredient for addressing Bayesian clustering. The investigation in this area has been very lively, with strong emphasis on nonparametric procedures based either on the Dirichlet process or on more flexible generalizations, such as the Pitman-Yor (PY) process or the normalized random measures with independent increments (NRMI). The literature on finite-dimensional discrete priors, beyond the classic Dirichlet-multinomial model, is much more limited. We aim at filling this gap by introducing novel classes of priors closely related to the PY process and NRMIs, which are recovered as limiting cases. Prior and posterior distributional properties are extensively studied. Specifically, we identify the induced random partitions and determine explicit expressions of the associated urn schemes and of the posterior distributions. A detailed comparison with the (infinite-dimensional) PY and NRMIs is provided. Finally, we employ our proposal for mixture modeling, and we assess its performance over existing methods in the analysis of a real dataset.

Michael Jauch: Bayesian analysis with orthogonal matrix parameters

Abstract: Statistical models for multivariate data are often parametrized by a set of orthogonal matrices. Bayesian analyses of models with orthogonal matrix parameters present two major challenges: posterior simulation on the constrained parameter space and incorporation of prior information such as sparsity or row dependence. We propose methodology to address both of these challenges. To simulate from posterior distributions defined on a set of orthogonal matrices, we propose polar parameter expansion, a parameter expanded Markov chain Monte Carlo approach suitable for routine and flexible posterior inference in standard simulation software. To incorporate prior information, we introduce prior distributions for orthogonal matrix parameters constructed via the polar decomposition of an unconstrained random matrix. Prior distributions constructed in this way satisfy a number of appealing properties and posterior inference can again be carried out in standard simulation software. We illustrate these techniques by fitting Bayesian models for a protein interaction network and gene expression data.

Nicholas Tawn: The Annealed Leap Point Sampler (ALPS) for multimodal target distributions

Abstract: This talk introduces a novel algorithm, ALPS, that is designed to provide a scalable approach to sampling from multimodal target distributions. The ALPS algorithm combines a number of the strengths of the current gold standard approaches for multimodality. It is strongly based around the well-known parallel tempering procedure but, rather than using “hot state” tempering levels, the ALPS algorithm instead appeals to annealing. In annealed temperature levels the modes become even more isolated, with the effects of modal skew less pronounced. Indeed, the more annealed the temperature, the more accurately the local mode is approximated by a Laplace approximation. The idea is to exploit this by utilizing a powerful Gaussian mixture independence sampler at the annealed temperature levels, allowing rapid mixing between modes. This mixing information is then filtered back to the target of interest using a parallel tempering-like procedure with carefully designed marginal distributions.

Approximate Bayesian nonparametrics: Peter Müller (U of Texas), Debdeep Pati (Texas A&M), Jeff Miller (Harvard U).

Peter Müller: Consensus Monte Carlo for random subsets using shared anchors

Abstract: We present a consensus Monte Carlo algorithm that scales existing Bayesian nonparametric models for clustering and feature allocation to big data. The algorithm is valid for any prior on random subsets such as partitions and latent feature allocation, under essentially any sampling model. Motivated by three case studies, we focus on clustering induced by a Dirichlet process mixture sampling model, inference under an Indian buffet process prior with a binomial sampling model, and with a categorical sampling model. We assess the proposed algorithm with simulation studies and show results for inference with three datasets: an MNIST image dataset, a dataset of pancreatic cancer mutations, and a large set of electronic health records (EHR).

Debdeep Pati: Convergence of variational Bayes algorithms

Abstract: We develop techniques for analyzing the convergence of variational Bayes algorithms in three classic examples: i) variational lower bound optimization using convex duality in generalized linear models, ii) variational boosting, and iii) coordinate ascent inference in discrete graphical models. The key idea is to relate the updates with an associated dynamical system and analyze its spectra. In some cases, we provide specific conditions for the algorithm to converge to the solution, exhibit periodicity or become unstable.

Jeff Miller: Flexible perturbation models for robustness to misspecification

Abstract: In many applications, there are natural statistical models with interpretable parameters that provide insight into questions of interest. While useful, these models are almost always wrong in the sense that they only approximate the true data generating process. In some cases, it is important to account for this model error when quantifying uncertainty in the parameters. We propose to model the distribution of the observed data as a perturbation of an idealized model of interest by using a nonparametric mixture model in which the base distribution is the idealized model. This provides robustness to small departures from the idealized model and, further, enables uncertainty quantification regarding the model error itself. Inference can easily be performed using existing methods for the idealized model in combination with standard methods for mixture models. Remarkably, inference can be even more computationally efficient than in the idealized model alone, because similar points are grouped into clusters that are treated as individual points from the idealized model. We demonstrate with simulations and an application to flow cytometry.

Contributed Sessions

Novel mixture-based computational approaches to Bayesian learning: Michele Guindani (U of California, Irvine), Antonietta Mira (U della Svizzera Italiana & U of Insubria), Sirio Legramanti (Bocconi U)

Michele Guindani: Modeling human microbiome data via latent nested nonparametric priors

Abstract: The study of the human microbiome has gained substantial attention in recent years due to its relationship with the regulation of the autoimmune system. During the data-preprocessing pipeline, microbes characterized by similar genomes are grouped together in Operational Taxonomic Units (OTUs). Since OTU abundances vary widely across individuals within a population, it is of interest to characterize the diversity of the microbiome to study the association between asymmetries in the human microbiota and various diseases. Here, we propose a Bayesian nonparametric approach to model abundance tables in the presence of multiple populations: a common set of parameters (atoms at the observational level) is used to construct, at a higher level, a set of atoms on a distributional space. Using a common set of atoms at the lower level yields an important advantage: our model does not degenerate to the fully exchangeable case when there are ties across samples, thus overcoming the crucial problem of the traditional Nested Dirichlet process outlined by Camerlenghi et al. (2018). To perform posterior inference, we propose a novel Nested independent slice-efficient algorithm. Since OTU tables consist of frequency counts and are known to be sparse, we express the likelihood as a Rounded Mixture of Gaussian Kernels. Simulation studies confirm that our model no longer suffers from the nDPMM drawback, and first applications to the microbiomes of Bangladeshi babies have shown promising results.

Antonietta Mira: Adaptive incremental mixture Markov chain Monte Carlo

Abstract: We propose Adaptive Incremental Mixture Markov chain Monte Carlo (AIMM), a novel approach to sample from challenging probability distributions defined on a general state-space. While adaptive MCMC methods usually update a parametric proposal kernel with a global rule, AIMM locally adapts a semiparametric kernel. AIMM is based on an independent Metropolis-Hastings proposal distribution which takes the form of a finite mixture of Gaussian distributions. Central to this approach is the idea that the proposal distribution adapts to the target by locally adding a mixture component when the discrepancy between the proposal mixture and the target is deemed to be too large. As a result, the number of components in the mixture proposal is not fixed in advance. Theoretically, we prove that there exists a process that can be made arbitrarily close to AIMM and that converges to the correct target distribution. We also illustrate that it performs well in practice in a variety of challenging situations, including high-dimensional and multimodal target distributions.

Sirio Legramanti: Bayesian cumulative shrinkage for infinite factorizations

Abstract: There is a wide variety of models in which the dimension of the parameter space is unknown. For example, in factor analysis the number of latent factors is typically not known and has to be inferred from the observed data. Although classical shrinkage priors are useful in these contexts, increasing shrinkage priors can provide a more effective option, which progressively penalizes expansions with growing complexity. We propose a novel increasing shrinkage prior, named the cumulative shrinkage process, for the parameters controlling the dimension in over-complete formulations. Our construction has broad applicability, simple interpretation, and is based on a sequence of spike and slab distributions which assign increasing mass to the spike as model complexity grows. Using factor analysis as an illustrative example, we show that this formulation has theoretical and practical advantages over current competitors, including an improved ability to recover the model dimension. An adaptive Markov chain Monte Carlo algorithm is proposed, and the methods are evaluated in simulation studies and applied to personality traits data. (Joint with Daniele Durante and David Dunson)

Using Bayesian methods to uncover the latent structures in real datasets: Louis Raynal (U of Montpellier & Harvard U), Francesco Denti (U of Milan – Bicocca & U della Svizzera Italiana), Alex Rodriguez (International Center for Theoretical Physics).

Louis Raynal: Reconstructing the evolutionary history of the desert locust by means of ABC random forest

Abstract: The Approximate Bayesian Computation - Random Forest (ABC-RF) methodology was recently developed to perform model choice (Pudlo et al., 2016; Estoup et al., 2018) and parameter inference (Raynal et al., 2019). It has proved to achieve good performance, is mostly insensitive to noise variables and requires very little calibration. In this presentation we present recent improvements, with a focus on the computation of error measures with random forests for parameter inference. As a case study, we are interested in the Schistocerca gregaria desert locust species, which is divided into two distinct regions along the north-south axis of Africa. Using ABC-RF on microsatellite data, we reconstruct the evolutionary processes explaining the present geographical distribution and estimate parameters such as the divergence time between the north and south sub-species.

Francesco Denti: Bayesian nonparametric dimensionality reduction via estimation of data intrinsic dimensions

Abstract: Even if they are defined on a space with a large dimension, data points usually lie on hypersurfaces with a much smaller intrinsic dimension (ID). The recent Hidalgo method (Allegra et al., 2019), a Bayesian extension of the TWO-NN model (Facco et al., 2017, Scientific Reports), allows estimating the ID when all points lie on multiple latent manifolds. We consider the data points as a configuration of a Poisson Process (PP) with an intensity proportional to the true density. Hidalgo makes only two weak assumptions: (i) locally, on the scale of the second nearest neighbor, the original PP can be well approximated by a homogeneous one and (ii) points close to each other are more likely to belong to the same manifold. Under (i), the ratio of the distances of a point from its first and second neighbor follows a Pareto distribution that depends parametrically only on the ID. We extended Hidalgo to the nonparametric case, allowing the estimation of the number of latent manifolds via a Dirichlet Process Mixture Model and inducing a clustering among observations characterized by similar ID. We further derive the distributions of the ratios of subsequent distances between neighbors and we prove their independence. This enables us to extract more information from the data without compromising the scalability of our method. While the idea behind the extension is simple, a non-trivial Bayesian scheme is required for estimating the model and assigning each point to the correct manifold. Since the posterior distribution has no closed form, to sample from it we rely on the Slice Sampler algorithm. From preliminary analyses performed on simulated data, the model provides promising results. Moreover, we were able to uncover a surprising ID variability in several real-world datasets.
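
The Pareto property under assumption (i) can be illustrated with a minimal sketch of the basic TWO-NN estimate on which Hidalgo builds. This is not the Hidalgo or nonparametric model from the talk: it assumes a single manifold, uses a brute-force neighbor search, and the function name `two_nn_id` and the toy data are hypothetical. If the ratio mu = r2/r1 of second to first neighbor distance is Pareto with shape equal to the ID, the maximum likelihood estimate of the ID is N / sum(log mu_i).

```python
import math
import random


def two_nn_id(points):
    """MLE of the intrinsic dimension from ratios of second to first neighbor distances."""
    log_ratio_sum = 0.0
    for i, p in enumerate(points):
        r1 = r2 = math.inf  # smallest and second smallest distance from p
        for j, q in enumerate(points):
            if i == j:
                continue
            dist = math.dist(p, q)
            if dist < r1:
                r1, r2 = dist, r1
            elif dist < r2:
                r2 = dist
        log_ratio_sum += math.log(r2 / r1)
    # Pareto(1, d) likelihood for mu = r2 / r1 is maximized at d = N / sum(log mu).
    return len(points) / log_ratio_sum


rng = random.Random(0)
# Hypothetical toy data: a 2-D square embedded in 5-D (last three coordinates constant).
points = [(rng.random(), rng.random(), 0.0, 0.0, 0.0) for _ in range(600)]
id_hat = two_nn_id(points)
```

On this toy example the estimate should land near 2 rather than 5, since only the two varying coordinates contribute to the neighbor distances.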

Alex Rodriguez: Mapping the topography of complex datasets

Abstract: Data sets can be considered an ensemble of realizations drawn from a density distribution. Obtaining a synthetic description of this distribution allows rationalizing the underlying generating process and building human-readable models. In simple cases, visualizing the distribution in a suitable low-dimensional projection is enough to capture its main features, but real-world data sets are often embedded in a high-dimensional space. Therefore, I present a procedure that allows obtaining such a synthetic description automatically, using only the pairwise data distances (or similarities). This methodology is based on a reliable estimation of the intrinsic dimension of the dataset (Facco et al., 2017) and the probability density function (Rodriguez et al., 2018) coupled with a modified Density Peaks clustering algorithm (Rodriguez and Laio, 2014). The final outcome of all this machinery working together is a hierarchical tree that summarizes the main features of the data set and a classification that maps each data point to the feature it belongs to (d'Errico et al., 2018).

MCMC-based Bayesian inference on Hilbert spaces: Nawaf Bou-Rabee (Rutgers U), Nathan Glatt-Holtz (Tulane U), Daniel Sanz-Alonso (U of Chicago)

Nawaf Bou-Rabee: Two-scale coupling for preconditioned Hamiltonian Monte Carlo in infinite dimensions

Abstract: We present non-asymptotic quantitative bounds for convergence to equilibrium of the exact preconditioned Hamiltonian Monte Carlo algorithm (pHMC) on a Hilbert space. As a consequence, we obtain explicit and dimension-free bounds for pHMC applied to high-dimensional distributions arising in transition path sampling and path integral molecular dynamics. Global convexity of the underlying potential energies is not required. Our results are based on a two-scale coupling which is contractive in a carefully designed distance.

Nathan Glatt-Holtz: A Bayesian approach to quantifying uncertainty in divergence free flows

Abstract: We treat the statistical regularization of the ill-posed inverse problem of estimating a divergence free flow field $u$ from the partial and noisy observation of a passive scalar $\theta$. Our solution is a Bayesian posterior distribution, a probability measure $\mu$ which precisely quantifies uncertainties in $u$ once one specifies models for measurement error and prior knowledge for $u$. We present some of our recent work which analyzes $\mu$ both analytically and numerically. In particular we discuss some Markov Chain Monte Carlo (MCMC) algorithms which we have developed and refined to effectively sample from $\mu$. (This is joint work with Jeff Borggaard and Justin Krometis.)

Daniel Sanz-Alonso: Scalable MCMC for graph based learning

Abstract: In this talk I will consider two graph-based learning problems. The first one concerns a graph formulation of Bayesian semi-supervised learning, and the second one concerns kernel discretization of Bayesian inverse problems on manifolds. I will show that understanding the continuum limit of these graph-based problems is helpful in designing sampling algorithms whose rate of convergence does not deteriorate in the limit of large number of graph nodes.

Advances in multiple importance sampling: Art Owen (Stanford U), Victor Elvira (U of Edinburgh), Felipe Medina Aguayo (U of Reading).

Art Owen: Robust deterministic weighting of estimates from adaptive importance sampling

Abstract: This talk presents a simple robust way to weight a sequence of estimates generated by adaptive importance sampling. Importance sampling is a useful method for estimating rare event probabilities and for sampling posterior distributions. It often generates data that can be used to find an improved sampler leading to methods of adaptive importance sampling (AIS). Under ideal conditions, AIS can approach a perfect sampler and the mean squared error (MSE) vanishes exponentially fast. Under less ideal conditions, including all nontrivial uses of self-normalized importance sampling, the MSE is bounded below by a positive multiple of $1/n$. That rules out exponential convergence but still allows for steady improvements. If we model steady improvement as yielding a sequence of unbiased and uncorrelated estimates with variance proportional to $k^{-y}$ for $1 \le k \le K < \infty$ and $0 \le y \le 1$, then a simple model weighting the $k$th iterate proportionally to $k^{1/2}$ is nearly optimal. It never raises variance by more than 9/8 over an oracle’s variance even though the resulting convergence rate varies with $y$. Numerical investigation shows that these weights are also robust under additional models of gradual improvement. (This is joint work with Yi Zhou.)
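
The 9/8 bound can be checked numerically with a short sketch under the model stated in the abstract: uncorrelated unbiased estimates with variance $k^{-y}$. The function name `variance_ratio` and the grid of $K$ and $y$ values are illustrative assumptions, not taken from the talk. The oracle here is the inverse-variance weighting, which knows $y$ and weights the $k$th iterate proportionally to $k^{y}$.

```python
def variance_ratio(K, y):
    """Variance under weights k^(1/2) divided by the oracle variance, for variances k^(-y)."""
    ks = range(1, K + 1)
    sqrt_weights = [k ** 0.5 for k in ks]
    # Var(sum w_k X_k / sum w_k) = sum w_k^2 k^(-y) / (sum w_k)^2 for uncorrelated X_k,
    # and w_k^2 k^(-y) = k^(1 - y) when w_k = k^(1/2).
    var_sqrt = sum(k ** (1.0 - y) for k in ks) / sum(sqrt_weights) ** 2
    # Oracle weights are proportional to inverse variance, k^y, giving variance 1 / sum k^y.
    var_oracle = 1.0 / sum(k ** y for k in ks)
    return var_sqrt / var_oracle


# Worst case over an illustrative grid of K and y in {0, 0.05, ..., 1}.
worst = max(variance_ratio(K, y / 20) for K in (1, 2, 10, 100, 1000) for y in range(21))
```

On this grid the ratio equals 1 at $y = 1/2$, where the square-root weights coincide with the oracle's, and it approaches 9/8 from below at the endpoints $y = 0$ and $y = 1$ as $K$ grows.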

Victor Elvira: Multiple importance sampling for rare events estimation with an application in communication systems

Abstract: Digital communications are based on the transmission of symbols that belong to a finite alphabet, each of them carrying one or several bits of information. The receiver estimates the symbol that was transmitted, and in the case of perfect communication without errors, the original sequence of bits is reconstructed. However, real-world communication systems (e.g., in wireless communications) introduce random distortions in the symbols, including additive Gaussian noise, provoking errors in the detected symbols at the receiver. The characterization of the symbol error rate (SER) of the system is of major interest in communications engineering. However, in many systems of interest, the integrals required to evaluate the SER in the presence of Gaussian noise are impossible to compute in closed form, and therefore Monte Carlo simulation is typically used to estimate the SER. Naive Monte Carlo simulation has been traditionally used in the communications literature, even if it can be very inefficient and require very long simulation runs, especially in high signal-to-noise-ratio (SNR) scenarios. At high SNR, the variance of the additive Gaussian noise is small, and hence the rate of errors is very low, which makes raw Monte Carlo impractical for this rare event estimation problem. In this talk, we start by describing (for non-experts) the problem of SER estimation in a communication system. Then, we adapt a recently proposed multiple importance sampling (MIS) technique, called ALOE (for "At Least One rare Event"), to this problem. Conditioned on a transmitted symbol, an error (or rare event) occurs when the observation falls in a union of half-spaces or, equivalently, outside a given polytope. The proposal distribution for ALOE samples the system conditionally on an error taking place, which makes it more efficient than other importance sampling techniques. ALOE provides unbiased SER estimates with simulation times orders of magnitude shorter than conventional Monte Carlo. Then, we discuss the challenges of SER estimation in multiple-input multiple-output (MIMO) communications, where the rare-event estimation problem requires solving a large number of integrals in a higher-dimensional space. We propose a novel MIS-based approach exploiting the strengths of the ALOE estimator.

Felipe Medina Aguayo: Revisiting balance heuristic with intractable proposals

Abstract: Among the different flavours of multiple importance sampling, the celebrated balance heuristic (BH) from Veach and Guibas still remains a popular choice for estimating integrals. The basic ingredients in BH are: a set of proposals $q_l$, indexed by some discrete label $l$, and a deterministic set of weights for these labels. However, in some scenarios sampling from $q_l$ is only achieved by sampling jointly with the label $l$; this commonly leads to a joint density whose conditionals and marginals are unavailable or expensive to compute. Despite BH being valid even if the labels are sampled randomly, the intractability of the joint proposal can be problematic, especially when the number of discrete labels is much larger than the number of permitted importance points. In this talk, we first revisit the balance heuristic from an extended-space angle, which allows the introduction of intermediate distributions as in annealed importance sampling for variance reduction. We then look at estimating integrals when the proposal is only available in a joint form via a combination of correlated estimators. This idea also fits into the extended-space representation which will, in turn, provide other interesting solutions. (This is joint work with Richard Everitt, U of Reading.)
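
For readers unfamiliar with the standard balance heuristic, a minimal sketch of the tractable case follows. It covers only the classical estimator with a fixed number of draws $n_l$ from each proposal $q_l$, not the intractable or extended-space setting of the talk. The Gaussian proposals, the integrand and the function names are assumptions chosen for illustration.

```python
import math
import random


def normal_pdf(x, mean):
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def balance_heuristic(f, means, counts, rng):
    """Estimate the integral of f with counts[l] draws from each proposal q_l = N(means[l], 1)."""
    total = 0.0
    for mean, n in zip(means, counts):
        for _ in range(n):
            x = rng.gauss(mean, 1.0)
            # Balance heuristic: divide by sum_j n_j q_j(x), not by the sampled q_l(x) alone.
            total += f(x) / sum(m * normal_pdf(x, mu) for mu, m in zip(means, counts))
    return total


rng = random.Random(1)
# Illustrative target: the integral of exp(-x^2 / 2) over the real line, which is sqrt(2 * pi).
estimate = balance_heuristic(lambda x: math.exp(-0.5 * x * x), [-1.0, 1.0], [5000, 5000], rng)
```

Every draw is weighted by the full mixture of proposals, so evaluating the estimator requires each density $q_j$ at each point; this is the requirement that becomes problematic when the proposals are intractable.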

Simulation in path space: Sebastiano Grazzi (TU Delft), Frank van der Meulen (TU Delft), Joris Bierkens (Vrije U Amsterdam).

Sebastiano Grazzi: A piecewise deterministic Monte Carlo method for diffusion bridges

Abstract: We introduce the use of the Zig-Zag sampler to the problem of sampling diffusion bridges. The Zig-Zag sampler is a rejection-free sampling scheme based on a non-reversible continuous piecewise deterministic Markov process. Similar to the Lévy-Ciesielski construction of a Brownian motion, we expand the diffusion path in a truncated Faber-Schauder basis. The coefficients within the basis are sampled using a Zig-Zag sampler with truncation error that vanishes with increasing truncation level. A key innovation is the use of a local version of the Zig-Zag sampler that allows us to exploit the sparse dependency structure of the coefficients of the Faber-Schauder expansion to reduce the complexity of the algorithm. We illustrate the performance of the proposed methods in a number of examples. Contrary to some other Markov Chain Monte Carlo methods, our approach works well in the case of strong nonlinearity in the drift.
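
As background on the Zig-Zag sampler itself, here is a minimal one-dimensional sketch targeting a standard Gaussian. It is a toy under stated assumptions, not the local Zig-Zag or the Faber-Schauder construction of the talk. For the potential $U(x) = x^2/2$ the switching rate is $\max(0, \theta x)$, which can be integrated and inverted in closed form, so event times are simulated exactly. The function name `zigzag_gaussian` is hypothetical.

```python
import math
import random


def zigzag_gaussian(n_events, rng):
    """Run a 1-D Zig-Zag process targeting N(0, 1); return time-averaged E[x] and E[x^2]."""
    x, theta = 0.0, 1.0
    total_time = int_x = int_x2 = 0.0
    for _ in range(n_events):
        # Switching rate along the path is max(0, theta * x + t); invert its integral exactly.
        a = theta * x
        e = rng.expovariate(1.0)
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        # Exact integrals of x and x^2 over the linear segment x + theta * s, 0 <= s <= tau.
        int_x += x * tau + theta * tau ** 2 / 2.0
        int_x2 += x * x * tau + x * theta * tau ** 2 + tau ** 3 / 3.0
        total_time += tau
        x += theta * tau
        theta = -theta  # flip the velocity at the event; no proposal is ever rejected
    return int_x / total_time, int_x2 / total_time


mean_hat, second_moment_hat = zigzag_gaussian(20000, random.Random(2))
```

Expectations are computed as integrals along the continuous trajectory rather than as averages over discrete states, which is why the segment integrals are accumulated in closed form.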

Frank van der Meulen: Diffusion bridge simulation in geometric statistics

Abstract: Recently, various stochastic landmark models have been introduced for shape deformation. The basic modelling consists of stochastic differential equations. Due to the high dimensionality of the state space of these equations, the statistical analysis is challenging. Moreover, the diffusion process is hypo-elliptic. Novel methods are discussed to tackle this problem based on methods for simulation of conditioned diffusions.

Joris Bierkens: Infinite dimensional piecewise deterministic Monte Carlo

Abstract: In Bayesian inverse problems one is interested in performing computations with respect to an infinite dimensional probability distribution. A modern computational approach consists of approximating this infinite dimensional probability distribution by running a truncated version of a genuine infinite dimensional Markov chain. If a well-posed infinite dimensional chain exists, then the truncated, finite-dimensional approximation may be expected to have desirable scaling properties with respect to dimension. In this talk we present some preliminary explorations of this topic in conjunction with the recent advance of Piecewise Deterministic Monte Carlo methods such as the Bouncy Particle Sampler and the Zig-Zag Sampler. (Joint with Andrew Duncan and Michela Ottobre.)

Sequential Monte Carlo: Recent advances in theory and practice: Richard Everitt (U of Reading), Liangliang Wang (Simon Fraser U), Anthony Lee (U of Bristol).

Richard Everitt: Evolution with recombination using state-of-the-art computational methods

Abstract: Recombination is a critical process in evolutionary inference, particularly when analysing within-species variation. In bacteria, despite being organisms that reproduce clonally, recombination commonly occurs when a donor cell contributes a small segment of its DNA. This process is typically modelled using an ancestral recombination graph (ARG), which is a generalisation of the coalescent. The ClonalOrigin model (Didelot et al., 2010) can be regarded as a good approximation of the aforementioned process, in which recombination events are modelled independently given the clonal genealogy. Inference in the ClonalOrigin model is performed via a reversible-jump MCMC (rjMCMC) algorithm, which attempts to jointly explore: the recombination rate, the number of recombination events, the departure and arrival points on the clonal genealogy for each recombination event, and the sites delimiting the start and end of each recombination event on the genome. However, as is well known to computational statisticians, the rjMCMC algorithm usually performs poorly due to the difficulty of proposing “good” trans-dimensional moves. Recent developments in Bayesian computation methodology provide ways of improving existing methods and code, but are not well-known outside the statistics community. We present a couple of ideas based on sequential Monte Carlo (SMC) methodology that can lead to faster inference when using the ClonalOrigin model. (This is joint work with Felipe Medina Aguayo and Xavier Didelot.)

Liangliang Wang: Sequential Monte Carlo methods for Bayesian phylogenetics

Abstract: Phylogenetic trees, playing a central role in biology, model evolutionary histories of taxa that range from genes to genomes. The goal of Bayesian phylogenetics is to approximate a posterior distribution of phylogenetic trees based on biological data. Standard Bayesian estimation of phylogenetic trees can handle rich evolutionary models but requires expensive Markov chain Monte Carlo (MCMC) simulations. Our previous work has shown that sequential Monte Carlo (SMC) methods can serve as a good alternative to MCMC in posterior inference over phylogenetic trees. In this talk, I will present our recent work on SMC methods for Bayesian Phylogenetics. We illustrate our methods using simulation studies and real data analysis.

Anthony Lee: Latent variable models: statistical and computational efficiency for simple likelihood approximations

Abstract: A popular statistical modelling technique is to model data as a partial observation of a random process. This allows, in principle, one to fit sophisticated domain-specific models with easily interpretable parameters. However, the likelihood function in such models is typically intractable, and so likelihood-based inference techniques must deal with this intractability in some way. I will briefly talk about two likelihood-based methodologies, pseudo-marginal Markov chain Monte Carlo and simulated maximum likelihood, and discuss statistical and computational scalability in some example settings. The results are also relevant to the use of sequential Monte Carlo algorithms in high-dimensional general state-space hidden Markov models.

Advances in MCMC for high dimensional and functional spaces: Galin Jones (U of Minnesota), Vivekananda Roy (Iowa State U), Radu Herbei (The Ohio State U)

Galin Jones: Convergence complexity of Gibbs samplers for Bayesian vector autoregressive processes

Abstract: We propose a collapsed Gibbs sampler for Bayesian vector autoregressions with predictors, or exogenous variables, and study the proposed sampler’s convergence properties. The Markov chain generated by our algorithm is shown to be geometrically ergodic regardless of whether the number of observations in the underlying vector autoregression is small or large in comparison to the order and dimension of it. We also establish conditions for when the geometric ergodicity is asymptotically stable as the number of observations tends to infinity. Specifically, the geometric convergence rate is shown to be bounded away from unity asymptotically, either in an almost sure sense or with probability tending to one, depending on what is assumed about the data generating process. (This is joint work with Karl Oskar Ekvall.)

Vivekananda Roy: Posterior impropriety of relevance vector machines and a single penalty approach

Abstract: Researchers often use sparse Bayesian learning models that take a reproducing kernel Hilbert space approach to carry out the task of prediction for high dimensional datasets. The popular relevance vector machine (RVM) is one such sparse Bayesian learning model. We show that the RVM with hyperparameter values currently used in the literature leads to improper posteriors. We propose a single penalty RVM (SPRVM) model and analyze it using a semi Bayesian approach. The necessary and sufficient conditions for posterior propriety of SPRVM are more liberal than those of RVM and allow for several improper priors over the penalty parameter. Additionally, we prove geometric ergodicity of the Gibbs sampler used to analyze the SPRVM model and hence can estimate the asymptotic standard errors associated with the Monte Carlo estimate of the means of the posterior predictive distribution. The predictive performance of RVM and SPRVM is compared by analyzing several datasets. (This is joint work with Anand Dixit.)

Radu Herbei: Exact inference in functional regression: Estimating hydrological controls on ecosystem dynamics in an Antarctic lake

Abstract: Many of the modern-day statistical inference problems address the issue of estimating an infinite dimensional parameter (a function or a surface). Given that one can only store a finite representation of these objects on a computer, the typical approach is to employ some dimension-reduction strategy and proceed with a statistical inference procedure in a multivariate setting. We introduce an exact inference procedure for functional parameters in a Bayesian regression setting. By "exact" we mean that the MCMC sampler used to explore the posterior distribution over the functional parameter is unaffected by the fact that only finite dimensional objects are used during the simulation procedure. We use techniques based on randomized acceptance probabilities and Bernoulli factories to ensure that the sampler targets the correct distribution. We apply our method to the problem of estimating the association between stream discharge and physical, chemical, and biological processes within an Antarctic lake system.

Recent advances in Gaussian process computations and theory: Yun Yang (U of Illinois), Joseph Futoma (Harvard U), Michael Zhang (Princeton U).

Yun Yang: Frequentist coverage and sup-norm convergence rate in Gaussian process regression

Abstract: GP regression is a powerful interpolation technique due to its flexibility in capturing non-linearity. In this talk, we provide a general framework for understanding the frequentist coverage of point-wise and simultaneous Bayesian credible sets in random design GP regression. Identifying both the mean and covariance function of the posterior distribution of the Gaussian process as regularized M-estimators, we show that the sampling distribution of the posterior mean function and the centered posterior distribution can be respectively approximated by two population level GPs. By developing a comparison inequality between two GPs, we provide exact characterization of frequentist coverage probabilities of Bayesian pointwise credible intervals and simultaneous credible bands of the regression function. Our results show that inference based on GP regression tends to be conservative; when the prior is under-smoothed, the resulting credible intervals and bands have minimax-optimal sizes, with their frequentist coverage converging to a non-degenerate value between their nominal level and one. As a byproduct of our theory, we show that GP regression also yields minimax-optimal posterior contraction rate relative to the supremum norm, which provides positive evidence to the long-standing problem on optimal supremum norm contraction rate in GP regression.

Joseph Futoma: Learning to Detect Sepsis with a Multi-output Gaussian Process RNN Classifier (in the Real World!)

Abstract: Sepsis is a poorly understood and potentially life-threatening complication that can occur as a result of infection. Early detection and treatment improve patient outcomes, and as such it poses an important challenge in medicine. In this work, we develop a flexible classifier that leverages streaming lab results, vitals, and medications to predict sepsis before it occurs. We model patient clinical time series with multi-output Gaussian processes, maintaining uncertainty about the physiological state of a patient while also imputing missing values. Latent function values from the Gaussian process are then fed into a deep recurrent neural network to classify patient encounters as septic or not, and the overall model is trained end-to-end using back-propagation. We train and validate our model on a large retrospective dataset of 18 months of heterogeneous inpatient stays from the Duke University Health System, and develop a new “real-time” validation scheme for simulating the performance of our model as it will actually be used. We conclude by showing how this model is saving lives as a part of SepsisWatch, an application currently being used at Duke Hospital to screen, monitor, and coordinate treatment of septic patients.

Michael Zhang: Embarrassingly parallel inference for Gaussian processes

Abstract: Gaussian process-based models typically involve an $O(N^3)$ computational bottleneck due to inverting the covariance matrix. Popular methods for overcoming this matrix inversion problem cannot adequately model all types of latent functions and are often not parallelizable. However, judicious choice of model structure can ameliorate this problem. A mixture-of-experts model that uses a mixture of $K$ Gaussian processes offers modeling flexibility and opportunities for scalable inference. Our embarrassingly parallel algorithm combines low-dimensional matrix inversions with importance sampling to yield a flexible, scalable mixture-of-experts model that offers comparable performance to Gaussian process regression at a much lower computational cost.

Posterior inference with misspecified models: Judith Rousseau (U of Oxford), Ryan Martin (North Carolina State U), Jonathan Huggins (Harvard U)

Judith Rousseau: Using asymptotics to understand ABC

Abstract: Approximate Bayesian computations are used typically when the model is so complex that the likelihood is intractable but data can be generated from the model. With the initial focus being primarily on the practical import of this algorithm, exploration of its formal statistical properties has begun to attract more attention. In this work we consider the asymptotic behaviour of the posterior obtained by this method and the ensuing posterior mean. We give general results on: (i) the rate of concentration of the resulting posterior on sets containing the true parameter (vector); (ii) the limiting shape of the posterior; and (iii) the asymptotic distribution of the ensuing posterior mean. These results hold under given rates for the tolerance used within the method, mild regularity conditions on the summary statistics, and a condition linked to identification of the true parameters. I will show in particular that we have very different behaviours if the model is well or mis-specified. I will highlight the practical implications of these results for understanding the behaviour of the algorithm. (Joint work with David Frazier, Gael Martin and Christian Robert.)

Ryan Martin: Construction, concentration, and calibration of Gibbs posteriors

Abstract: A Bayesian approach, which bases inference on a posterior distribution, has certain advantages, but at the expense of requiring specification of a full statistical model. A Gibbs approach, on the other hand, provides a posterior distribution based on a loss function instead of a likelihood, which has its own advantages, including robustness and computational savings. While the concentration properties of suitably constructed Gibbs posteriors are fairly well understood, the mis- or under-specification affects the spread of the Gibbs posterior in subtle ways. In particular, it is not clear how to scale the Gibbs posterior so that the corresponding credible regions are calibrated in the sense that they achieve the nominal coverage probability. In this talk, I will present some generalities about the construction, concentration, and calibration of Gibbs posteriors along with applications, including an image boundary detection problem.

Jonathan Huggins: Using bagged posteriors for robust inference and model criticism

Abstract: Standard Bayesian inference is known to be sensitive to model misspecification, leading to unreliable uncertainty quantification and poor predictive performance. However, finding generally applicable and computationally feasible methods for robust Bayesian inference under misspecification has proven to be a difficult challenge. An intriguing approach is to use bagging on the Bayesian posterior (“BayesBag”); that is, to use the average of posterior distributions conditioned on bootstrapped datasets. In this talk, I comprehensively develop the asymptotic theory of BayesBag, propose a model–data mismatch index for model criticism using BayesBag, and empirically validate our theory and methodology on synthetic and real-world data. I find that in the presence of significant misspecification, BayesBag yields more reproducible inferences, has better predictive accuracy, and selects correct models more often than the standard Bayesian posterior; meanwhile, when the model is correctly specified, BayesBag produces superior or equally good results for parameter inference and prediction, while being slightly more conservative for model selection. Overall, my results demonstrate that BayesBag combines the attractive modeling features of standard Bayesian inference with the distributional robustness properties of frequentist methods.
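As a rough illustration of the bagging idea described in this abstract (averaging posteriors conditioned on bootstrapped datasets), the following is a minimal sketch and not the speaker's implementation. It assumes a conjugate normal-mean model with known noise variance, bootstrap datasets of the same size as the original data, and hypothetical function names (`normal_posterior`, `bagged_posterior`) and default prior values chosen only for this example.

```python
import random
from statistics import mean, pvariance


def normal_posterior(data, noise_var=1.0, prior_mean=0.0, prior_var=100.0):
    """Conjugate posterior (mean, variance) for a normal mean with known noise variance."""
    precision = 1.0 / prior_var + len(data) / noise_var
    post_mean = (prior_mean / prior_var + sum(data) / noise_var) / precision
    return post_mean, 1.0 / precision


def bagged_posterior(data, num_bootstrap=200, seed=0):
    """Mean and variance of the equally weighted mixture of bootstrap posteriors."""
    rng = random.Random(seed)
    posteriors = []
    for _ in range(num_bootstrap):
        # Bootstrap dataset: resample with replacement, same size as the data (assumption).
        resample = rng.choices(data, k=len(data))
        posteriors.append(normal_posterior(resample))
    means = [m for m, _ in posteriors]
    variances = [v for _, v in posteriors]
    # Law of total variance for a mixture: mean of variances plus variance of means.
    return mean(means), mean(variances) + pvariance(means)
```

In this toy model the bagged variance is never smaller than the standard posterior variance, because the spread of the bootstrap posterior means is added on top of the usual posterior variance.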

Convergence of MCMC in theory and in practice: Christina Knudson (U of St. Thomas, MN), Rui Jin (U of Iowa), Grant Backlund (U of Florida)

Christina Knudson: Revisiting the Gelman-Rubin Diagnostic

Abstract: Gelman and Rubin's (1992) convergence diagnostic is one of the most popular methods for terminating a Markov chain Monte Carlo (MCMC) sampler. Since the seminal paper, researchers have developed sophisticated methods of variance estimation for Monte Carlo averages. We show that this class of estimators finds immediate use in the Gelman-Rubin statistic, a connection not established in the literature before. We incorporate these estimators to upgrade both the univariate and multivariate Gelman-Rubin statistics, leading to increased stability in MCMC termination time. An immediate advantage is that our new Gelman-Rubin statistic can be calculated for a single chain. In addition, we establish a relationship between the Gelman-Rubin statistic and effective sample size. Leveraging this relationship, we develop a principled cutoff criterion for the Gelman-Rubin statistic. Finally, we demonstrate the utility of our improved diagnostic via an example.
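For readers unfamiliar with the diagnostic, here is a minimal sketch of the classic univariate, multi-chain Gelman-Rubin statistic. It is not the upgraded version proposed in the talk. The helper `simulate_chains` is a hypothetical stand-in that produces independent normal draws rather than real MCMC output.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic potential scale reduction factor (R-hat) for equal-length chains."""
    n = len(chains[0])
    # W: average of the within-chain sample variances.
    within = mean(variance(chain) for chain in chains)
    # B / n: sample variance of the per-chain means.
    between_over_n = variance([mean(chain) for chain in chains])
    # Pooled estimate of the target variance.
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


def simulate_chains(offsets, length, seed=0):
    """Hypothetical stand-in for MCMC output: i.i.d. normal draws per chain."""
    rng = random.Random(seed)
    return [[rng.gauss(offset, 1.0) for _ in range(length)] for offset in offsets]
```

Chains that share a common centre should give a value close to 1, while chains centred at different offsets should give a value well above 1.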

Rui Jin: Central limit theorems for Markov chains based on their convergence rates in Wasserstein distance

Abstract: Many tools are available to bound the convergence rate of Markov chains in total variation (TV) distance. Such results can be used to establish central limit theorems (CLT) that enable error evaluations of Monte Carlo estimates in practice. However, convergence analysis based on TV distance is often non-scalable to the increasing dimension of Markov chains (Qin and Hobert (2018); Rajaratnam and Sparks (2015)). Alternatively, bounding the convergence rate of Markov chains in Wasserstein distance can be more robust to increasing dimension, thanks to a coupling argument. Our work is concerned with the implication of such convergence results, in particular, do they lead to CLTs of the corresponding Markov chains? An indirect and typically non-trivial way is to first convert Wasserstein bounds into total variation bounds. Instead, we attempt to establish CLTs based on convergence rate in Wasserstein distance directly. We establish a CLT for Markov chains that enjoy certain convergence rates (including the geometric rate and some sub-geometric rates) in Wasserstein distance, and the CLT holds for Lipschitz functions under some moment conditions. Applications of the CLT and its variations are demonstrated with examples. (Joint work with Aixin Tan.)

Grant Backlund: A hybrid scan Gibbs sampler for Bayesian models with latent variables

Abstract: Gibbs sampling is a widely popular Markov chain Monte Carlo algorithm which is often used to analyze intractable posterior distributions associated with Bayesian hierarchical models. We introduce an alternative to traditional Gibbs sampling that is particularly well suited for Bayesian models which contain latent or missing data. This hybrid scan Gibbs algorithm is often easier to analyze from a theoretical standpoint than the systematic or random scan Gibbs sampler. Several examples including linear regression with heavy-tailed errors and a Bayesian version of the general linear mixed model will be presented. Results concerning the convergence rates of the corresponding Markov chains will also be discussed.

Robust Markov chain Monte Carlo methods: Kengo Kamatani (Osaka U), Emilia Pompe (U of Oxford), Björn Sprungk (Göttingen U)

Kengo Kamatani: Robust Markov chain Monte Carlo methodologies with respect to tail properties

Abstract: In this talk, we will discuss Markov chain Monte Carlo (MCMC) methods with heavy-tailed invariant probability distributions. When the invariant distribution is heavy-tailed the algorithm has difficulty reaching the tail area. We study the ergodic properties of some MCMC methods with position dependent proposal kernels and apply them to heavy-tailed target distributions.

Emilia Pompe: A framework for adaptive MCMC targeting multimodal distributions

Abstract: We propose a new Monte Carlo method for sampling from multimodal distributions (Jumping Adaptive Multimodal Sampler). The idea of this technique is based on splitting the task into two: finding the modes of the target distribution and sampling, given the knowledge of the locations of the modes. The sampling algorithm is based on steps of two types: local ones, preserving the mode, and jumps to a region associated with a different mode. Besides, the method learns the optimal parameters while it runs, without requiring user intervention. Our technique should be considered as a flexible framework, in which the design of moves can follow various strategies known from the broad MCMC literature. In order to design an adaptive scheme that facilitates both local and jump moves, we introduce an auxiliary variable representing each mode and we define a new target distribution on an augmented state space. As the algorithm runs and updates its parameters, the target distribution also keeps being modified. This motivates a new class of algorithms, Auxiliary Variable Adaptive MCMC. We prove general ergodic results for the whole class before specialising to the case of our algorithm. The main properties of our method will be discussed and its performance will be illustrated with several examples of multimodal target distributions.

Björn Sprungk: Noise level-robust Metropolis-Hastings algorithms for Bayesian inference with concentrated posteriors

Abstract: We consider Metropolis-Hastings algorithms for Markov chain Monte Carlo integration w.r.t. a concentrated posterior measure which results from Bayesian inference with a small additive observational noise. Proposal kernels based only on prior information show a deteriorating efficiency for a decaying noise. We propose to use informed proposal kernels, i.e., random walk proposals with a covariance close to the posterior covariance. Here, we use the a-priori computable covariance of the Laplace approximation of the posterior. Besides some numerical evidence we prove that the resulting informed Metropolis-Hastings shows a non-degenerating mean acceptance rate and lag-one autocorrelation as the noise decays. Thus, it performs robustly w.r.t. a small noise-level in the Bayesian inference problem. The theoretical results are based on the recently established convergence of the Laplace approximation to the posterior measure in total variation norm.

Approximate Markov chain Monte Carlo methods: Bamdad Hosseini (California Institute of Technology), James Johndrow (U of Pennsylvania), Daniel Rudolf (Göttingen U)

Bamdad Hosseini: Perturbation theory for a function space MCMC algorithm with non-Gaussian priors

Abstract: In recent years a number of function space MCMC algorithms have been introduced in the literature. The goal here is to design an algorithm that is well-defined on an infinite-dimensional Banach space with the hope that it will be discretization invariant and overcome some issues that are encountered by standard MCMC algorithms in high dimensions. However, most of the focus in the literature has been on algorithms that rely on the assumption that the prior measure is a Gaussian or at least absolutely continuous with respect to a Gaussian measure. In this talk we introduce a new class of prior-aware Metropolis-Hastings algorithms for non-Gaussian priors and discuss their convergence and perturbation properties such as dimension-independent spectral gaps and various types of approximations beyond standard approximation by discretization or projections.

James Johndrow: Metropolizing approximate Gibbs samplers

Abstract: There has been much recent work on “approximate” MCMC algorithms, such as Metropolis-Hastings algorithms that rely on minibatches of data, resulting in bias in the invariant measure. Less studied are the various ways in which approximate Gibbs samplers can be designed. We describe a general strategy for using approximate Gibbs samplers as Metropolis-Hastings proposals. Because it is typically less costly to compute the unnormalized posterior density than to take one step of exact Gibbs, and because the Hastings ratio in these algorithms requires only computation of the approximating kernel at pairs of points, one can often achieve reductions in computational complexity per step with no bias in the invariant measure by using approximate Gibbs as a Metropolis-Hastings proposal. We demonstrate the approach with an application to high-dimensional regression.

Daniel Rudolf: Time-inhomogeneous approximate Markov chain Monte Carlo

Abstract: We discuss the approximation of a time-homogeneous Markov chain by a time-inhomogeneous one. An upper bound of the expected absolute difference of the stationary mean, w.r.t. the Markov chain of interest, and the ergodic average based on the approximating Markov chain will be presented. In addition to that we provide explicit estimates of the Wasserstein distance of the difference of the distributions of the Markov chains after $n$ steps.

Sampling Techniques for High-Dimensional Bayesian Inverse Problems: Qiang Liu (U of Texas), Tan Bui-Thanh (U of Texas), Alex Thiery (National U of Singapore)

Qiang Liu: Stein variational gradient descent: Algorithm, theory, applications

Abstract: Approximate probabilistic inference is a key computational task in modern machine learning, which allows us to reason with complex, structured, hierarchical (deep) probabilistic models to extract information and quantify uncertainty. Traditionally, approximate inference is often performed by either Markov chain Monte Carlo (MCMC) or variational inference (VI), both of which, however, have their own critical weaknesses: MCMC is accurate and asymptotically consistent but suffers from slow convergence; VI is typically faster by formulating the inference problem as gradient-based optimization, but introduces deterministic errors and lacks theoretical guarantees. Stein variational gradient descent (SVGD) is a new tool for approximate inference that combines the accuracy and flexibility of MCMC and the practical speed of VI and gradient-based optimization. The key idea of SVGD is to directly optimize a non-parametric particle-based representation to fit intractable distributions with fast deterministic gradient-based updates, which is made possible by integrating and generalizing key mathematical tools from Stein's method, optimal transport, and interacting particle systems. SVGD has proved to be a powerful tool in various challenging settings, including Bayesian deep learning and deep generative models, reinforcement learning, and meta learning. This talk will introduce the basic ideas and theories of SVGD, and cover some examples of application.
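The particle update described in this abstract can be sketched in one dimension as follows. This is a minimal illustration under simplifying assumptions and not the speaker's code: it uses an RBF kernel with a fixed bandwidth, a fixed step size, and a deterministic initial particle grid, and the function names `svgd_step` and `run_svgd` are hypothetical.

```python
import math


def svgd_step(particles, score, step_size=0.1, bandwidth=1.0):
    """One SVGD update for 1-D particles with an RBF kernel of fixed bandwidth."""
    n = len(particles)
    updated = []
    for x_i in particles:
        phi = 0.0
        for x_j in particles:
            diff = x_j - x_i
            k = math.exp(-diff * diff / (2.0 * bandwidth ** 2))
            # Driving term: kernel-weighted score pulls particles toward high density.
            # Repulsive term: gradient of k with respect to x_j keeps particles apart.
            phi += k * score(x_j) - diff / bandwidth ** 2 * k
        updated.append(x_i + step_size * phi / n)
    return updated


def run_svgd(score, num_particles=30, num_steps=1000):
    """Run SVGD from a deterministic grid of particles on [-1, 1]."""
    particles = [-1.0 + 2.0 * i / (num_particles - 1) for i in range(num_particles)]
    for _ in range(num_steps):
        particles = svgd_step(particles, score)
    return particles
```

For example, `run_svgd(lambda x: -(x - 2.0))` moves the particles toward a normal target with mean 2 and unit variance, since that lambda is the gradient of its log-density.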

Tan Bui-Thanh: A data-consistent approach to statistical inverse problems

Abstract: Given a hierarchy of reduced-order models to solve the inverse problems for quantities of interest, each model with varying levels of fidelity and computational cost, a machine learning framework is proposed to improve the models by learning the errors between successive levels. Each reduced-order model is a statistical model that generates rapid and reasonably accurate solutions for new parameters, and is typically formed using expensive forward solves to find the reduced subspace. These approximate reduced-order models speed up computational time but they introduce additional uncertainty to the solution. By statistically modeling errors of reduced order models and using training data involving forward solves of the reduced order models and the higher fidelity model, we train a deep neural network to learn the error between successive levels of the hierarchy of reduced order models, thereby improving their error bounds. The training of the deep neural network occurs during the offline phase and the error bounds can be improved online as new training data is observed. Once the deep-learning-enhanced reduced model is constructed, it is amenable to any sampling method as its cost is a fraction of the cost of the original model.

Alex Thiery: Exploiting geometry for walking larger steps in Bayesian inverse problems

Abstract: Consider the observation $y = F(x) + \xi$ of a quantity of interest $x$ -- the random variable $\xi \sim \mathcal{N}(0, \sigma^2 I)$ is a vector of additive noise in the observation. In Bayesian inverse problems, the vector $x$ typically represents the high-dimensional discretization of a continuous and unobserved field while the evaluations of the forward operator $F(\cdot)$ involve solving a system of partial differential equations. In the low-noise regime, i.e. $\sigma \to 0$, the posterior distribution concentrates in the neighbourhood of a nonlinear manifold. As a result, the efficiency of standard MCMC algorithms deteriorates due to the need to take increasingly smaller steps. In this work, we present a constrained HMC algorithm that is robust to small $\sigma$ values, i.e. low noise. Taking the observations generated by the model to be constraints on the prior, we define a manifold on which the constrained HMC algorithm generates samples. By exploiting the geometry of the manifold, our algorithm is able to take larger step sizes than more standard MCMC methods, resulting in a more efficient sampler. If time permits, we will describe how similar ideas can be leveraged within other non-reversible samplers.

Short Courses/Tutorials/Practice Labs

Introduction to Stan (10:30am-1:30pm)

Trainer: Robert Grant is a medical statistician of 21 years' experience, and a professional trainer and coach for people working in data analysis. He developed and maintains the Stata interface for Stan and frequently teaches introductory courses on Bayesian statistics and data visualization. His personal website is robertgrantstats.co.uk and his company's is bayescamp.com

Pre-requisites: Participants should know the basics of model fitting by MCMC simulation. There is no need for experience of Hamiltonian Monte Carlo or Stan but we will assume understanding of Bayesian analysis, model comparison and diagnosing MCMC problems such as non-convergence. Please bring a laptop with one of the Stan interfaces installed -- it doesn't matter which one as we will focus on the Stan code which is common to all.

Learning outcomes: (1) Know how to get started with Stan via the various interfaces, including the common functionality of checking your model code for errors, translating it to C++, compiling it, sampling from the posterior, summarising the output and exporting chains. (2) Understand the basics of coding regression models up to multilevel models. (3) Be aware of tricks for more efficient parameterisation. (4) Know how to obtain statistical and graphical diagnostic outputs, recognise problems and set about debugging. (5) Know how to add a new distribution as a Stan function, expose it to R/Python/Julia for debugging, and use it in the log-likelihood and posterior predictive checks.

Developing, modifying, and sharing Bayesian algorithms (MCMC samplers, SMC, and more) using the NIMBLE platform in R (2:00-5:00pm)

NIMBLE is a platform built on top of R that allows methodologists to write algorithms (and modify existing algorithms) in R-like syntax with automatic compilation for fast run-times via C++ that is auto-generated by the system. NIMBLE gives you access to a variety of tools for ease of implementation: querying of model graphical structure (e.g., parent and child nodes in the model graph), a wide range of mathematical functionality including linear algebra through the Eigen package, calculation of probability density values for nodes in the model graph, simulation of node values, automatic differentiation for gradients, optimization, and storage objects for samples from the model.

This tutorial will introduce you to how to develop algorithms in NIMBLE, including new MCMC samplers and entire new algorithms. We will discuss how developers can build upon NIMBLE's existing algorithms (including a variety of MCMC, Bayesian nonparametric, and SMC methods) to avoid having to reimplement standard methods. Users of methods developed in NIMBLE write their model code in syntax almost identical to BUGS and JAGS but can then apply a variety of algorithms (various MCMC samplers, choosing between samplers, parameter blocking, user-defined samplers, various SMC algorithms, etc.) to the same model. The tutorial will demonstrate how algorithms that you write using NIMBLE are then easily available to users, who can try them out at low cost and compare them to other algorithms available in NIMBLE.

Learning outcomes: The workshop will focus on live demos and hands-on coding. After the workshop, participants will understand (1) how to use NIMBLE to apply algorithms such as MCMC and SMC to fit hierarchical models, (2) how NIMBLE's built-in algorithms are implemented using nimbleFunctions, (3) how to use nimbleFunctions to extend NIMBLE's algorithms, and (4) how to develop algorithms in NIMBLE.

Pre-requisites: Participants should have a basic understanding of Bayesian/hierarchical models and of one or more algorithms such as MCMC or SMC. Some experience with R is also expected. Please bring a laptop; we'll give instructions in advance for installing NIMBLE.

Instructor: Chris Paciorek is one of the core developers of NIMBLE (code repository) and an adjunct professor of Statistics at UC Berkeley. He has presented a variety of workshops and courses on NIMBLE and more generally on statistical computing and Bayesian statistics.

Info for poster presenters:

All poster presenters must bring/print their own posters. Maximum size: 36” x 48”. The poster can be horizontal or vertical; either orientation will work. Pins, dots, and other means of attachment will be provided on location.
Please check in at the registration desk for information about where the poster presentations will be, set-up times, etc.
Printing in Gainesville: Target Copy, Fedex Print & Ship Center, Office Depot Print and Copy Services

Posters

Concentration inequalities and performance guarantees for hypocoercive MCMC samplers: Luc Rey-Bellet (U of Massachusetts, Amherst)

Abstract: We prove concentration inequalities for ergodic averages for hypocoercive samplers, in particular for the bouncy particle sampler, the zig-zag sampler, and hybrid HMC. This yields two types of performance guarantees: (a) non-asymptotic confidence intervals and (b) uncertainty quantification bounds when using an alternate approximate process.

Convergence behaviour and contraction rates of Hamiltonian Monte Carlo in mean-field models: Katharina Schuh (U of Bonn)

Abstract: We study the convergence behaviour of a transition step of Hamiltonian Monte Carlo (HMC) for probability distributions with a mean-field potential. This mean-field potential consists of a confinement potential for all particles and of a pairwise interaction potential for all pairs of particles. More precisely, we require for the confinement potential strong convexity at infinity but no global convexity, so that multiple-well potentials are allowed. We use a modification of the coupling approach established by Bou-Rabee, Eberle and Zimmer to prove exponential convergence w.r.t. a specifically constructed Wasserstein distance. In particular, we give for both the exact and the numerical HMC explicit contraction rates which are independent of the number of particles in the mean-field particle system. The number of steps until the target probability distribution is approximated by HMC up to a given error $\epsilon$ follows as a direct consequence of contractivity and stays fixed as we increase the number of particles.

Fast algorithms and theory for high-dimensional Bayesian varying coefficient models: Ray Bai (U of Pennsylvania)

Abstract: We introduce the nonparametric varying coefficient spike-and-slab lasso (NVC-SSL) for Bayesian estimation and variable selection in high-dimensional varying coefficient models. The NVC-SSL simultaneously estimates the functionals of the significant time-varying covariates while thresholding out insignificant ones. Our model can be implemented using a highly efficient expectation-maximization (EM) algorithm, thus avoiding the computational intensiveness of Markov chain Monte Carlo (MCMC) in high dimensions. Finally, we prove the first theoretical results for Bayesian varying coefficient models when p>>n. Specifically, we derive posterior contraction rates under the NVC-SSL model. Our method is illustrated through simulation studies and data analysis.

Efficient hierarchical Bayesian kernel regression model for grouped count data: Jin-Zhu Yu (Vanderbilt U)

Abstract: Various research applications suffer from small data sets and require highly predictive models. For instance, a major challenge in predicting the recovery rate of communities after disasters is that recovery data are often scarce due to the nature of extreme events. To address this challenge, we propose a model called the Hierarchical Bayesian Kernel Model (HBKM). This model integrates the Bayesian property of improving predictive accuracy as data are dynamically accumulated, the kernel function that can make nonlinear data more manageable, and the hierarchical property of borrowing information from different sources in scarce and diverse data samples. Since the inference of HBKM can be highly inefficient as the number of groups increases while the number of data points of each group remains relatively small, we develop an efficient Gibbs sampler in which the conditional distributions have approximate closed-form solutions. The proposed method is illustrated with synthesized grouped count data and the historical power outage data in Shelby County, Tennessee after the most severe storms since 2007. (Joint with Hiba Baroud.)

Delayed-acceptance sequential Monte Carlo: Optimising computational efficiency on the fly: Joshua Bon (Queensland U of Technology)

Abstract: Delayed-acceptance is a technique for reducing computational effort for expensive likelihoods within a Metropolis-Hastings (MH) sampler. It uses a surrogate to approximate an expensive likelihood, delaying evaluation of proposals (hence acceptance) until they have passed scrutiny by the surrogate likelihood. Importantly, delayed-acceptance preserves the correct MH ratio, and hence target distribution. Within the sequential Monte Carlo (SMC) framework, we adaptively tune the surrogate model to yield better approximations of the expensive likelihood. For example, we can tune linear noise approximations of Markov processes or adapt nonparametric approximations to better match the true likelihood. Overall, we develop a novel algorithm for computationally efficient SMC with expensive likelihood functions. The method is demonstrated on toy and real examples. (Joint with Christopher Drovandi and Anthony Lee.)
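The two-stage accept/reject mechanism described in this abstract can be sketched as below. This covers only the basic delayed-acceptance MH building block, not the adaptive SMC algorithm of the poster. The target (a standard normal standing in for an expensive density), the surrogate (a wider normal), and all function names are assumptions made for illustration.

```python
import math
import random


def log_target(x):
    """Stand-in for an expensive log-density: standard normal, up to a constant."""
    return -0.5 * x * x


def log_surrogate(x):
    """Cheap, deliberately inexact approximation: normal with standard deviation 1.5."""
    return -0.5 * (x / 1.5) ** 2


def delayed_acceptance_mh(num_iters, step=1.0, seed=0):
    """Random-walk MH with a surrogate screening stage before the expensive stage."""
    rng = random.Random(seed)
    x, samples, expensive_calls = 0.0, [], 0
    log_pi_x = log_target(x)
    for _ in range(num_iters):
        y = x + rng.gauss(0.0, step)  # symmetric random-walk proposal
        log_ratio_surrogate = log_surrogate(y) - log_surrogate(x)
        # Stage 1: screen the proposal using only the surrogate.
        if math.log(1.0 - rng.random()) < log_ratio_surrogate:
            # Stage 2: evaluate the expensive target and correct for the surrogate.
            log_pi_y = log_target(y)
            expensive_calls += 1
            log_ratio_stage2 = (log_pi_y - log_pi_x) - log_ratio_surrogate
            if math.log(1.0 - rng.random()) < log_ratio_stage2:
                x, log_pi_x = y, log_pi_y
        samples.append(x)
    return samples, expensive_calls
```

Proposals rejected at stage 1 never trigger an evaluation of the expensive density, which is where the saving comes from. The stage 2 ratio divides out the surrogate so that the chain still targets the original density.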

Geometrically adapted Langevin algorithm for Markov chain Monte Carlo simulations: Mariya Mamajiwala (U College London)

Abstract: Markov Chain Monte Carlo (MCMC) is a class of methods to sample from a given probability distribution. Of its myriad variants, the one based on the simulation of Langevin dynamics, which approaches the target distribution asymptotically, has gained prominence. The dynamics is specifically captured through a Stochastic Differential Equation (SDE), with the drift term given by the negative of the gradient of the log-likelihood function with respect to the parameters of the distribution. However, the unbounded variation of the noise (i.e. the diffusion term) tends to slow down the convergence, which limits the usefulness of the method. By recognizing that the solution of the Langevin dynamics may be interpreted as evolving on a suitably constructed Riemannian Manifold (RM), considerable improvement in the performance of the method can be realised. Specifically, based on the notion of stochastic development - a concept available in the differential geometric treatment of SDEs - we propose a geometrically adapted variant of MCMC. Unlike the standard Euclidean case, in our setting, the drift term in the modified MCMC dynamics is constrained within the tangent space of an RM defined through the Fisher information metric and the related connection. We show, through extensive numerical simulations, how such a mathematically tenable geometric restriction of the flow enables a significantly faster and more accurate convergence of the algorithm.
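
For orientation, the standard Euclidean Langevin baseline that this abstract starts from can be sketched as an Euler-Maruyama discretisation (often called the unadjusted Langevin algorithm). This is not the geometrically adapted method itself. The sketch assumes the common convention in which the drift is half the gradient of the log target density, and uses a standard normal target; `grad_log_target` and `unadjusted_langevin` are illustrative names.

```python
import numpy as np

def grad_log_target(x):
    # Gradient of log N(0, 1); replace with the model's own gradient.
    return -x

def unadjusted_langevin(n_chains, n_steps, step, seed=0):
    """Run independent Euler-Maruyama Langevin chains in parallel.

    Update: x <- x + (step / 2) * grad log pi(x) + sqrt(step) * N(0, 1).
    Returns the final state of every chain.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_chains)
    for _ in range(n_steps):
        noise = rng.normal(size=n_chains)
        x = x + 0.5 * step * grad_log_target(x) + np.sqrt(step) * noise
    return x
```

Because no Metropolis correction is applied, the chains target the standard normal only approximately: for this Gaussian example the stationary variance is 1 / (1 - step / 4), so smaller steps reduce the bias at the cost of slower exploration.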

From the Bernoulli factory to a dice enterprise via perfect sampling of Markov chains: Giulio Morina (U of Warwick)

Abstract: Given a $p$-coin that lands heads with unknown probability $p$, we wish to produce an $f(p)$-coin for a given function $f: (0,1) \rightarrow (0,1)$. This problem is commonly known as the Bernoulli Factory and results on its solvability and complexity have been obtained in Keane and O'Brien (1994) and Nacu and Peres (2005). Nevertheless, generic ways to design a practical Bernoulli Factory for a given function $f$ exist only in a few special cases. We present a constructive way to build an efficient Bernoulli Factory when $f(p)$ is a rational function with coefficients in $\mathbb{R}$. Moreover, we extend the Bernoulli Factory problem to a more general setting where we have access to an $m$-sided die and we wish to roll a $v$-sided one, i.e. we consider rational functions $f: \Delta^m \rightarrow \Delta^v$ between open probability simplices. Our construction consists of rephrasing the original problem as simulating from the stationary distribution of a certain class of Markov chains - a task that we show can be achieved using perfect simulation techniques with the original $m$-sided die as the only source of randomness. The number of $m$-sided die rolls needed by the algorithm has exponential tails and, in the Bernoulli Factory case, can be bounded uniformly in $p$. En route to optimizing the algorithm we show a fact of independent interest: every finite, integer valued, random variable will eventually become log-concave after convolving with enough Bernoulli trials. (Joint with Krzysztof Latuszynski and Alex Wendland)
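
Two textbook Bernoulli factories give a feel for the problem. They are not the construction of this abstract, which goes through perfect simulation of Markov chains. Using only flips of a coin with unknown bias $p$, the first produces a fair coin (von Neumann's trick, $f(p) = 1/2$) and the second produces the rational function $f(p) = p^2 / (p^2 + (1-p)^2)$. The helper names below are hypothetical.

```python
import random

def make_p_coin(p, rng):
    # The factory may only flip this coin; it never looks at p itself.
    return lambda: rng.random() < p

def von_neumann_fair_coin(coin):
    """f(p) = 1/2: flip in pairs until the two flips differ,
    then report the first flip of that pair."""
    while True:
        a, b = coin(), coin()
        if a != b:
            return a

def same_pair_coin(coin):
    """f(p) = p^2 / (p^2 + (1 - p)^2): flip in pairs until the two
    flips agree, then report heads if both were heads."""
    while True:
        a, b = coin(), coin()
        if a == b:
            return a
```

Both routines terminate with probability one for any $p$ in $(0,1)$, and the number of flips has a geometric distribution whose mean depends on $p$.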

Sequential Monte Carlo for Fredholm Integral Equations of the First Kind: Francesca R. Crucinio (U of Warwick)

Abstract: Fredholm integral equations of the first kind $h(y) = \int g(y \mid x)f(x)\ dx$ describe a wide class of inverse problems in which a signal $f$ has to be reconstructed from a distorted signal $h$ given some knowledge of the distortion $g$ (e.g. image processing, medical imaging, stereology). A popular method to approximate $f$ is an infinite dimensional Expectation-Maximization (EM) algorithm that, given an initial guess for $f$, iteratively refines the approximation by including the information given by $h$ and $g$. The EM recursion is then discretised assuming piecewise constant signals, leading to the Richardson-Lucy algorithm (Richardson, 1972; Lucy 1974). We use Sequential Monte Carlo (SMC) to develop a stochastic discretisation of the Expectation-Maximization-Smoothing (EMS) algorithm (Silverman et al., 1990), a regularised variant of EM. This stochastic discretisation does not assume piecewise constant signals and can be implemented when only samples from $h$ are available and $g$ can be evaluated pointwise. This leads to a non-standard SMC scheme for which we extend some asymptotic results ($\mathbb{L}_p$-inequality, strong law of large numbers and almost sure convergence in the weak topology). We compare the novel method with alternatives using a simulation study and present results for realistic systems.

Towards automatic zig-zag sampling: Alice Corbella (U of Warwick)

Abstract: Zig-Zag sampling, introduced by Bierkens et al. (2019), is based on the simulation of a piecewise deterministic Markov process (PDMP) whose switching rate $\lambda(t)$ is governed by the derivative of the log-target density. To our knowledge, Zig-Zag sampling has been used mainly on simple targets for which derivatives can be computed manually in a reasonable time. To expand the applicability of this method, we incorporate Automatic Differentiation (AD) tools in the Zig-Zag algorithm, computing $\lambda(t)$ automatically from the functional form of the log-target density. Moreover, to allow the simulation of the PDMP via thinning, we use standard optimization routines to find a local upper bound for the rate. We present several implementations of our automatic Zig-Zag sampling and we measure the potential loss in computational time caused by AD and optimization routines. Among the examples, we consider the case of data arising from an epidemic which can be approximated by a deterministic system of equations; here manual derivation of the posterior density is practically infeasible due to the recursive relationships contained in the likelihood function. Automatic Zig-Zag sampling successfully explores the parameter space and samples efficiently from the posterior distribution. Lastly, we compare our automatic Zig-Zag sampling against Stan, a well-established software package that pairs AD with another gradient-based method (HMC). (Joint work with Gareth O. Roberts and Simon E. F. Spencer)

Markov chain Monte Carlo algorithms with sequential proposals: Joonha Park (Boston U)

Abstract: We explore a general framework in Markov chain Monte Carlo (MCMC) sampling where sequential proposals are tried as candidates for the next state of the Markov chain. This sequential-proposal framework can be applied to various existing MCMC methods, including Metropolis-Hastings algorithms using random proposals and methods that use deterministic proposals such as Hamiltonian Monte Carlo (HMC) or the bouncy particle sampler. Sequential-proposal MCMC methods construct the same Markov chains as those constructed by the delayed rejection method under certain circumstances. In the context of HMC, the sequential-proposal approach has been proposed as extra chance generalized hybrid Monte Carlo (XCGHMC). We develop two novel methods in which the trajectories leading to proposals in HMC are automatically tuned to avoid doubling back, as in the No-U-Turn sampler (NUTS). The numerical efficiency of these new methods compares favorably to that of NUTS. We additionally show that the sequential-proposal bouncy particle sampler enables the constructed Markov chain to pass through regions of low target density and thus facilitates better mixing of the chain when the target density is multimodal. (Joint with Yves Atchadé.)

Restore: A continuous-time, rejection-free regenerative sampler: Andi Q. Wang (U of Bristol)

Abstract: We introduce the Restore sampler. This is a continuous-time nonreversible sampler, which combines general local dynamics with rebirths from a fixed global rebirth distribution, which occur at a state-dependent rate. Under suitable conditions this rate can be chosen to enforce stationarity of a given target density, making it suitable for Monte Carlo inference. The resulting sampler has several desirable properties: simplicity, lack of rejections, regenerations and a potential coupling from the past implementation. The Restore sampler can also be used as a recipe for introducing rejection-free moves into existing MCMC samplers in continuous time, or potentially to correct posterior approximations such as INLA. (Joint work with Helen Ogden, Murray Pollock, Gareth Roberts and David Steinsaltz.)

A piecewise deterministic Monte Carlo method for diffusion bridges: Sebastiano Grazzi (Delft U of Technology)

Abstract: The simulation of a diffusion process conditioned to hit a point at a certain time (diffusion bridge) is an essential tool in Bayesian inference of diffusion models with low frequency observations. This has been proven to be a challenging problem, as the transition density of the conditioned process is only known in very special cases. Standard techniques rely on reversible Markov Chain Monte Carlo methods that propose simpler bridges from which it is possible to sample. These techniques may perform poorly when the diffusion of interest is non-linear. Motivated by this, we explore and apply the Zig-Zag sampler, a rejection-free scheme based on a non-reversible continuous piecewise deterministic Markov process. Starting from the Lévy-Ciesielski construction of a Brownian motion, we expand the infinite dimensional diffusion path in the Faber-Schauder basis. Its finite dimensional projection gives an approximate representation of the diffusion process. In this setting, a bridge is simply obtained by fixing the coefficient of the first Faber-Schauder basis function. The Zig-Zag sampler is a flexible scheme able to exploit the conditional independence structure induced by this basis and to explore with different velocities the coefficients of the hierarchical basis. Surprisingly, the sampler does not require the evaluation of the integral appearing in the density function given by Girsanov's formula. By its non-reversible nature, it is promising for improving mixing properties of the process. In the poster session, I will explain in detail how the Zig-Zag sampler scheme works for diffusion bridge simulation and show its performance for some challenging diffusion processes. (Joint with Joris Bierkens, Frank van der Meulen, Moritz Schauer.)

Bayesian treed varying coefficient models: Sameer Deshpande (Massachusetts Institute of Technology)

Abstract: The linear varying coefficient model generalizes the conventional linear model by allowing the additive effect of each covariate X on the outcome Y to vary as a function of additional effect modifiers Z. While there are many existing procedures for fitting such a model when the effect modifier Z is a scalar (typically time), there has been comparatively less development for settings with multivariate Z. State-of-the-art methods for this latter setting typically assume either complete knowledge of which components of Z modify which covariate effects or a restrictive additive assumption about the unknown covariate effect functions. These procedures are, prima facie, ill-suited for applications in which we might reasonably suspect covariate effects actually vary systematically with respect to interactions between multiple modifiers. In this work, we present an extension of Bayesian Additive Regression Trees (BART) to the varying coefficient model for such applications that does not impose these strong assumptions. We derive a straightforward Gibbs sampler based on the familiar "Bayesian backfitting" procedure of Chipman, George, and McCulloch (2010) that also allows for correlated residual errors. We further build on recent theoretical advances for the varying-coefficient model and BART to derive posterior concentration rates under our model. (Joint with Ray Bai, Cecilia Balocchi and Jennifer Starling).

Efficient Bayesian estimation of the stochastic volatility model with leverage: Darjus Hosszejni (Vienna U of Economics and Business)

Abstract: The sampling efficiency of MCMC methods in Bayesian inference for stochastic volatility (SV) models is known to highly depend on the actual parameter values, and the effectiveness of samplers based on different parameterizations differs significantly. We derive novel samplers for the centered and the non-centered parameterizations of the practically highly relevant SV model with leverage (SVL), where the return process and the innovations of the volatility process are allowed to correlate. Additionally, based on the idea of ancillarity-sufficiency interweaving, we combine the resulting samplers in order to achieve superior sampling efficiency. The method is implemented using R and C++. (Joint work with Gregor Kastner.)

Scalable Bayesian sparsity-path analysis with the posterior bootstrap: Brieuc Lehmann (U of Oxford)

Abstract: In classical penalised regression, it is common to perform model estimation across a range of regularisation parameter values, typically with the aim of maximising out-of-sample predictive performance. The analogue in the Bayesian paradigm is to place a sparsity-inducing prior on the regression coefficients and explore a range of prior precisions. This, however, can be computationally challenging due to the need to generate a separate posterior distribution for each precision value. Here, we explore the use of the posterior bootstrap to scalably generate a posterior distribution over sparsity-paths. We develop an embarrassingly parallel method that exploits fast algorithms for computing classical regularisation paths and can thus handle large problems. We demonstrate our method on a sparse logistic regression example using genomic data from the UK Biobank. (Joint work with Chris Holmes, Gil McVean, Edwin Fong & Xilin Jiang)

Constrained Bayesian optimization for small area measurement models: Sepideh Mosaferi (Iowa State U)

Abstract: Statistical agencies are often asked to produce small area estimates (SAEs) for skewed variables or those containing outliers. When domain sample sizes are too small to support direct estimators, effects of skewness or outliers of the response variable can be large, and appropriately accounting for the distribution of the response variable given available auxiliary information is important. First, in order to stabilize the skewness and achieve normality in the response variable, we propose an area-level multiplicative log-measurement error model on the response variable, contrasting the proposed additive measurement error model. In addition, we propose a multiplicative measurement error model on the covariates. Second, under our proposed modeling framework, we derive the empirical Bayes (EB) predictors of positive small area quantities subject to the covariates containing measurement error. Third, under our proposed framework, we explore how this methodology can be utilized more generally in SAE by developing constrained estimation methods for small area problems with measurement error. Fourth, we propose a corresponding mean squared prediction error using a bootstrap method, where we illustrate that the order of the bias is $O(m^{-1})$, under certain regularity conditions. Finally, we illustrate the performance of our methodology in both model-based simulation and design-based simulation studies, where we comment on the computational complexity of each method. (Joint with Rebecca Steorts.)

Non-uniform subsampling for stochastic gradient MCMC: Srshti Putcha (Lancaster U)

Abstract: Markov chain Monte Carlo (MCMC) scales poorly with dataset size. This is because it requires a full pass through the dataset at least once per iteration. Stochastic gradient MCMC (SGMCMC) offers a scalable alternative to traditional methods, by estimating the gradient of the log-likelihood with a small, uniform subsample of the data at each iteration. While efficient to compute, the resulting gradient estimator may exhibit a relatively high variance. This can adversely affect the convergence of the sampling algorithm to the desired posterior distribution. One way to reduce this variance is to sample the data points from a carefully selected (discrete) non-uniform distribution. The goal of this work is to propose a robust framework to conduct non-uniform subsampling for SGMCMC. To do this, we will draw inspiration from existing methodology proposed in the stochastic optimisation literature. We plan to demonstrate our method on various large-scale applications.
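
The variance issue described in this abstract can be made concrete with a small sketch of an importance-weighted gradient estimator. It is not the framework proposed by the author. The model (a Gaussian mean with unit variance), the sampling weights proportional to the per-point gradient magnitude, and all function names are assumptions chosen for illustration; computing those weights exactly needs a full pass over the data, so in practice they would have to be approximated.

```python
import numpy as np

def full_gradient(theta, x):
    # Gradient of sum_i log N(x_i | theta, 1) with respect to theta.
    return np.sum(x - theta)

def subsampled_gradient(theta, x, probs, n, rng):
    """Unbiased estimate of full_gradient from n indices drawn with
    replacement according to probs; each term is reweighted by 1 / p_i."""
    idx = rng.choice(len(x), size=n, replace=True, p=probs)
    return np.mean((x[idx] - theta) / probs[idx])

def estimator_variance(theta, x, probs, n):
    # Exact variance of subsampled_gradient for the given probabilities.
    g = x - theta
    return (np.sum(g**2 / probs) - np.sum(g) ** 2) / n
```

With uniform `probs = np.full(len(x), 1 / len(x))` this reduces to the usual minibatch estimator. Choosing `probs` proportional to `np.abs(x - theta)` plus a small constant keeps the estimator unbiased while lowering `estimator_variance`, most noticeably when a few data points dominate the gradient.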

An adaptive scheme for the Zig-Zag sampler: Andrea Bertazzi (Delft U of Technology)

Abstract: The Zig-Zag sampler, introduced in Bierkens et al. (2019), is a Monte Carlo method based on a piecewise deterministic Markov process known as the Zig-Zag process. Empirical experiments have shown that the speed of convergence of this continuous time sampler can be affected by the shape of the target distribution, as for instance in the case of elliptical level curves. This issue can be tackled by tuning the parameters of the process, i.e. the vector of velocities. We then consider linearly transforming the state space and running the standard Zig-Zag sampler on this appropriately transformed space. The optimal transformation matrix can be learned on-the-go while the process explores the state space. This fits in the framework of adaptive Markov chain Monte Carlo algorithms after a suitable discretisation of the time variable. We study the ergodicity of the resulting adaptive Zig-Zag sampler by taking advantage of the existing literature on adaptive algorithms such as Roberts and Rosenthal (2007) and Fort et al. (2011). (Joint work with Joris Bierkens).

Speeding Up the ZigZag Process: Giorgos Vasdekis (U of Warwick)

Abstract: Piecewise Deterministic Markov Processes have recently drawn the attention of the Markov Chain Monte Carlo community. The first reason for this is that, in general, one can simulate exactly the entire path of such a process. The second is that these processes are non-reversible, which sometimes leads to faster mixing. One of the processes used is the ZigZag process, which moves linearly in the space $\mathbb{R}^d$ in specific directions for a random period of time, changing direction one coordinate at a time. An important question related to these samplers is the existence of a Central Limit Theorem, which is closely connected to the property of Geometric Ergodicity. It turns out that the ZigZag process is not Geometrically Ergodic when targeting a heavy-tailed distribution. On this poster we present a way to speed up the ZigZag process to make the algorithm Geometrically Ergodic under heavy tails.
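
The motion described in this abstract (straight-line travel for a random time, then a flip of direction) can be sketched in one dimension for a standard normal target, where the switching times can be drawn exactly. This is the plain ZigZag process, not the sped-up variant of the poster, and the function name is hypothetical. For potential $U(x) = x^2/2$ the switching rate along the path is $\max(0, vx + s)$, so with $E \sim \mathrm{Exp}(1)$ the next switch occurs at $\tau = -vx + \sqrt{\max(vx, 0)^2 + 2E}$.

```python
import math
import random

def zigzag_gaussian_moments(n_events, seed=0):
    """Simulate a 1D ZigZag process targeting N(0, 1) and return the
    time averages of x and x^2 along the continuous trajectory."""
    rng = random.Random(seed)
    x, v = 0.0, 1.0
    total_time = int_x = int_x2 = 0.0
    for _ in range(n_events):
        a = v * x
        e = rng.expovariate(1.0)
        # Exact inversion of the integrated switching rate.
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        x_new = x + v * tau
        # Closed-form integrals of x and x^2 over the linear segment.
        int_x += (x_new**2 - x**2) / (2.0 * v)
        int_x2 += (x_new**3 - x**3) / (3.0 * v)
        total_time += tau
        x, v = x_new, -v  # flip the velocity at the event
    return int_x / total_time, int_x2 / total_time
```

The returned averages estimate the mean and second moment of the target, so for a long run they should be close to 0 and 1. Expectations are computed as integrals along the path rather than from the positions at the switching events, which would give biased estimates.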

Shrinkage in the time-varying parameter model framework using the R package shrinkTVP: Peter Knaus (Vienna U of Economics and Business)

Abstract: Time-varying parameter (TVP) models are widely used in time series analysis to flexibly deal with processes which gradually change over time. However, the risk of overfitting in TVP models is well known. This issue can be dealt with using appropriate global-local shrinkage priors, which pull time-varying parameters towards static ones. In this paper, we introduce the R package shrinkTVP, which provides a fully Bayesian implementation of shrinkage priors for TVP models, taking advantage of recent developments in the literature, in particular that of Bitto and Frühwirth-Schnatter (2019). The package shrinkTVP allows for posterior simulation of the parameters through an efficient Markov Chain Monte Carlo scheme. Moreover, summary and visualization methods, as well as the possibility of assessing predictive performance through log predictive density scores, are provided. The computationally intensive tasks have been implemented in C++ and interfaced with R. The paper includes a brief overview of the models and shrinkage priors implemented in the package. Furthermore, core functionalities are illustrated, both with simulated and real data. (Joint with Angela Bitto-Nemling, Annalisa Cadonna, and Sylvia Frühwirth-Schnatter.)

Distributed Bayesian computation for model choice: Alexander Buchholz (U of Cambridge)

Abstract: We propose a general method for distributed Bayesian model choice, where each worker has access only to non-overlapping subsets of the data. Our approach approximates the model evidence for the full dataset through Monte Carlo sampling from the posterior on every subset which is produced by any suitable method to return an estimator for the evidence. The model evidences per worker are then consistently combined using a novel approach which corrects for the splitting using summary statistics of the generated samples. This divide-and-conquer approach allows Bayesian model choice in the large data setting, exploiting all available information but limiting communication between workers. Our work thereby complements the work on consensus Monte Carlo (Scott et al., 2016) by explicitly enabling model choice. In addition, we show how the suggested approach can be extended to model choice within a reversible jump setting that explores multiple models within one run. (Joint with D. Ahfock and S. Richardson)

Hamiltonian Monte Carlo with boundary reflections, and application to polytope volume calculations: Augustin Chevallier (Lancaster U)

Abstract: This poster presents a study of HMC with reflections on the boundary of a domain, providing an enhanced alternative to Hit-and-run (HAR) to sample a target distribution in a bounded domain. We make three contributions. First, we provide a convergence bound, paving the way to more precise mixing time analysis. Second, we present a robust implementation based on multi-precision arithmetic – a mandatory ingredient to guarantee exact predicates and robust constructions. Third, we use our HMC random walk to perform polytope volume calculations, using it as an alternative to HAR within the volume algorithm by Cousins and Vempala. The tests, conducted up to dimension 50, show that the HMC RW outperforms HAR. (Joint work with Frederic Cazals and Sylvain Pion)

Developments in Stein-based control variates: Leah South (Lancaster U)

Abstract: Stein’s method has recently been used to generate control variates which can improve Monte Carlo estimators of expectations when the derivatives of the log target are available. The two most popular Stein-based variance reduction techniques are zero-variance control variates (ZV-CV, a parametric approach) and control functionals (CF, a non-parametric alternative). This poster will describe two recent developments in this area. The first method applies regularisation methods in ZV-CV to give reduced-variance estimators in high-dimensional Monte Carlo integration (South, Oates, Mira, & Drovandi, 2018). A novel kernel-based method motivated by CF and by Sard's method for numerical integration will also be introduced. This kernel-based approach allows for misspecification in the ZV-CV regression problem, and represents a balance between ZV-CV and CF when the number of samples is sufficiently large. The benefits of the proposed variance reduction techniques will be illustrated using several Bayesian inference examples. (Joint with Chris Oates, Chris Nemeth, Toni Karvonen, Antonietta Mira, Mark Girolami and Chris Drovandi.)
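
The basic idea behind ZV-CV can be sketched with a first-order version: since the score (the gradient of the log target) has expectation zero under the target, regressing the integrand on the score and keeping the intercept gives a variance-reduced estimate. This covers neither the regularised nor the kernel-based developments of the poster, and `zv_cv_estimate` is a hypothetical helper name.

```python
import numpy as np

def zv_cv_estimate(f_vals, scores):
    """First-order zero-variance control variate estimate of E[f].

    f_vals : (n,) integrand evaluated at the samples
    scores : (n,) or (n, d) gradient of the log target at the samples
    The least-squares intercept equals mean(f) minus the fitted
    combination of the sample-mean scores.
    """
    n = len(f_vals)
    design = np.column_stack([np.ones(n), scores])
    coef, *_ = np.linalg.lstsq(design, f_vals, rcond=None)
    return coef[0]
```

For a Gaussian target $N(\mu, \sigma^2)$ and $f(x) = x$, the integrand is exactly linear in the score $-(x - \mu)/\sigma^2$, so the estimate recovers $\mu$ up to floating-point error from any set of samples, which is where the name "zero variance" comes from. For other integrands the reduction is partial.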

Hug and Hop: A discrete-time, non-reversible Markov chain Monte Carlo algorithm: Matthew Ludkin (Lancaster U)

Abstract: We introduce the Hug and Hop Markov chain Monte Carlo algorithm for estimating expectations with respect to an intractable distribution $\pi$. The algorithm alternates between two kernels: Hug and Hop. Hug is a non-reversible kernel that uses repeated applications of the bounce mechanism from the recently proposed Bouncy Particle Sampler to produce a proposal point far from the current position, yet on almost the same contour of the target density, leading to a high acceptance probability. Hug is complemented by Hop, which deliberately proposes jumps between contours and has an efficiency that degrades very slowly with increasing dimension. There are many parallels between Hug and Hamiltonian Monte Carlo (HMC) using a leapfrog integrator, including an $\mathcal{O}(\delta^2)$ error in the integration scheme; however, Hug is also able to make use of local Hessian information without requiring implicit numerical integration steps, improving efficiency when the gains in mixing outweigh the additional computational costs. We test Hug and Hop empirically on a variety of toy targets and real statistical models and find that it can, and often does, outperform HMC on the exploration of components of the target.

Hierarchical variance shrinkage through the triple gamma prior: Annalisa Cadonna (Vienna U of Economics and Business)

Abstract: Time-varying parameter (TVP) models are very flexible in capturing gradual changes in the effect of a predictor on the outcome variable. However, in particular when the number of predictors is large, there is a known risk of overfitting and poor predictive performance, since the effect of some predictors is constant over time. In the present work, a triple gamma prior is proposed for variance shrinkage in TVP models. The triple gamma prior encompasses a number of priors that have been suggested previously, such as the Bayesian Lasso, the double gamma prior and the Horseshoe prior. Interesting properties of the triple gamma prior are outlined and an efficient Markov Chain Monte Carlo algorithm is developed. An extended simulation study is conducted and the proposed modeling approach is applied to real data, both in a univariate and a multivariate framework. The predictive performance of different shrinkage priors is compared in terms of log predictive density scores. (Joint with Peter Knaus & Sylvia Frühwirth-Schnatter.)

Scaling Bayesian probabilistic record linkage with post-hoc blocking: An application to the California Great Registers: Brendan McVeigh (Carnegie Mellon U)

Abstract: Probabilistic record linkage (PRL) is the process of determining which records in two databases correspond to the same underlying entity in the absence of a unique identifier. Bayesian solutions to this problem provide a powerful mechanism for propagating uncertainty due to uncertain links between records (via the posterior distribution over linkage structures). However, computational considerations severely limit the practical applicability of existing Bayesian methods. We propose a new computational approach that dramatically improves scalability of posterior inference, scaling Bayesian inference to problems orders of magnitude larger than state-of-the-art algorithms. We demonstrate our method on a subset of an OCR'd dataset, the California Great Registers, containing hundreds of thousands of voter registrations. Despite lacking a high-quality blocking key, our approach allows a posterior distribution to be estimated on a single machine in a matter of hours. Our advances make it possible to perform Bayesian PRL for larger problems, and to assess the sensitivity of results to varying model specifications.

Finite mixtures are typically inconsistent for the number of components: Diana Cai (Princeton U)

Abstract: Scientists and engineers are often interested in learning the number of subpopulations (or clusters) present in a data set. It is common to use a Dirichlet process mixture model (DPMM) for this purpose. But Miller and Harrison (2013) warn that the DPMM posterior is severely inconsistent for the number of clusters when the data are truly generated from a finite mixture; that is, the posterior probability of the true number of clusters goes to zero in the limit of infinite data. A potential alternative is to use a finite mixture model (FMM) with a prior on the number of clusters. Past work has shown the resulting posterior in this case is consistent. But these results crucially depend on the assumption that the cluster likelihoods are perfectly specified. In practice, this assumption is unrealistic, and empirical evidence (Miller and Dunson, 2018) suggests that the posterior on the number of clusters is sensitive to the likelihood choice. In this paper, we prove that under even the slightest model misspecification, the FMM posterior on the number of components is also severely inconsistent. We support our theory with empirical results on simulated and real data sets. (Joint work with Trevor Campbell and Tamara Broderick.)

Entity resolution, canonicalization, and the downstream task: Kelsey Lieberman (Duke U)

Abstract: Entity resolution (ER) is the process of merging noisy databases to remove duplicate entities, often in the absence of a unique identifier. ER can be thought of as a data cleansing task, where analysts are most concerned about the downstream tasks of inference/prediction. Crucial to this is understanding the uncertainty of errors at each stage in the pipeline, such that these are appropriately passed into the downstream task. In this talk, we consider a three stage pipeline. First, we consider the ER task, which could be probabilistic or Bayesian. Specifically, we propose a Bayesian graphical model that incorporates training data into the model directly such that the sampler can make faster updates when applied to very large datasets. Second, we propose new methodology for selecting the most representative record from the output of the ER task, which is known as canonicalization. Third, we consider the prediction task of linear and logistic regression on experiments making comparisons to benchmarks in the literature. Finally, we will give a discussion of the proposed work and future directions. (Joint with Rebecca Steorts).

Genomic variety estimation via Bayesian nonparametrics: Lorenzo Masoero (Massachusetts Institute of Technology)

Abstract: The exponential growth in size of human genomic studies, with tens of thousands of observations, opens up the intriguing possibility to investigate the role of rare genetic variants in biological human evolution. A better understanding of rare genetic variants is crucial for the study of rare genetic diseases, as well as for personalized medicine. A crucial challenge when working with rare variants is to develop a statistical framework to assess if the observed sample is truly representative of the underlying population. In particular, it is important to understand (i) what fraction of the relevant variation present in the human genome is not yet captured by available datasets and (ii) how to design future experiments in order to maximize the number of hitherto unseen genomic variants. We propose a novel rigorous methodology to address both problems using a nonparametric Bayesian framework. Our contribution is twofold: first, we provide an estimator for the number of hitherto unseen variants which are going to be observed when additional samples from the same population are collected and study its theoretical and empirical properties. Moreover, we show how this approach can be used in the context of the optimal design of genomic studies. For this problem, under a fixed budget, one is interested in maximizing the number of genomic discoveries by optimally enlarging a dataset, trading off between the additional number of individuals to be sequenced and the quality of the individual samples. (Joint work with Stefano Favaro, Federico Camerlenghi and Tamara Broderick.)

Spike and slab priors for undirected Gaussian graphical model selection: Jack Carter (U of Warwick)

Abstract: We introduce a class of prior distributions on the precision matrix of a Gaussian random variable. Priors in this class involve a spike and slab density being set on the partial correlations; this induces sparsity on the related undirected graphical model and aids computational efficiency by leading to an EM algorithm and posterior Gibbs sampler that are easy to derive. We pay particular attention to the use of a non-local MOM density for the slab, which better represents the hypothesis that the partial correlation is 0 by having zero density at 0. For this we suggest default parameter values which ensure interpretability of the prior and control the threshold on the partial correlations at which we include an edge in the model. This has links to causality by ensuring that any edge included in the graphical model is of a certain specified strength - one of the Bradford Hill criteria. We also discuss the computational aspects related to posterior inference. The use of a spike and slab prior removes the need for a model search algorithm over the space of undirected graphical models; however, direct inference on the model is not possible. We propose a number of methods involving an EM algorithm and Gibbs sampling to make posterior inference. (Joint work with David Rossell and Jim Smith).

Bayesian analysis for hierarchical models using piecewise-deterministic Markov process: Matthew Sutton (Lancaster U)

Abstract: Piecewise-deterministic Markov processes (PDMPs) are an emerging class of Markov chain Monte Carlo methods for efficient sampling of complex targets. In practice, sampling from a PDMP involves simulating from a non-homogeneous Poisson process. This non-trivial task is usually accomplished through thinning, which requires simulating from an upper bound on the Poisson rate. The efficiency of the sampler is affected by how tight this upper bound is (statistical efficiency) and how quickly it can be simulated (computational efficiency). In this work, we explore the efficiency of PDMPs for sampling in a popular class of Bayesian inference models. Specifically, we focus on latent Gaussian models where there is a non-Gaussian response and the latent field is a Gaussian distribution controlled by a few hyper-parameters. We take advantage of the sparsity in the precision of the Gaussian field to ensure computational efficiency and derive tight upper bounds for the thinning in these models. Finally, we measure the potential of these methods alongside alternatives such as Hamiltonian Monte Carlo and the Metropolis-adjusted Langevin Algorithm.
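
The thinning step described in this abstract can be illustrated with a minimal Python sketch. It is not the speaker's sampler: the function names, the illustrative rate max(0, t) and the constant bound are assumptions chosen only to show how an upper bound on the Poisson rate is used to simulate the first event time.

```python
import random


def first_event_by_thinning(rate, rate_bound, t_max, rng):
    """First event time of a Poisson process with intensity rate(t) on [0, t_max].

    Assumes (hypothetically) that rate(t) <= rate_bound on the whole interval.
    Returns None if no event occurs before t_max.
    """
    t = 0.0
    while True:
        # Propose the next event of a homogeneous process with the bounding rate.
        t += rng.expovariate(rate_bound)
        if t > t_max:
            return None
        # Keep the proposal with probability rate(t) / rate_bound.
        if rng.random() < rate(t) / rate_bound:
            return t


def demo_mean_event_time(n=20000, seed=1):
    """Average first event time for the illustrative rate(t) = max(0, t)."""
    rng = random.Random(seed)
    t_max = 10.0  # on [0, t_max] the rate is bounded by t_max
    times = [
        first_event_by_thinning(lambda t: max(0.0, t), t_max, t_max, rng)
        for _ in range(n)
    ]
    times = [t for t in times if t is not None]
    return sum(times) / len(times)
```

For this illustrative rate the survival function of the first event time is exp(-t^2/2), so the average returned by `demo_mean_event_time()` should be close to sqrt(pi/2), roughly 1.25. A tighter bound would reject fewer proposals, which is the statistical efficiency the abstract refers to.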

A Bayesian nonparametric test for conditional independence: Onur Teymur (Imperial College London)

Abstract: We present a Bayesian nonparametric method for assessing the dependence or independence of two variables conditional on a third. Our approach uses Polya tree priors on spaces of conditional probability distributions; these random measures are then embedded within a decision-theoretic test for conditional (in)dependence. The setup supports the testing of large datasets while relaxing the linearity assumptions central to classical approaches such as partial correlation. In fact, no assumption whatsoever is made on the form of dependence between the variables. The test is fully Bayesian, meaning both hypotheses can be positively evidenced—this feature is particularly useful for causal discovery and is not employed by any previous procedure of this type.

Efficient stochastic optimisation by unadjusted Langevin Monte Carlo. Application to maximum marginal likelihood and empirical Bayesian estimation: Valentin De Bortoli (ENS Paris Saclay)

Abstract: Stochastic approximation methods play a central role in maximum likelihood estimation problems involving intractable likelihood functions, such as marginal likelihoods arising in problems with missing or incomplete data, and in parametric empirical Bayesian estimation. Combined with Markov chain Monte Carlo algorithms, these stochastic optimisation methods have been successfully applied to a wide range of problems in science and industry. However, this strategy scales poorly to large problems because of methodological and theoretical difficulties related to using high-dimensional Markov chain Monte Carlo algorithms within a stochastic approximation scheme. This paper proposes to address these difficulties by using unadjusted Langevin algorithms to construct the stochastic approximation. This leads to a highly efficient stochastic optimisation methodology with favourable convergence properties that can be quantified explicitly and easily checked. The proposed methodology is demonstrated with three experiments, including a challenging application to high-dimensional statistical audio analysis and a sparse Bayesian logistic regression with random effects problem. (Joint work with Alain Durmus, Marcelo Pereyra and Ana Fernandez Vidal.)
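
As a rough illustration of the building block named in this abstract, the sketch below runs an unadjusted Langevin recursion on a one-dimensional target. It does not implement the stochastic approximation scheme itself; the function names, the standard normal target and the step size are assumptions made for the example.

```python
import math
import random


def ula_chain(grad_log_target, x0, step, n_steps, seed=0):
    """Unadjusted Langevin algorithm: Euler steps of the Langevin diffusion,
    with no Metropolis correction."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x + step * grad_log_target(x) + math.sqrt(2.0 * step) * noise
        samples.append(x)
    return samples


def sample_variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


# Illustrative target: standard normal, so grad log pi(x) = -x.
samples = ula_chain(lambda x: -x, x0=0.0, step=0.1, n_steps=50000)
estimated_variance = sample_variance(samples)
```

For this Gaussian target the recursion is a linear autoregression whose stationary variance is 1 / (1 - step / 2), about 1.05 for a step of 0.1 rather than the exact value 1. This is the discretisation bias that an unadjusted scheme accepts in exchange for cheap iterations.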

Noise contrastive estimation: Asymptotic properties, formal comparison with MC-MLE: Lionel Riou-Durand (U of Warwick)

Abstract: A statistical model is said to be un-normalised when its likelihood function involves an intractable normalising constant. Two popular methods for parameter inference for these models are MC-MLE (Monte Carlo maximum likelihood estimation), and NCE (noise contrastive estimation); both methods rely on simulating artificial data-points to approximate the normalising constant. While the asymptotics of MC-MLE have been established under general hypotheses (Geyer, 1994), this is not so for NCE. We establish consistency and asymptotic normality of NCE estimators under mild assumptions. We compare NCE and MC-MLE under several asymptotic regimes. In particular, we show that, when $m \rightarrow \infty$ while $n$ is fixed ($m$ and $n$ being respectively the number of artificial data-points, and actual data-points), the two estimators are asymptotically equivalent. Conversely, we prove that, when the artificial data-points are IID, and when $n \rightarrow \infty$ while $m/n$ converges to a positive constant, the asymptotic variance of a NCE estimator is always smaller than the asymptotic variance of the corresponding MC-MLE estimator. We illustrate the variance reduction brought by NCE through a numerical study. (Joint with N. Chopin.)

Alternative tests for financial risk model validation: Elena Goldman (Pace U)

Abstract: The current practice of financial risk management is to evaluate models based on ex-post outcome: backtesting. For example, financial institutions need to evaluate Value at Risk (VaR) estimates for setting a bank’s economic capital or initial margins for clearing agencies (CCPs). The advantage of Bayesian methods is the ability to obtain the full posterior distribution of risk measures compared to simple point estimates. Furthermore, the Bayesian approach can produce predictive scores and posterior pdfs of loss functions. For CCPs' margins we introduce a method based on the distribution of the model loss function that captures the trade-off of margin shortfall and procyclicality. We show how this loss function performs for various risk models in measuring tail risk. We perform a test of model selection using the posterior distribution of the differences between CDFs of loss measures introduced in Goldman et al. (2013). (Joint with Xiangjin Shen.)

The $f$-Divergence Expectation Iteration scheme: Kamelia Daudel (Institut Polytechnique de Paris)

Abstract: We introduce the $f$-EI$(\phi)$ algorithm, a novel iterative algorithm which operates on measures and performs $f$-divergence minimisation in a Bayesian framework. We prove that for a rich family of values of $(f,\phi)$ this algorithm leads at each step to a systematic decrease in the $f$-divergence and show that we achieve an optimum. In the particular case where we consider a weighted sum of Dirac measures and the α-divergence, we obtain that the calculations involved in the $f$-EI$(\phi)$ algorithm simplify to gradient-based computations. Empirical results support the claim that the $f$-EI$(\phi)$ algorithm serves as a powerful tool to assist Variational methods. (Joint with Randal Douc, Francois Portier and Francois Roueff.)

Scalable approximate inference for state space models with normalising flows: Tom Ryder (Newcastle U)

Abstract: By exploiting mini-batch stochastic gradient optimisation, variational inference has had great success in scaling up approximate Bayesian inference to big data. To date, however, this strategy has only been applicable to models of independent data. Here we extend mini-batch variational methods to state space models of time series data. To do so we introduce a novel generative model as our variational approximation, a local inverse autoregressive flow. This allows a subsequence to be sampled without sampling the entire distribution. Hence we can perform training iterations using short portions of the time series at low computational cost. We illustrate our method on AR(1), Lotka-Volterra and FitzHugh-Nagumo models, achieving accurate parameter estimation in a short time.

Record Linkage and Time Series Regression: Shubhi Sharma (Duke U)

Abstract: Entity resolution (ER) (record linkage or de-duplication) is the process of merging noisy datasets and removing duplicate entries, often in the absence of a unique identifier for records. We propose a novel unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files, in a temporal setting. We leverage existing work in the literature such that we can represent patterns of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. We propose an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm, which overcomes many obstacles, such as low acceptance rates, encountered by previously proposed methods of ER. We assess our results on real and simulated data. (Joint with Jairo Fuquene and Rebecca Steorts.)

Exploring alternative likelihoods and priors in geographic profiling with the R package Silverblaze: Michael Stevens (Queen Mary U of London)

Abstract: Geographic profiling (GP), a model originally developed in criminology in cases of serial crime such as rape, murder and arson, is used to prioritise large lists of suspects associated with a set of linked crimes. GP is used to identify likely areas containing an offender’s home or workplace given where they’ve committed their crimes. GP now boasts a range of applications in ecology and epidemiology: finding nesting locations of invasive species or areas associated with the outbreak of an infectious disease. The model spatially clusters point pattern data in geographical space using a Dirichlet Process Mixture (DPM) model. Despite the wide success of this model, the DPM model relies on a Gibbs sampling algorithm (Neal 2000, Verity et al. 2014) that restricts the user to specific forms for the likelihood and prior given the conjugacy between them. Part of my work revolves around introducing a Metropolis-Hastings algorithm alongside a Gibbs sampler to GP. This poster will describe the development of Silverblaze, an R package for running the GP model using different kinds of likelihoods and priors specified by the user. A large proportion of publications following on from Verity et al. (2014) consistently fit a mixture of normal distributions to the data. Silverblaze is the first GP implementation in which a user can specify different decaying distributions, as well as infer under a Poisson model, where a user may have collected count data in place of point pattern data, to estimate additional parameters of interest such as population density.

Transport Monte Carlo: Leo Duan (U of Florida)

Abstract: In Bayesian posterior estimation, the transport map finds a deterministic transform from a simple reference distribution to a potentially complicated posterior distribution. Compared to other sampling approaches, it is capable of generating independent samples while exploiting efficient optimization toolboxes. However, a fundamental concern is that the invertible map is challenging to parameterize with sufficient flexibility, and may even fail to exist between the two distributions. To address this issue, we propose Transport Monte Carlo, which models the transform as a random choice from multiple maps. It corresponds to a coupling distribution of the reference and posterior, which is guaranteed to exist under mild conditions. This framework allows us to decompose a sophisticated transform into multiple components; each is now simple to parameterize and estimate. In the meantime, it enjoys a direct extension to coupling a continuous reference and a discrete posterior. We examine its theoretical properties, including the error rate due to the finite training sample size. Compared to existing methods such as Hamiltonian Monte Carlo or neural network-based transport map, our method demonstrates much-improved performances in several common sampling problems, including the multi-modal distribution, high-dimensional sparse regression, and combinatorial sampling of the graph edges.

Removing the mini-batching bias for large scale Bayesian inference: Inass Sekkat (Ecole des Ponts)

Abstract: The computational cost of usual Monte Carlo methods for sampling a posteriori laws in Bayesian inference scales linearly with the number of data points, which becomes prohibitive in the big data context. One option is to resort to mini-batching in conjunction with unadjusted discretizations of Langevin dynamics, in which case only a random fraction of the data is used to estimate the gradient. However, this leads to an additional noise in the dynamics and hence a bias on the invariant measure which is sampled by the Markov chain. We advocate using the so-called Adaptive Langevin dynamics, which is a modification of standard inertial Langevin dynamics with a dynamical friction which automatically corrects for the increased noise arising from mini-batching. We investigate in particular the practical relevance of the assumptions underpinning Adaptive Langevin (constant covariance for the estimation of the gradient), which are not satisfied in typical models of Bayesian inference; and discuss how to extend the approach to more general situations.

Optimal scaling of MALA with Laplace distribution as a target: Pablo Jimenez (Ecole Polytechnique)

Abstract: This paper considers the optimal scaling problem for Metropolis adjusted approximations of Langevin dynamics for the Laplace distribution. We obtain, similarly to the results established in [Roberts, Rosenthal 1997], and under the same setting - independent and identically distributed models and at stationarity - the convergence of the first component of the corresponding Markov chain, rescaled in time and space, to a Langevin diffusion process as the dimension d goes to infinity. However, maybe surprisingly, the optimal scaling obtained with respect to the dimension d is 2/3, which is therefore different from the one holding for smooth distributions. As a result, we obtain a new optimal acceptance rate, approximately 0.360.
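
To make the algorithm under study concrete, here is a minimal one-dimensional sketch of a Metropolis-adjusted Langevin step for a Laplace target with density proportional to exp(-|x|). It says nothing about the high-dimensional scaling result; the function name, the step-size convention (drift `step * grad`, proposal variance `2 * step`) and the chosen step value are assumptions for illustration.

```python
import math
import random


def mala_laplace(n_steps, step, x0=0.0, seed=0):
    """MALA chain targeting the density proportional to exp(-|x|)."""
    rng = random.Random(seed)

    def log_target(x):
        return -abs(x)

    def grad_log_target(x):
        # Subgradient convention at the kink: 0 at x == 0.
        return 0.0 if x == 0.0 else -math.copysign(1.0, x)

    def log_proposal(to, frm):
        # Log density (up to a constant) of N(frm + step * grad, 2 * step) at `to`.
        mean = frm + step * grad_log_target(frm)
        return -((to - mean) ** 2) / (4.0 * step)

    x = x0
    samples = []
    accepted = 0
    for _ in range(n_steps):
        y = x + step * grad_log_target(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        log_alpha = (log_target(y) + log_proposal(x, y)
                     - log_target(x) - log_proposal(y, x))
        # Metropolis-Hastings accept/reject keeps the Laplace target invariant.
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = y
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


laplace_samples, acceptance_rate = mala_laplace(n_steps=50000, step=0.5)
mean_abs = sum(abs(s) for s in laplace_samples) / len(laplace_samples)
```

Under the Laplace target the expected value of |x| is 1, so `mean_abs` gives a quick sanity check on the chain. The acceptance rate observed in one dimension with an arbitrary step is not comparable to the asymptotic optimal rate quoted in the abstract.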

Bayesian nonparametric models for graph structured data: Florence Forbes (U of Grenoble, INRIA)

Abstract: We consider the issue of determining the structure of clustered data, both in terms of finding the appropriate number of clusters and of modelling the right dependence structure between the observations. Bayesian nonparametric (BNP) models, which do not impose an upper limit on the number of clusters, are appropriate to avoid the required guess on the number of clusters but have been mainly developed for independent data. In contrast, Markov random fields (MRF) have been extensively used to model dependencies in a tractable manner but usually reduce to finite cluster numbers when clustering tasks are addressed. Our main contribution is to propose a general scheme to design tractable BNP-MRF priors that combine both features: no commitment to an arbitrary number of clusters and a dependence modelling. A key ingredient in this construction is the availability of a stick-breaking representation, which has the threefold advantage of allowing us to extend standard discrete MRFs to infinite state space, to design a tractable estimation algorithm using variational approximation, and to derive theoretical properties on the predictive distribution and the number of clusters of the proposed model. This approach is illustrated on a challenging natural image segmentation task for which it shows good performance with respect to the literature.

Overfitted mixture models to learn the number of chains in the Factorial Hidden Markov Models: Applications to stochastic volatility modeling: Jan Greve (Vienna U of Economics and Business)

Abstract: When dealing with moderate to high-dimensional Markov switching models, factorial hidden Markov models (FHMMs) present a more parsimonious alternative to the traditional hidden Markov models (HMMs). The more restrictive representation of the overall state space in a distributed manner is especially useful when the state space in question is relatively large, a case in which most HMM based approaches would encounter computational difficulties when resolving label-switches, a typical issue of combinatorial complexity. For the FHMM, which usually restricts the number of states within each distributed Markov chain to be equal, the only model parameter is the number of latent chains that determines the overall size of the state space. We rephrase this model selection problem into the overfitted mixture framework, where the number of latent chains is set to be much larger than the true value and we would like our sampler to learn the number of effective chains. This is achieved through the combination of component-wise shrinkage within each chain and shrinkage applied to the distribution of the persistence probabilities. In this way, it is possible to make redundant chains "inactive", in the sense that they contribute neither to the likelihood nor to the entropy of the joint transition matrix, which is constructed by taking a tensor product of the transition matrices within each chain. Finally, the overall framework of this paper will be demonstrated through an application to stochastic volatility models. (Joint work with Sylvia Frühwirth-Schnatter.)

LR-GLM: High-dimensional Bayesian inference using low-rank data approximations: Brian Trippe (Massachusetts Institute of Technology)

Abstract: Due to the ease of modern data collection, applied statisticians often have access to a large set of covariates that they wish to relate to some observed outcome. Generalized linear models (GLMs) offer a particularly interpretable framework for such an analysis. In these high-dimensional problems, the number of covariates is often large relative to the number of observations, so we face non-trivial inferential uncertainty; a Bayesian approach allows coherent quantification of this uncertainty. Unfortunately, existing methods for Bayesian inference in GLMs require running times roughly cubic in parameter dimension, and so are limited to settings with at most tens of thousands of parameters. We propose to reduce time and memory costs with a low-rank approximation of the data in an approach we call LR-GLM. When used with the Laplace approximation or Markov chain Monte Carlo, LR-GLM provides a full Bayesian posterior approximation and admits running times reduced by a full factor of the parameter dimension. We rigorously establish the quality of our approximation and show how the choice of rank allows a tunable computational-statistical trade-off. Experiments support our theory and demonstrate the efficacy of LR-GLM on real large-scale datasets.

Scalable inference for agent-based models: Nianqiao Ju (Harvard U)

Abstract: Agent-based models (ABMs) represent systems at the level of their constituent units, because in many complex dynamic networks, macro-level phenomena arise from micro-level behaviors. For example, in a susceptible-infected-recovered (SIR) stochastic agent-based model for infectious diseases, agents interact in a (possibly dynamic) network by infecting each other and perform individual-level actions such as birth, death, and recovery. We consider the task of learning ABMs, which can be viewed as a statistical inference task in a large-dimensional hidden Markov model. In this poster, we focus on approximating the marginal likelihood function for the SIR disease process where we observe only a fraction of the total number of infections. We develop two data augmentation schemes that lead to the auxiliary particle filter (APF) proposal, and we present their connections to the Poisson-Binomial and Conditional-Binomial distributions. With informative population observations, the APF avoids particle degeneracy of the bootstrap particle filter. (Joint work with Pierre Jacob and Jeremy Heng).

A Gibbs-like integrator for Hamiltonian Monte Carlo: Melissa Malcom (Rutgers U)

Abstract: This poster compares the convergence of Gibbs and HMC for Bayesian hierarchical models. The Hamiltonian dynamics in HMC is approximated by a Gibbs-like symplectic integrator adapted to the structure of hierarchical models. This integrator allows larger time step sizes than Verlet, which, in turn, accelerates convergence of HMC.

Estimating learning coefficients for model evaluation using MCMC simulations: Toru Imai (Kyoto U)

Abstract: Evaluation of the marginal likelihood of singular models such as deep learning is a challenging task. The singular Bayesian information criterion (sBIC) gives the state-of-the-art approximation to the log marginal likelihood, which can be applied to both regular and singular models. However, sBIC requires the theoretical values of the learning coefficients, and only a few learning coefficients are known. In this presentation, we propose a new estimator of the learning coefficients using MCMC simulations.

A method to estimate state space model by spatiotemporal continuity: Tsuyoshi Ishizone (Meiji U)

Abstract: Model estimation from time series and/or spatio-temporal data is an important topic, since it helps us extract useful information from the big data of recent years. In this poster, we introduce an estimation algorithm for the linear Gaussian state space model, focusing on the real-time property. Our algorithm is quicker than, and as accurate as, existing methods; therefore, it satisfies the requirement of rapid response to changes in the fields. Moreover, we introduce localization and spatial uniformity into the algorithm to reduce the number of parameters. Thanks to this, we obtain a stable method to estimate the parameters regarding state transition and states.

Stochastic scale mixture modeling in Bayesian longitudinal data analysis: Anish Mukherjee (Case Western Reserve U)

Abstract: A variety of methods for generalizing the standard mixed model to explain time-correlated response structure have been developed in the context of longitudinal data analysis. Using a Gaussian process (GP) to specify an AR structure is a typical way to introduce within-subject serial correlation into the model. In order to generalize, Quintana et al. (2016) proposed a Dirichlet Process Mixture (DPM) over the covariance parameters of the GP. Here we propose a scale mixture of GPs as an alternative approach that allows for modeling heterogeneous covariance structure in a more flexible way. Different mixing distributions and the associated shrinkage behavior can be utilized to explain different covariance structures present in the data. We discuss the computational challenges associated with our approach and report promising estimation and prediction performances as compared to the DPM based method in different simulation setups and in a real data example.

Finite-sample correction for estimators of real log canonical threshold based on Markov chain Monte Carlo: Shiro Tanaka (Kyoto U)

Abstract: In singular model selection problems that involve models whose Fisher information matrices may fail to be invertible, the penalty structure in deviance information criteria or Schwarz's Bayesian information criteria (BIC) does not reflect the theoretical large sample behavior under the regularity conditions. Drton and Plummer (2017) presented an extension of BIC, singular BIC, based on a large sample approximation of the marginal likelihood that involves a constant called the real log canonical threshold (Watanabe 2009). The real log canonical threshold is determined by the algebraic geometrical structure of the statistical model and prior distribution and is generally unknown. In this work, we consider generic methods for computing the real log canonical threshold and singular Bayesian information criteria based on Markov chain Monte Carlo. Simulation experiments suggested that an estimator with finite-sample correction outperforms other estimators under normal distribution, normal linear models, logistic regression, and normal mixture. (Joint work with Toru Imai.)

Estimating convergence of Markov chains with L-lag couplings: Niloy Biswas (Harvard U)

Abstract: Markov chain Monte Carlo (MCMC) methods generate samples that are asymptotically distributed from a target distribution of interest as the number of iterations goes to infinity. Various theoretical results provide upper bounds on the distance between the target and marginal distribution after a fixed number of iterations. These upper bounds are on a case by case basis and typically involve intractable quantities, which limits their use for practitioners. We introduce L-lag couplings to generate computable, non-asymptotic upper bound estimates for the total variation or the Wasserstein distance of general Markov chains. We apply L-lag couplings to the tasks of (i) determining MCMC burn-in, (ii) comparing different MCMC algorithms with the same target, and (iii) comparing exact and approximate MCMC. Lastly, we (iv) assess the bias of sequential Monte Carlo and self-normalized importance samplers.

Bayesian adaptive sequential design: Dinko Franceschi (Columbia U)

Abstract: We introduce a novel model on how to do better decision making, going beyond the current approach of Bandit or A/B testing in clinical trials. We provide a better bridge between existing literature and real world applications by creating a more realistic framework for clinical drug trials. This method integrates adaptive testing, includes estimated costs and benefits into the decisions, and considers a stream of innovations rather than treating decisions one at a time. We present a Bayesian model which shows the value of the proposed framework in the medical treatment world and offer experimental results demonstrating its performance over traditional non-Bayesian approaches.

A divide and conquer algorithm of Bayesian density estimation: Ya Su (U of Kentucky)

Abstract: Data sets for statistical analysis have become extremely large, to the point that they can be difficult to store on a single machine. Even when the data can be stored on one machine, the computational cost would still be intimidating. We propose a divide and conquer solution to density estimation using Bayesian mixture modeling including the infinite mixture case. The methodology can be generalized to other application problems where a Bayesian mixture model is adopted. The proposed prior on each machine or subsample modifies the original prior on both the mixing probabilities and the rest of the parameters in the distributions being mixed. The ultimate estimator is obtained by taking the average of the posterior samples corresponding to the proposed prior on each subset. Despite the tremendous reduction in time thanks to data splitting, the posterior contraction rate of the proposed estimator stays the same (up to a log factor) as that of the original prior when the data is analyzed as a whole. Simulation studies also justify the competency of the proposed method compared to the established WASP estimator in the finite dimension case. In addition, one of our simulations is performed in a shape constrained deconvolution context and reveals promising results. The application to a GWAS data set reveals the advantage over a naive method that uses the original prior.

Efficient Bayesian synthetic likelihood with whitening transformations: Chris Drovandi (Queensland U of Technology)

Abstract: Likelihood-free methods are an established approach for performing approximate Bayesian inference for models with intractable likelihood functions. However, they can be computationally demanding. Bayesian synthetic likelihood (BSL) is a popular such method that approximates the likelihood function of the summary statistic with a known, tractable distribution -- typically Gaussian -- and then performs statistical inference using standard likelihood-based techniques. However, as the number of summary statistics grows, the number of model simulations required to accurately estimate the covariance matrix for this likelihood rapidly increases. This poses a significant challenge for the application of BSL, especially in cases where model simulation is expensive. Here we propose whitening BSL (wBSL) -- an efficient BSL method that uses approximate whitening transformations to decorrelate the summary statistics at each algorithm iteration. We show empirically that this can reduce the number of model simulations required to implement BSL by more than an order of magnitude, without much loss of accuracy. (Joint work with Jacob Priddle and Scott Sisson.)
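
The Gaussian approximation at the heart of synthetic likelihood can be sketched for a single summary statistic as follows. This is the standard construction only, not the whitening variant (wBSL) proposed in the abstract; the toy simulator, the function names and the number of simulations are assumptions made for illustration.

```python
import math
import random
import statistics


def simulate_summary(theta, rng, n_obs=20):
    """Hypothetical simulator: summary = mean of n_obs draws from N(theta, 1)."""
    return statistics.mean(rng.gauss(theta, 1.0) for _ in range(n_obs))


def synthetic_loglik(theta, s_obs, n_sim=200, seed=0):
    """Gaussian synthetic log-likelihood of the observed summary at theta.

    The mean and variance of the summary are estimated from n_sim model
    simulations and plugged into a normal log density.
    """
    rng = random.Random(seed)
    sims = [simulate_summary(theta, rng) for _ in range(n_sim)]
    mu = statistics.mean(sims)
    var = statistics.variance(sims)
    return -0.5 * math.log(2.0 * math.pi * var) - (s_obs - mu) ** 2 / (2.0 * var)


# Observed summary of 0.0: a parameter near 0 should score higher than one far away.
loglik_near = synthetic_loglik(theta=0.0, s_obs=0.0)
loglik_far = synthetic_loglik(theta=3.0, s_obs=0.0)
```

With several summary statistics the variance becomes a covariance matrix, and the number of simulations needed to estimate it well grows with its dimension. That is the cost the abstract aims to reduce by decorrelating the summaries.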
