Bayes Comp is a biennial conference sponsored by the ISBA section of the same name. The conference and the section both aim to promote original research into computational methods for inference and decision making; to encourage the use of frontier computational tools among practitioners and the development of adapted software, languages, platforms, and dedicated machines; and to translate and disseminate methods developed in other disciplines among statisticians.

Bayes Comp is the current incarnation of the popular MCMSki series of conferences, and Bayes Comp 2020 is the second edition of this new conference series. The first edition was Bayes Comp 2018, which was held in Barcelona in March of 2018.

Where & When

Bayes Comp 2020 will take place in the Reitz Union at the University of Florida. It will start in the afternoon on Tuesday, January 7, 2020, and finish in the afternoon on Friday, January 10.

• Deadline for submission of poster proposals: December 15, 2019.

• Provide the name and affiliation of the speaker, as well as a title and an abstract for the poster. If the poster is associated with a technical report or publication, please also provide that information. Acceptance is conditional on registration, and decisions will be made on-the-fly, usually within a week of submission. Email your proposal to Christian Robert.

• Deadline for applications for travel support: September 20, 2019. (Scroll down for details.)

Travel Support

There are funds available for junior travel support. These funds are earmarked for people who are either currently enrolled in a PhD program, or have earned a PhD within the last three years (no earlier than January 1, 2017). To be eligible for funding, you must be presenting (talk or poster), and be registered for the conference.

Applicants should email the following two items to Jim Hobert: (1) an up-to-date CV, and (2) proof of current enrollment in a PhD program (in the form of a short letter from the PhD advisor), or a PhD certificate featuring the graduation date. The application deadline is September 20, 2019.

Accommodation

Blocks of rooms have been reserved at three different hotels:

Program

Invited Sessions

• Theory & practice of HMC (and its variants) for Bayesian hierarchical models : Tamara Broderick (MIT), George Deligiannidis (U of Oxford), Aaron Smith (U of Ottawa).
• Tamara Broderick: The kernel interaction trick: Fast Bayesian discovery of multi-way interactions in high dimensions

Abstract: Discovering interaction effects on a response of interest is a fundamental problem faced in biology, medicine, economics, and many other scientific disciplines. In theory, Bayesian methods for discovering pairwise interactions enjoy many benefits such as coherent uncertainty quantification, the ability to incorporate background knowledge, and desirable shrinkage properties. In practice, however, Bayesian methods are often computationally intractable for even moderate-dimensional problems. Our key insight is that many hierarchical models of practical interest admit a particular Gaussian process (GP) representation; the GP allows us to capture the posterior with a vector of $O(p)$ kernel hyper-parameters rather than $O(p^2)$ interactions and main effects. With the implicit representation, we can run Markov chain Monte Carlo (MCMC) over model hyper-parameters in time and memory linear in $p$ per iteration. We focus on sparsity-inducing models and show on datasets with a variety of covariate behaviors that our method: (1) reduces runtime by orders of magnitude over naive applications of MCMC, (2) provides lower Type I and Type II error relative to state-of-the-art LASSO-based approaches, and (3) offers improved computational scaling in high dimensions relative to existing Bayesian and LASSO-based approaches.

• George Deligiannidis: The bouncy particle sampler and randomized Hamiltonian Monte Carlo

Abstract: TBA

• Aaron Smith: Free lunches and subsampling Monte Carlo

Abstract: It is widely known that the performance of MCMC algorithms can degrade quite quickly when targeting computationally expensive posterior distributions, including the posteriors associated with any large dataset. This has motivated the search for MCMC variants that scale well for large datasets. One general approach, taken by several research groups, has been to look at only a subsample of the data at every step. In this talk, we focus on simple "no-free-lunch" results that provide some basic limits on the performance of many such algorithms. We apply these generic results to realistic statistical problems and proposed algorithms, and also discuss some special examples that can avoid our generic results and provide a free (or at least cheap) lunch. (Joint with Patrick Conrad, Andrew Davis, James Johndrow, Youssef Marzouk, Natesh Pillai, and Pengfei Wang.)

• Scalable methods for high-dimensional problems : Akihiko Nishimura (UCLA), Anirban Bhattacharya (Texas A&M), Sara Wade (U of Edinburgh).
• Akihiko Nishimura: Scalable Bayesian sparse generalized linear models and survival analysis via curvature-adaptive Hamiltonian Monte Carlo for high-dimensional log-concave distributions

Abstract: Bayesian sparse regression based on shrinkage priors possesses many desirable theoretical properties and yields posterior distributions whose conditionals mostly admit straightforward Gibbs updates. Sampling high-dimensional regression coefficients from their conditional distribution, however, presents a major scalability issue in posterior computation. The conditional distribution generally does not belong to a parametric family and the existing sampling approaches are hopelessly inefficient in high-dimensional settings. Inspired by recent advances in understanding the performance of Hamiltonian Monte Carlo (HMC) on log-concave target distributions, we develop *curvature-adaptive HMC* for scalable posterior inference under sparse regression models with log-concave likelihoods. As is well known, HMC's performance critically depends on the integrator stepsize and mass matrix. These tuning parameters are typically adjusted over many HMC iterations by collecting statistics on the target distribution, an impractical approach when employing HMC within a Gibbs sampler since the conditional distribution changes as the other parameters are updated. Instead, we achieve on-the-fly calibration of the key HMC tuning parameters through 1) the recently developed theory of *prior-preconditioning* for sparse regression and 2) a rapid estimation of the curvature of a given log-concave target via *iterative methods* from numerical linear algebra. We demonstrate the scalability of our method on a clinically relevant large-scale observational study involving n >= 80,000 patients and p >= 10,000 predictors, designed to assess the relative efficacy of two alternative hypertension treatments.

• Anirban Bhattacharya: Approximate MCMC for high-dimensional estimation

Abstract: We discuss a number of applications of approximate MCMC to complex high-dimensional structured estimation problems. A unified theoretical treatment is provided to understand the impact of introducing approximations to the exact MCMC transition kernel.

• Sara Wade: Posterior inference for sparse hierarchical non-stationary models

Abstract: Gaussian processes are valuable tools for non-parametric modelling, where typically an assumption of stationarity is employed. While removing this assumption can improve prediction, fitting such models is challenging. In this work, hierarchical models are constructed based on Gaussian Markov random fields with stochastic spatially varying parameters. Importantly, this allows for non-stationarity while also addressing the computational burden through a sparse representation of the precision matrix. The prior field is chosen to be Matérn, and two hyperpriors, for the spatially varying parameters, are considered. One hyperprior is Ornstein-Uhlenbeck, formulated through an autoregressive process. The other corresponds to the widely used squared exponential. In this setting, efficient Markov chain Monte Carlo (MCMC) sampling is challenging due to the strong coupling a posteriori of the parameters and hyperparameters. We develop and compare three MCMC schemes, which are adaptive and therefore free of parameter tuning. Furthermore, a novel extension to higher-dimensional settings is proposed through an additive structure that retains the flexibility and scalability of the model, while also inheriting interpretability from the additive approach. A thorough assessment of the ability of the methods to efficiently explore the posterior distribution and to account for non-stationarity is presented, in both simulated experiments and a real-world computer emulation problem. https://arxiv.org/abs/1804.01431

• MCMC and scalable Bayesian computations : Philippe Gagnon (U of Oxford), Florian Maire (U de Montréal), Giacomo Zanella (Bocconi U).
• Philippe Gagnon: Nonreversible jump algorithms for nested models

Abstract: It is now well known that nonreversible Markov chain Monte Carlo methods often outperform their reversible counterparts. Lifting the state space (Chen et al. (1999)) has proved to be a successful technique for constructing such samplers relying on nonreversible Markov chains. The idea is to see the random variables that we wish to generate as position variables to which we associate velocity (or direction) variables, doubling the size of the state space. At each iteration of such samplers, the positions evolve deterministically as a function of the directions, and this is followed by a possible update of the latter. This direction-assisted scheme may induce persistent movements that allow the chain to traverse the state space more quickly than traditional methods, which produce chains with diffusive patterns. This explains the gain in efficiency. Because directions play a central role, the technique can only be employed to explore state spaces for which this concept is well defined. In this paper, we introduce samplers that we call nonreversible jump algorithms that can be applied to simultaneously achieve model selection and parameter estimation, in situations where the family of models considered forms a sequence of nested models; there thus exists a natural order among the models, and therefore, directions. These samplers are constructed by modifying reversible jump algorithms after having lifted the part of the state space associated with the model indicator. We demonstrate their correctness and show that they compare favourably to their reversible counterpart using both theoretical arguments and numerical experiments. We address implementation challenges, facilitating application by users.

• Florian Maire: Can we improve convergence of MCMC methods by aggregating Markov kernels in a locally informed way?

Abstract: For a given probability distribution $\pi$, there is a virtually infinite number of Markov kernels capable of generating useful Markov chains to infer $\pi$. Hybrid methods refer to algorithms where several Markov kernels are mixed with a fixed probability distribution $\omega$. In this talk, we introduce a dependence between $\omega$ and the current state of the Markov chain, a strategy that we refer to as Locally Informed Hybrid Markov chain, since $\omega$ can be specified so as to reflect the local topology of the state-space. The analysis of this intuitive construction reveals a number of surprises that question some of the usual Markov chain comparison tools, from a statistical learning viewpoint. These include tools based on the spectral analysis of the underlying Markov operator as well as Peskun ordering that give typically pessimistic results for metastable Markov chains, a framework which Locally Informed Hybrid Markov chains fall into. Finally, situations where the statistical efficiency of estimators based on Locally Informed Hybrid Markov chains is superior to that of traditional Hybrid algorithms are discussed.

• Giacomo Zanella: On the robustness of gradient-based sampling algorithms

Abstract: We analyze the tension between robustness and efficiency for Markov chain Monte Carlo (MCMC) sampling algorithms. In particular, we focus on the robustness of MCMC algorithms with respect to heterogeneity in the target, an issue of great practical relevance but still understudied theoretically. We show that the spectral gap of the Markov chains induced by classical gradient-based MCMC schemes (e.g. Langevin and Hamiltonian Monte Carlo) decays exponentially fast in the degree of mismatch between the scales of the proposal and target, while for the random walk Metropolis (RWM) the decay is linear. This result provides theoretical support to the notion that gradient-based MCMC schemes are less robust to heterogeneity and more sensitive to tuning. Motivated by these considerations, we propose a novel and simple-to-implement gradient-based MCMC algorithm, inspired by the classical Barker accept-reject rule, with improved robustness properties. Extensive theoretical results, dealing with robustness to heterogeneity, geometric ergodicity and scaling with dimensionality, show that the novel scheme combines the robustness of RWM with the efficiency of classical gradient-based schemes. The theoretical results are illustrated with simulation studies. (Joint work with Samuel Livingstone.)
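For readers who want a concrete picture of a gradient-based proposal built on a Barker-type rule, the following is a minimal sketch, not the authors' reference implementation. It assumes a toy target of independent zero-mean Gaussians with unequal scales; the function names, the step size `sigma`, and the chosen scales are all illustrative assumptions. Each coordinate draws a Gaussian increment and keeps or flips its sign with a probability that depends on the local gradient, followed by a Metropolis-Hastings correction.

```python
import math
import random

def log_pi(x, scales):
    """Log density (up to a constant) of independent zero-mean Gaussians (assumed toy target)."""
    return -0.5 * sum((xi / s) ** 2 for xi, s in zip(x, scales))

def grad_log_pi(x, scales):
    """Gradient of log_pi, coordinate by coordinate."""
    return [-xi / s ** 2 for xi, s in zip(x, scales)]

def log1pexp(a):
    """Numerically stable log(1 + exp(a))."""
    return a + math.log1p(math.exp(-a)) if a > 0 else math.log1p(math.exp(a))

def barker_step(x, scales, sigma, rng):
    """One step of a Barker-type gradient-informed proposal with accept/reject correction."""
    g_x = grad_log_pi(x, scales)
    y = []
    for xi, gi in zip(x, g_x):
        z = rng.gauss(0.0, sigma)
        # Keep the sign of z with probability 1 / (1 + exp(-z * gi)), otherwise flip it.
        p_keep = math.exp(-log1pexp(-z * gi))
        y.append(xi + z if rng.random() < p_keep else xi - z)
    g_y = grad_log_pi(y, scales)
    # Log Metropolis-Hastings ratio: target ratio times reverse/forward proposal ratio.
    log_alpha = log_pi(y, scales) - log_pi(x, scales)
    for xi, yi, gxi, gyi in zip(x, y, g_x, g_y):
        d = yi - xi
        log_alpha += log1pexp(-d * gxi) - log1pexp(d * gyi)
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return y, True
    return x, False

def run_barker(n_steps, scales, sigma, rng):
    """Run the chain from the origin and report the empirical acceptance rate."""
    x = [0.0] * len(scales)
    chain, accepted = [], 0
    for _ in range(n_steps):
        x, ok = barker_step(x, scales, sigma, rng)
        accepted += ok
        chain.append(x)
    return chain, accepted / n_steps

rng = random.Random(0)
chain, acc_rate = run_barker(20000, scales=[1.0, 0.3], sigma=0.5, rng=rng)
```

The same `sigma` is used for both coordinates even though their scales differ, which is the kind of scale mismatch the abstract is concerned with; the sketch makes no claim about how the resulting efficiency compares with other schemes.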

• Scalable methods for posterior inference from big data : Subharup Guha (U of Florida), Zhenyu Zhang (UCLA), David Dahl (Brigham Young U).
• Subharup Guha: Fast MCMC techniques for fitting Bayesian mixture models to massive multiple-platform cancer data

Abstract: Recent advances in array-based and next-generation sequencing technologies have revolutionized biomedical research, especially in cancer. Bayesian mixture models, such as finite mixtures, hidden Markov models, and Dirichlet processes, offer elegant frameworks for inference, especially because they are flexible, avoid making unrealistic assumptions about the data features and the nature of the interactions, and permit nonlinear dependencies. However, existing inference procedures for these models do not scale to multiple-platform Big Data and often stretch computational resources past their limits. An investigation of the theoretical properties of these models offers insight into asymptotics that form the basis of broadly applicable, cost-effective MCMC strategies for large datasets. These MCMC techniques have the advantage of providing inferences from the posterior of interest, rather than an approximation, and are applicable to different Bayesian mixture models. Furthermore, they can be applied to develop massively parallel MCMC algorithms for these data. The versatility and impressive gains of the methodology are demonstrated by simulation studies and by a semiparametric integrative analysis that detects shared biological mechanisms in heterogeneous multi-platform cancer datasets. (Joint with Dongyan Yan and Veera Baladandayuthapani.)

• Zhenyu Zhang: Bayesian inference for large-scale phylogenetic multivariate probit models

Abstract: Inferring correlation among biological features is an important yet challenging problem in evolutionary biology. In addition to adjusting for correlations induced from an uncertain evolutionary history, we also have to deal with features measured in different scales: continuous and binary. We jointly model the two feature types by introducing latent continuous parameters for binary features, giving rise to a phylogenetic multivariate probit model. Posterior computation under this model remains problematic with increasing sample size, requiring repeatedly sampling from a high-dimensional truncated Gaussian distribution. The best current approaches scale quadratically in sample size and suffer from slow mixing. We develop a new computation approach that exploits 1) the state-of-the-art bouncy particle sampler based on piecewise deterministic Markov processes and 2) a novel dynamic programming approach that reduces the cost of likelihood and gradient evaluations to linear in sample size. In an application, we successfully handle a 14,980-dimensional truncated Gaussian, making it possible to estimate correlations among 28 HIV virulence and immunological epitope features across 535 viruses. The proposed approach is of independent interest, being applicable to a broader class of covariance structures beyond comparative biology. (Joint with Akihiko Nishimura, Philippe Lemey, and Marc A. Suchard.)

• David Dahl: Summarizing distributions of latent structure

Abstract: In a typical Bayesian analysis, considerable effort is placed on "fitting the model" (e.g., obtaining samples from the posterior distribution) but this is only half of the inference problem. Meaningful inference usually requires summarizing the posterior distribution of the parameters of interest. Posterior summaries can be especially important in communicating the results and conclusions from a Bayesian analysis to a diverse audience. If the parameters of interest live in R^n, common posterior summaries are means, medians, and modes. Summarizing posterior distributions of parameters with complicated structure is a more difficult problem. For example, the "average" network in the posterior distribution on a network is not easily defined. This paper reviews methods for summarizing distributions of latent structure and then proposes a novel search algorithm for posterior summaries. We apply our method to distributions on variable selection indicators, partitions, feature allocations, and networks. We illustrate our approach in a variety of models for both simulated and real datasets. (Joint with Peter Müller.)

• Efficient computing strategies for high-dimensional problems : Gareth Roberts (U of Warwick), Veronika Rockova (U of Chicago), Gregor Kastner (Vienna U of Economics and Business).
• Gareth Roberts: TBA

Abstract: TBA

• Veronika Rockova: Fast posterior sampling for the spike-and-slab lasso

Abstract: TBA

• Gregor Kastner: Efficient Bayesian computing in many dimensions - applications in economics and finance

Abstract: TBA

• MCMC methods in high dimension, theory and applications: Christophe Andrieu (U of Bristol), Gabriel Stoltz (Ecole des Ponts ParisTech), Umut Simsekli (Télécom ParisTech), Gersende Fort (CNRS, Institut de Mathématiques de Toulouse).
• Christophe Andrieu: All about the Metropolis-Hastings-Green update

Abstract: TBA

• Gabriel Stoltz: Removing the mini-batching error in large scale Bayesian sampling

Abstract: The cost of performing one step of a sampling method such as Langevin dynamics scales linearly with the number of data points in Bayesian inference. To alleviate this issue, mini-batching was put forward by Welling and Teh. However, mini-batching leads to some bias on the a posteriori distribution of parameters. Adaptive Langevin dynamics were devised to remove this bias. The idea is to consider an inertial Langevin dynamics where the friction is a dynamical variable, updated according to some Nosé-Hoover feedback (inspired by techniques from molecular dynamics). We show here using techniques from hypocoercivity that the law of Adaptive Langevin dynamics converges exponentially fast to equilibrium, with a rate which can be quantified in terms of the key parameters of the dynamics (mass of the extra variable and magnitude of the fluctuation in the Langevin dynamics). This allows us in particular to obtain a Central Limit Theorem on time averages along realizations of the dynamics. Currently, however, this method is limited to unknown diffusion matrices which do not depend on the parameters (additive noise). I will mention extensions to the case of multiplicative noise.
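To make the mini-batching idea concrete, here is a minimal sketch of a Langevin update driven by a mini-batch gradient, in the spirit of the Welling and Teh scheme mentioned above. It is not the Adaptive Langevin dynamics of the talk. The toy model (unit-variance Gaussian data with a Normal prior on the mean), the function name, and the values of `batch_size`, `step`, and `prior_var` are illustrative assumptions.

```python
import random

def sgld_gaussian_mean(data, batch_size, step, n_steps, rng, prior_var=100.0):
    """Mini-batch Langevin sampler for theta in the assumed toy model
    x_i ~ Normal(theta, 1), theta ~ Normal(0, prior_var)."""
    n_data = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        batch = rng.sample(data, batch_size)
        # Unbiased mini-batch estimate of the full-data log-likelihood gradient.
        grad_lik = (n_data / batch_size) * sum(x - theta for x in batch)
        grad_prior = -theta / prior_var
        # Langevin update: half a step along the gradient plus Gaussian noise of variance `step`.
        theta += 0.5 * step * (grad_prior + grad_lik) + rng.gauss(0.0, step ** 0.5)
        samples.append(theta)
    return samples

rng = random.Random(1)
data = [rng.gauss(1.5, 1.0) for _ in range(1000)]
samples = sgld_gaussian_mean(data, batch_size=50, step=1e-4, n_steps=5000, rng=rng)
kept = samples[500:]                      # discard an initial transient
sgld_mean = sum(kept) / len(kept)
data_mean = sum(data) / len(data)
```

With a constant step size, the extra noise in the mini-batch gradient tends to widen the spread of the samples relative to the exact posterior, which is one way to see the bias the abstract refers to.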

• Umut Simsekli: TBA

Abstract: TBA

• Gersende Fort: TBA

Abstract: TBA

• Computational advancements in entity resolution : Brenda Betancourt (U of Florida), Andee Kaplan (Duke U), Rebecca Steorts (Duke U).
• Brenda Betancourt: Generalized flexible microclustering models for entity resolution

Abstract: Classical clustering tasks accomplished with Bayesian random partition models seek to divide a given population or data set into a relatively small number of clusters whose size grows with the number of data points. For other clustering applications, such as entity resolution, this assumption is inappropriate. Entity resolution (record linkage or de-duplication) is the process of removing duplicate records from noisy databases often in the absence of a unique identifier. One natural approach to entity resolution is as a clustering problem, where each entity is implicitly associated with one or more records and the inference goal is to recover the latent entities (clusters) that correspond to the observed records (data points). In most entity resolution tasks, the clusters are very small and remain small as the number of records increases. This framework requires models that yield clusters whose sizes grow sublinearly with the total number of data points. We introduce a general class of microclustering models suitable for the 'microclustering' problem, and fully characterize its theoretical properties and asymptotic behavior. We also present a partially-collapsed MCMC sampler that, compared to common sampling schemes found in the literature, achieves significantly better mixing by overcoming strong dependencies between some of the parameters in the model. To improve scalability, we combine the sampling algorithm with a common record linkage blocking technique that allows for parallel programming. (Joint with Giacomo Zanella and Rebecca Steorts.)

• Andee Kaplan: Life after record linkage: Tackling the downstream task with error propagation

• Rebecca Steorts: Scalable end-to-end Bayesian entity resolution

Abstract: Very often information about social entities is scattered across multiple databases. Combining that information into one database can result in enormous benefits for analysis, resulting in richer and more reliable conclusions. In most practical applications, however, analysts cannot simply link records across databases based on unique identifiers, such as social security numbers, either because they are not a part of some databases or are not available due to privacy concerns. In such cases, analysts need to use methods from statistical and computational science known as entity resolution (record linkage or de-duplication) to proceed with analysis. Entity resolution is not only a crucial task for social science and industrial applications, but is a challenging statistical and computational problem itself. One recent development in entity resolution methodology has been the application of Bayesian generative models. These models offer several advantages over conventional methods, namely: (i) they do not require labeled training data; (ii) they treat linkage as a clustering problem which preserves transitivity; (iii) they propagate uncertainty; and (iv) they allow for flexible modeling assumptions. However, due to difficulties in scaling, these models have so far been limited to small data sets of around 1000 records. In this talk, I propose the first scalable Bayesian models for entity resolution. This extension brings together several key ideas, including probabilistic blocking, indexing, and efficient sampling algorithms. The proposed methodology is illustrated on both synthetic and real data. (Joint with Neil Marchant, Benjamin Rubinstein, Andee Kaplan, and Daniel Elazar.)

• ABC : Ruth Baker (U of Oxford), David Frazier (Monash U), Umberto Picchini (Chalmers U of Tech & U of Gothenburg).
• Ruth Baker: Multifidelity approximate Bayesian computation

Abstract: A vital stage in the mathematical modelling of real-world systems is to calibrate a model's parameters to observed data. Likelihood-free parameter inference methods, such as Approximate Bayesian Computation, build Monte Carlo samples of the uncertain parameter distribution by comparing the data with large numbers of model simulations. However, the computational expense of generating these simulations forms a significant bottleneck in the practical application of such methods. We identify how simulations of cheap, low-fidelity models have been used separately in two complementary ways to reduce the computational expense of building these samples, at the cost of introducing additional variance to the resulting parameter estimates. We explore how these approaches can be unified so that cost and benefit are optimally balanced, and we characterise the optimal choice of how often to simulate from cheap, low-fidelity models in place of expensive, high-fidelity models in Monte Carlo ABC algorithms. The resulting early accept/reject multifidelity ABC algorithm that we propose is shown to give improved performance over existing multifidelity and high-fidelity approaches.
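As a point of reference for the likelihood-free setting described above, a plain ABC rejection sampler can be sketched as follows. This is not the multifidelity algorithm of the talk; it only illustrates the basic step of comparing observed data with model simulations. The toy Gaussian simulator, the uniform prior, the sample-mean summary statistic, and the tolerance `eps` are all illustrative assumptions.

```python
import random

def simulate(theta, n, rng):
    """Toy simulator (assumed): n independent draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def summary(xs):
    """Summary statistic (assumed): the sample mean."""
    return sum(xs) / len(xs)

def abc_rejection(observed, n_draws, eps, rng):
    """Keep prior draws whose simulated summary lies within eps of the observed summary."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)    # assumed prior: Uniform(-5, 5)
        s_sim = summary(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= eps:
            accepted.append(theta)
    return accepted

rng = random.Random(0)
observed = simulate(2.0, 50, rng)
posterior_sample = abc_rejection(observed, n_draws=5000, eps=0.1, rng=rng)
abc_mean = sum(posterior_sample) / len(posterior_sample)
```

Every candidate parameter requires a full model simulation here, and most candidates are rejected, which is the computational bottleneck that cheaper low-fidelity simulations are meant to relieve.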

• David Frazier: Robust approximate Bayesian inference with synthetic likelihood

Abstract: Bayesian synthetic likelihood (BSL) is now a well-established method for conducting approximate Bayesian inference in complex models where exact Bayesian approaches are either infeasible, or computationally demanding, due to the intractability of the likelihood function. Similar to other approximate Bayesian methods, such as the method of approximate Bayesian computation, implicit in the application of BSL is the maintained assumption that the data generating process can generate simulated summary statistics that mimic the behaviour of the observed summary statistics. This notion of model compatibility with the observed summaries is critical for the performance of BSL and its variants. We demonstrate theoretically, and through several examples, that if the assumed data generating process (DGP) differs from the true DGP, model compatibility may no longer be satisfied and BSL can give unreliable inferences. To circumvent the issue of incompatibility between the observed and simulated summary statistics, we propose two robust versions of BSL that can deliver reliable performance regardless of whether or not the observed and simulated summaries are compatible. Simulation results and two empirical examples demonstrate the good performance of this robust approach to BSL, and its superiority over standard BSL when model compatibility is not in evidence.

• Umberto Picchini: Variance reduction for fast ABC using resampling

Abstract: Approximate Bayesian computation (ABC) is the state-of-the-art methodology for likelihood-free Bayesian inference. Its main feature is the ability to bypass the explicit calculation of the likelihood function, by only requiring access to a model simulator to generate many artificial datasets. In the context of pseudo-marginal ABC-MCMC (Bornn, Pillai, Smith and Woodard, 2017), generating $M>1$ datasets for each MCMC iteration allows one to construct a kernel-smoothed ABC likelihood which has lower variance, which benefits the mixing of the ABC-MCMC chain compared to the typical ABC setup, which sets $M=1$. However, setting $M>1$ implies a computational bottleneck, and in Bornn, Pillai, Smith and Woodard (2017) it was found that the benefits of using $M>1$ are not worth the increasing computational effort. In Everitt (2017) it was shown that, when the intractable likelihood is replaced by a *synthetic likelihood* (SL, Wood, 2010), it is possible to use $M=1$ and resample many times from this single simulated dataset, to construct computationally fast SL inference that artificially emulates the case $M>1$. Unfortunately, this approach was found to be ineffective within ABC, as the resampling generates inflated ABC posteriors. In this talk we show how to couple *stratified sampling* with the resampling idea of Everitt (2017). We construct an ABC-MCMC algorithm that uses a small number of model simulations ($M=1$ or 2) for each MCMC iteration, while substantially reducing the additional variance in the approximate posterior distribution induced by resampling. We therefore enjoy the computational speedup from resampling approaches, and show that our stratified sampling procedure allows us to use a larger than usual ABC threshold, while still obtaining accurate inference. (Joint with Richard Everitt.)

• Continuous-time and reversible Monte Carlo methods : Yian Ma (U of California, Berkeley), Manon Michel (U Clermont-Auvergne), Daniel Paulin (U of Oxford).
• Yian Ma: Bridging MCMC and Optimization

Abstract: Rapid growth in data size and model complexity has raised questions on how computational tools can scale with the problem and data complexity. Optimization algorithms have had tremendous success for convex problems in this regard. MCMC algorithms for mean estimates, on the other hand, are slower than the optimization algorithms in convex unconstrained scenarios. It has even become folklore that the MCMC algorithms are in general computationally more intractable than optimization algorithms. In this talk, I will examine a class of non-convex objective functions arising from mixture models. For that class of objective functions, I discover that the computational complexity of MCMC algorithms scales linearly with the model dimension, while optimization problems are NP-hard. I will then study MCMC algorithms as optimization over the KL-divergence in the space of measures. By incorporating a momentum variable, I will discuss an algorithm which performs accelerated gradient descent over the KL-divergence. Using optimization-like ideas, a suitable Lyapunov function is constructed to prove that an accelerated convergence rate is obtained.

• Manon Michel: Accelerations of MCMC methods by non-reversibility and factorization

Abstract: During this talk, I will present the historical development of non-reversible Markov-chain Monte Carlo methods, based on piecewise deterministic Markov processes (PDMP). First developed for multiparticle systems, the goal was to emulate the successes of cluster algorithms for spin systems and was achieved through the replacement of the time reversibility by symmetries of the sampled probability distribution itself. These methods have been shown to bring clear accelerations and are now competing with molecular dynamics methods in chemical physics or state-of-the-art sampling schemes, e.g. Hamiltonian Monte Carlo, in statistical inference. I will discuss their successes as well as the remaining open questions. Finally, I will explain how the factorization of the distribution can lead to computational complexity reduction.

• Daniel Paulin: Connections between PDMPs and Hamiltonian Monte Carlo

Abstract: In this talk we are going to explore some connections between Piecewise Deterministic Markov Processes and Hamiltonian Monte Carlo in high dimensions.

• Markov chain convergence analysis and Wasserstein distance: Alain Durmus (ENS Paris-Saclay), Jonathan Mattingly (Duke U), Qian Qin (U of Florida).
• Alain Durmus: TBA

Abstract: TBA

• Jonathan Mattingly: TBA

Abstract: TBA

• Qian Qin: Geometric convergence bounds for Markov chains in Wasserstein distance based on generalized drift and contraction conditions

Abstract: Quantitative bounds on the convergence rate of a Markov chain with respect to some Wasserstein distance can be derived using a set of drift and contraction conditions. Previous studies focus on the case where the parameters in this type of condition are constant. We propose a method for constructing convergence bounds based on generalized drift and contraction conditions whose parameters may vary across the state space. This can lead to significantly improved bounds. Our result also extends existing bounds in the literature to the case where the Wasserstein distance is unbounded.

• Young researchers' contributions to Bayesian computation: Tommaso Rigon (Bocconi U), Michael Jauch (Duke U), Nicholas Tawn (U of Warwick).
• Tommaso Rigon: Bayesian inference for finite-dimensional discrete priors

Abstract: Discrete random probability measures are the main ingredient for addressing Bayesian clustering. The investigation in this area has been very lively, with strong emphasis on nonparametric procedures based either on the Dirichlet process or on more flexible generalizations, such as the Pitman-Yor (PY) process or the normalized random measures with independent increments (NRMI). The literature on finite-dimensional discrete priors, beyond the classic Dirichlet-multinomial model, is much more limited. We aim to fill this gap by introducing novel classes of priors closely related to the PY process and NRMIs, which are recovered as limiting cases. Prior and posterior distributional properties are extensively studied. Specifically, we identify the induced random partitions and determine explicit expressions of the associated urn schemes and of the posterior distributions. A detailed comparison with the (infinite-dimensional) PY and NRMIs is provided. Finally, we employ our proposal for mixture modeling, and we assess its performance over existing methods in the analysis of a real dataset.

• Michael Jauch: Bayesian analysis with orthogonal matrix parameters

Abstract: Statistical models for multivariate data are often parametrized by a set of orthogonal matrices. Bayesian analyses of models with orthogonal matrix parameters present two major challenges: posterior simulation on the constrained parameter space and incorporation of prior information such as sparsity or row dependence. We propose methodology to address both of these challenges. To simulate from posterior distributions defined on a set of orthogonal matrices, we propose polar parameter expansion, a parameter expanded Markov chain Monte Carlo approach suitable for routine and flexible posterior inference in standard simulation software. To incorporate prior information, we introduce prior distributions for orthogonal matrix parameters constructed via the polar decomposition of an unconstrained random matrix. Prior distributions constructed in this way satisfy a number of appealing properties and posterior inference can again be carried out in standard simulation software. We illustrate these techniques by fitting Bayesian models for a protein interaction network and gene expression data.
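
As a rough illustration of the polar-decomposition construction mentioned in the abstract above, the sketch below maps an unconstrained random matrix to a matrix with orthonormal columns. The function name `polar_orthogonal_factor`, the Gaussian choice for the unconstrained matrix, and the use of NumPy's SVD are assumptions made for this example, not the speaker's implementation.

```python
import numpy as np

def polar_orthogonal_factor(x):
    """Return the orthogonal factor Q of the polar decomposition X = Q S.

    With the thin SVD X = U diag(s) V^T, the matrix Q = U V^T has
    orthonormal columns and S = V diag(s) V^T is symmetric positive
    semidefinite.
    """
    u, _, vt = np.linalg.svd(x, full_matrices=False)
    return u @ vt

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 3))   # unconstrained matrix (Gaussian is an assumption)
q = polar_orthogonal_factor(x)    # 5 x 3 matrix with orthonormal columns
s = q.T @ x                       # symmetric factor of the decomposition
```

A sampler can then work with the unconstrained `x`, with `q` recovered deterministically at each step; how the prior on `x` is chosen is the subject of the talk and is not shown here.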

• Nicholas Tawn: The Annealed Leap Point Sampler (ALPS) for multimodal target distributions

Abstract: This talk introduces a novel algorithm, ALPS, that is designed to provide a scalable approach to sampling from multimodal target distributions. The ALPS algorithm concatenates a number of the strengths of the current gold standard approaches for multimodality. It is strongly based around the well known parallel tempering procedure but rather than using “hot state” tempering levels the ALPS algorithm instead appeals to annealing. In annealed temperature levels the modes become even more isolated with the effects of modal skew less pronounced. Indeed the more annealed the temperature the more accurately the local mode is approximated by a Laplace approximation. The idea is to exploit this by utilizing a powerful Gaussian mixture independence sampler at the annealed temperature levels allowing rapid mixing between modes. This mixing information is then filtered back to the target of interest using a parallel tempering-like procedure with carefully designed marginal distributions.

• Approximate Bayesian nonparametrics: Peter Müller (U of Texas), Debdeep Pati (Texas A&M), Bernardo Nipoti (Trinity College Dublin).
• Peter Müller: Consensus Monte Carlo for random subsets using shared anchors

Abstract: We present a consensus Monte Carlo algorithm that scales existing Bayesian nonparametric models for clustering and feature allocation to big data. The algorithm is valid for any prior on random subsets such as partitions and latent feature allocation, under essentially any sampling model. Motivated by three case studies, we focus on clustering induced by a Dirichlet process mixture sampling model, inference under an Indian buffet process prior with a binomial sampling model, and with a categorical sampling model. We assess the proposed algorithm with simulation studies and show results for inference with three datasets: an MNIST image dataset, a dataset of pancreatic cancer mutations, and a large set of electronic health records (EHR).

• Debdeep Pati: Convergence of variational Bayes algorithms

Abstract: We develop techniques for analyzing the convergence of variational Bayes algorithms in three classic examples: i) variational lower bound optimization using convex duality in generalized linear models ii) variational boosting and iii) coordinate ascent inference in discrete graphical models. The key idea is to relate the updates with an associated dynamical system and analyze its spectra. In some cases, we provide specific conditions for the algorithm to converge to the solution, exhibit periodicity or become unstable.

• Bernardo Nipoti: TBA

Abstract: TBA

Contributed Sessions

• Novel mixture-based computational approaches to Bayesian learning: Michele Guindani (U of California, Irvine), Antonietta Mira (U della Svizzera Italiana & U of Insubria), Sonia Petrone (Bocconi U).
• Michele Guindani: Modeling human microbiome data via latent nested nonparametric priors

Abstract: The study of the human microbiome has gained substantial attention in recent years due to its relationship with the regulation of the autoimmune system. During the data-preprocessing pipeline, microbes characterized by similar genomes are grouped together in Operational Taxonomic Units (OTUs). Since OTU abundances vary widely across individuals within a population, it is of interest to characterize the diversity of the microbiome to study the association between asymmetries in the human microbiota and various diseases. Here, we propose a Bayesian nonparametric approach to model abundance tables in the presence of multiple populations: a common set of parameters (atoms at the observational level) is used to construct, at a higher level, a set of atoms on a distributional space. Using a common set of atoms at the lower level yields an important advantage: our model does not degenerate to the fully exchangeable case when there are ties across samples, thus overcoming the crucial problem of the traditional nested Dirichlet process outlined by Camerlenghi et al. (2018). To perform posterior inference, we propose a novel nested independent slice-efficient algorithm. Since OTU tables consist of frequency counts and are known to be sparse, we express the likelihood as a rounded mixture of Gaussian kernels. Simulation studies confirm that our model no longer suffers from the nDPMM drawback, and first applications to the microbiomes of Bangladeshi babies have shown promising results.

• Antonietta Mira: Adaptive incremental mixture Markov chain Monte Carlo

Abstract: We propose Adaptive Incremental Mixture Markov chain Monte Carlo (AIMM), a novel approach to sample from challenging probability distributions defined on a general state-space. While adaptive MCMC methods usually update a parametric proposal kernel with a global rule, AIMM locally adapts a semiparametric kernel. AIMM is based on an independent Metropolis-Hastings proposal distribution which takes the form of a finite mixture of Gaussian distributions. Central to this approach is the idea that the proposal distribution adapts to the target by locally adding a mixture component when the discrepancy between the proposal mixture and the target is deemed to be too large. As a result, the number of components in the mixture proposal is not fixed in advance. Theoretically, we prove that there exists a process that can be made arbitrarily close to AIMM and that converges to the correct target distribution. We also illustrate that it performs well in practice in a variety of challenging situations, including high-dimensional and multimodal target distributions.

• Sonia Petrone: Quasi-Bayes properties of a procedure for sequential learning in mixture models

Abstract: Bayesian methods are often optimal, yet nowadays pressure for fast computations, especially with streaming data and online learning, brings renewed interest in faster, although possibly sub-optimal, solutions. To what extent these algorithms approximate a Bayesian solution is a problem of interest, not always solved. Against this background, we revisit a sequential procedure proposed by Smith and Makov (1978) for unsupervised learning in finite mixtures, and developed by Newton and Zhang (1999) for nonparametric mixtures. The so-called Newton's algorithm is simple and fast, and theoretically intriguing. Although originally proposed as an approximation of the Bayesian solution, its quasi-Bayes properties remain unclear. We propose a novel methodological approach. We regard the algorithm as a probabilistic learning rule that implicitly defines an underlying probabilistic model, and we find such a model. We can then prove that it is, asymptotically, a Bayesian, exchangeable mixture model. Moreover, while the algorithm only offers a point estimate, we can obtain the asymptotic posterior distribution and asymptotic credible intervals for the mixing distribution. We also provide hints for tuning the algorithm and obtaining desirable properties, as we illustrate in a simulation study. Beyond mixture models, our study suggests a theoretical framework of interest for recursive quasi-Bayes methods in other settings.

• Using Bayesian methods to uncover the latent structures in real datasets: Louis Raynal (U of Montpellier & Harvard U), Francesco Denti (U of Milan – Bicocca & U della Svizzera Italiana), Alex Rodriguez (International Center for Theoretical Physics).
• Louis Raynal: Reconstructing the evolutionary history of the desert locust by means of ABC random forest

Abstract: The Approximate Bayesian Computation - Random Forest (ABC-RF) methodology was recently developed to perform model choice (Pudlo et al., 2016; Estoup et al., 2018) and parameter inference (Raynal et al., 2019). It has proved to achieve good performance, is mostly insensitive to noise variables and requires very little calibration. In this presentation we describe recent improvements, with a focus on the computation of error measures with random forests for parameter inference. As a case study, we are interested in the Schistocerca gregaria desert locust species which is divided in two distinct regions along the north-south axis of Africa. Using ABC-RF on microsatellite data, we reconstruct the evolutionary processes explaining the present geographical distribution and estimate parameters such as the divergence time between the north and south sub-species.

• Francesco Denti: Bayesian nonparametric dimensionality reduction via estimation of data intrinsic dimensions

Abstract: Even if they are defined on a space with a large dimension, data points usually lie on hypersurfaces with a much smaller intrinsic dimension (ID). The recent Hidalgo method (Allegra et al., 2019), a Bayesian extension of the TWO-NN model (Facco et al., 2017, Scientific Reports), allows estimating the ID when all points lie on multiple latent manifolds. We consider the data points as a configuration of a Poisson Process (PP) with an intensity proportional to the true density. Hidalgo makes only two weak assumptions: (i) locally, on the scale of the second nearest neighbor, the original PP can be well approximated by a homogeneous one and (ii) points close to each other are more likely to belong to the same manifold. Under (i), the ratio of the distances of a point from its first and second neighbor follows a Pareto distribution that depends parametrically only on the ID. We extend Hidalgo to the nonparametric case, allowing the estimation of the number of latent manifolds via a Dirichlet Process Mixture Model and inducing a clustering among observations characterized by similar ID. We further derive the distributions of the ratios of subsequent distances between neighbors and we prove their independence. This enables us to extract more information from the data without compromising the scalability of our method. While the idea behind the extension is simple, a non-trivial Bayesian scheme is required for estimating the model and assigning each point to the correct manifold. Since the posterior distribution has no closed form, to sample from it we rely on the Slice Sampler algorithm. From preliminary analyses performed on simulated data, the model provides promising results. Moreover, we were able to uncover a surprising ID variability in several real-world datasets.
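
To make the Pareto property quoted in the abstract above concrete, here is a minimal sketch of the single-manifold case only: the ratio of second to first nearest-neighbor distances is treated as Pareto with shape equal to the ID, and the shape is estimated by maximum likelihood. The function name `two_nn_intrinsic_dimension`, the brute-force distance computation, and the synthetic data are assumptions for illustration; this is not the Hidalgo mixture model or its Bayesian extension.

```python
import numpy as np

def two_nn_intrinsic_dimension(points):
    """Maximum-likelihood ID estimate from ratios of 2nd to 1st NN distances."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)             # ignore self-distances
    nearest = np.sort(dist, axis=1)[:, :2]     # r1 <= r2 for every point
    mu = nearest[:, 1] / nearest[:, 0]         # ratio mu >= 1, modeled as Pareto(1, d)
    return len(mu) / np.log(mu).sum()          # MLE of the Pareto shape d

rng = np.random.default_rng(0)
latent = rng.uniform(size=(500, 2))                   # points on a 2-D manifold
basis, _ = np.linalg.qr(rng.standard_normal((5, 2)))  # orthonormal embedding into R^5
points = latent @ basis.T
id_estimate = two_nn_intrinsic_dimension(points)
```

On this synthetic example the estimate should be close to the true dimension 2 even though the points live in a five-dimensional ambient space; boundary effects and finite sample size introduce some error.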

• Alex Rodriguez: Mapping the topography of complex datasets

Abstract: Data sets can be considered an ensemble of realizations drawn from a density distribution. Obtaining a synthetic description of this distribution allows rationalizing the underlying generating process and building human-readable models. In simple cases, visualizing the distribution in a suitable low-dimensional projection is enough to capture its main features but real world data sets are often embedded in a high-dimensional space. Therefore, I present a procedure that allows obtaining such a synthetic description in an automatic way using only the pairwise data distances (or similarities). This methodology is based on a reliable estimation of the intrinsic dimension of the dataset (Facco et al., 2017) and the probability density function (Rodriguez et al., 2018) coupled with a modified Density Peaks clustering algorithm (Rodriguez and Laio, 2014). The final outcome of all this machinery working together is a hierarchical tree that summarizes the main features of the data set and a classification of the data points according to the features they belong to (d'Errico et al., 2018).

• MCMC-based Bayesian inference on Hilbert spaces: Nawaf Bou-Rabee (Rutgers U), Nathan Glatt-Holtz (Tulane U), Daniel Sanz-Alonso (U of Chicago)
• Nawaf Bou-Rabee: TBA

Abstract: TBA

• Nathan Glatt-Holtz: TBA

Abstract: TBA

• Daniel Sanz-Alonso: TBA

Abstract: TBA

• Advances in multiple importance sampling: Art Owen (Stanford U), Victor Elvira (IMT Lille Douai), Felipe Medina Aguayo (U of Reading).
• Art Owen: Robust deterministic weighting of estimates from adaptive importance sampling

Abstract: This talk presents a simple robust way to weight a sequence of estimates generated by adaptive importance sampling. Importance sampling is a useful method for estimating rare event probabilities and for sampling posterior distributions. It often generates data that can be used to find an improved sampler leading to methods of adaptive importance sampling (AIS). Under ideal conditions, AIS can approach a perfect sampler and the mean squared error (MSE) vanishes exponentially fast. Under less ideal conditions, including all nontrivial uses of self-normalized importance sampling, the MSE is bounded below by a positive multiple of $1/n$. That rules out exponential convergence but still allows for steady improvements. If we model steady improvement as yielding a sequence of unbiased and uncorrelated estimates with variance proportional to $k^{-y}$ for $1 \le k \le K < \infty$ and $0 \le y \le 1$, then a simple model weighting the $k$th iterate proportionally to $k^{1/2}$ is nearly optimal. It never raises variance by more than 9/8 over an oracle’s variance even though the resulting convergence rate varies with $y$. Numerical investigation shows that these weights are also robust under additional models of gradual improvement. (This is joint work with Yi Zhou.)

• Victor Elvira: Multiple importance sampling for rare events estimation with an application in communication systems

Abstract: Digital communications are based on the transmission of symbols that belong to a finite alphabet, each of them carrying one or several bits of information. The receiver estimates the symbol that was transmitted, and in the case of perfect communication without errors, the original sequence of bits is reconstructed. However, real-world communication systems (e.g., in wireless communications) introduce random distortions in the symbols, including additive Gaussian noise, provoking errors in the detected symbols at the receiver.
The characterization of the symbol error rate (SER) of the system is of major interest in communications engineering. However, in many systems of interest, the integrals required to evaluate the SER in the presence of Gaussian noise are impossible to compute in closed form, and therefore Monte Carlo simulation is typically used to estimate the SER. Naive Monte Carlo simulation has been traditionally used in the communications literature, even if it can be very inefficient and require very long simulation runs, especially in high signal-to-noise-ratio (SNR) scenarios. At high SNR, the variance of the additive Gaussian noise is small, and hence the rate of errors is very low, which makes raw Monte Carlo impracticable for this rare event estimation problem. In this talk, we start by describing (for non-experts) the problem of SER estimation of a communication system. Then, we adapt a recently proposed multiple importance sampling (MIS) technique, called ALOE (for "At Least One rare Event") to this problem. Conditioned on a transmitted symbol, an error (or rare event) occurs when the observation falls in a union of half-spaces or, equivalently, outside a given polytope. The proposal distribution for ALOE samples the system conditionally on an error taking place, which makes it more efficient than other importance sampling techniques. ALOE provides unbiased SER estimates with simulation times orders of magnitude shorter than conventional Monte Carlo. Then, we discuss the challenges of SER estimation in multiple-input multiple-output (MIMO) communications, where the rare-event estimation problem requires solving a large number of integrals in a higher-dimensional space. We propose a novel MIS-based approach exploiting the strengths of the ALOE estimator.

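
The union-of-half-spaces setting described in the abstract above can be sketched in a few lines. The code below is a simplified reading of an ALOE-style estimator under stated assumptions: the noise is standard Gaussian, each half-space is $\{x : w_j^\top x \ge \tau_j\}$ with a unit-norm $w_j$, and the function name `aloe_union_probability` and the toy two-half-space example are hypothetical. It is not the speaker's SER or MIMO implementation.

```python
import numpy as np
from statistics import NormalDist

def aloe_union_probability(w, tau, n, rng):
    """Estimate P(x in union_j {w_j^T x >= tau_j}) for x ~ N(0, I).

    w is a (J, d) array of unit-norm normals, tau a (J,) array of thresholds.
    """
    std = NormalDist()
    p = np.array([std.cdf(-t) for t in tau])         # P(H_j) for each half-space
    total = p.sum()                                  # union bound on the target
    idx = rng.choice(len(tau), size=n, p=p / total)  # pick a half-space prop. to P(H_j)
    u = 1.0 - rng.uniform(size=n)                    # uniform draws in (0, 1]
    # Inverse-CDF draw of a standard normal conditioned on being >= tau_j.
    t = np.array([-std.inv_cdf(ui * pj) for ui, pj in zip(u, p[idx])])
    z = rng.standard_normal((n, w.shape[1]))
    wj = w[idx]
    # Replace the component of z along w_j by t, so x ~ N(0, I) given H_j.
    x = z + (t - (z * wj).sum(axis=1))[:, None] * wj
    hits = (x @ w.T >= tau).sum(axis=1)              # number of half-spaces containing x
    hits = np.maximum(hits, 1)                       # guard against floating-point rounding
    return total * np.mean(1.0 / hits)

rng = np.random.default_rng(1)
w = np.eye(2)                                        # two orthogonal half-spaces (toy case)
tau = np.array([3.0, 3.0])
estimate = aloe_union_probability(w, tau, n=2000, rng=rng)
exact = 1.0 - NormalDist().cdf(3.0) ** 2             # closed form for this toy case
```

Because every draw lands in at least one half-space, each sample contributes a value between `total / J` and `total`, which is why the estimator stays accurate when the events are rare.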
• Felipe Medina Aguayo: Revisiting balance heuristic with intractable proposals

Abstract: Among the different flavours of multiple importance sampling, the celebrated balance heuristic (BH) from Veach and Guibas still remains a popular choice for estimating integrals. The basic ingredients in BH are: a set of proposals $q_l$, indexed by some discrete label $l$, and a deterministic set of weights for these labels. However, in some scenarios sampling from $q_l$ is only achieved by sampling jointly with the label $l$; this commonly leads to a joint density whose conditionals and marginals are unavailable or expensive to compute. Despite BH being valid even if the labels are sampled randomly, the intractability of the joint proposal can be problematic, especially when the number of discrete labels is much larger than the number of permitted importance points. In this talk, we first revisit balance heuristic from an extended-space angle, which allows the introduction of intermediate distributions as in annealing importance sampling for variance reduction. We then look at estimating integrals when the proposal is only available in a joint form via a combination of correlated estimators. This idea also fits into the extended-space representation which will, in turn, provide other interesting solutions. (This is joint work with Richard Everitt, U of Reading.)

• Simulation in path space: Moritz Schauer (Leiden U), Frank van der Meulen (TU Delft), Andrew Duncan (Imperial College London).
• Moritz Schauer: Sampling conditional jump diffusions

Abstract: TBA

• Frank van der Meulen: Diffusion bridge simulation in geometric statistics

Abstract: Recently various stochastic landmarks models have been introduced for shape deformation. The basic modelling consists of stochastic differential equations. Due to the high dimensionality of the state space of these equations the statistical analysis is challenging. Moreover, the diffusion process is hypo-elliptic.
Novel methods are discussed to tackle this problem based on methods for simulation of conditioned diffusions.

• Andrew Duncan: Infinite dimensional piecewise deterministic Monte Carlo

Abstract: TBA

• Sequential Monte Carlo: Recent advances in theory and practice: Richard Everitt (U of Reading), Liangliang Wang (Simon Fraser U), Anthony Lee (U of Bristol).
• Richard Everitt: Evolution with recombination using state-of-the-art computational methods

Abstract: Recombination is a critical process in evolutionary inference, particularly when analysing within-species variation. In bacteria, despite being organisms that reproduce clonally, recombination commonly occurs when a donor cell contributes a small segment of its DNA. This process is typically modelled using an ancestral recombination graph (ARG), which is a generalisation of the coalescent. The ClonalOrigin model ([Didelot et al. 2010]) can be regarded as a good approximation of the aforementioned process, in which recombination events are modelled independently given the clonal genealogy. Inference in the ClonalOrigin model is performed via a reversible-jump MCMC (rjMCMC) algorithm, which attempts to jointly explore: the recombination rate, the number of recombination events, the departure and arrival points on the clonal genealogy for each recombination event, and the sites delimiting the start and end of each recombination event on the genome. However, as known by computational statisticians, the rjMCMC algorithm usually performs poorly due to the difficulty of proposing “good” trans-dimensional moves. Recent developments in Bayesian computation methodology provide ways of improving existing methods and code, but are not well-known outside the statistics community. We present a couple of ideas based on sequential Monte Carlo (SMC) methodology that can lead to faster inference when using the ClonalOrigin model. (This is joint work with Felipe Medina Aguayo and Xavier Didelot.)

• Liangliang Wang: Sequential Monte Carlo methods for Bayesian phylogenetics

Abstract: Phylogenetic trees, playing a central role in biology, model evolutionary histories of taxa that range from genes to genomes. The goal of Bayesian phylogenetics is to approximate a posterior distribution of phylogenetic trees based on biological data. Standard Bayesian estimation of phylogenetic trees can handle rich evolutionary models but requires expensive Markov chain Monte Carlo (MCMC) simulations. Our previous work has shown that sequential Monte Carlo (SMC) methods can serve as a good alternative to MCMC in posterior inference over phylogenetic trees. In this talk, I will present our recent work on SMC methods for Bayesian Phylogenetics. We illustrate our methods using simulation studies and real data analysis.

• Anthony Lee: Latent variable models: statistical and computational efficiency for simple likelihood approximations

Abstract: A popular statistical modelling technique is to model data as a partial observation of a random process. This allows, in principle, one to fit sophisticated domain-specific models with easily interpretable parameters. However, the likelihood function in such models is typically intractable, and so likelihood-based inference techniques must deal with this intractability in some way. I will briefly talk about two likelihood-based methodologies, pseudo-marginal Markov chain Monte Carlo and simulated maximum likelihood, and discuss statistical and computational scalability in some example settings. The results are also relevant to the use of sequential Monte Carlo algorithms in high-dimensional general state-space hidden Markov models.

• Advances in MCMC for high dimensional and functional spaces: Galin Jones (U of Minnesota), Vivekananda Roy (Iowa State U), Radu Herbei (The Ohio State U)
• Galin Jones: Convergence complexity of Gibbs samplers for Bayesian vector autoregressive processes

Abstract: We propose a collapsed Gibbs sampler for Bayesian vector autoregressions with predictors, or exogenous variables, and study the proposed sampler’s convergence properties. The Markov chain generated by our algorithm is shown to be geometrically ergodic regardless of whether the number of observations in the underlying vector autoregression is small or large in comparison to its order and dimension. We also establish conditions for when the geometric ergodicity is asymptotically stable as the number of observations tends to infinity. Specifically, the geometric convergence rate is shown to be bounded away from unity asymptotically, either in an almost sure sense or with probability tending to one, depending on what is assumed about the data generating process. (This is joint work with Karl Oskar Ekvall.)

• Vivekananda Roy: Posterior impropriety of relevance vector machines and a single penalty approach

Abstract: Researchers often use sparse Bayesian learning models that take a reproducing kernel Hilbert space approach to carry out the task of prediction for high dimensional datasets. The popular relevance vector machine (RVM) is one such sparse Bayesian learning model. We show that the RVM with hyperparameter values currently used in the literature leads to improper posteriors. We propose a single penalty RVM (SPRVM) model and analyze it using a semi Bayesian approach. The necessary and sufficient conditions for posterior propriety of SPRVM are more liberal than those of RVM and allow for several improper priors over the penalty parameter.
Additionally, we also prove geometric ergodicity of the Gibbs sampler used to analyze the SPRVM model and hence can estimate the asymptotic standard errors associated with the Monte Carlo estimate of the means of the posterior predictive distribution. The predictive performance of RVM and SPRVM is compared by analyzing several datasets. (This is joint work with Anand Dixit.)

• Radu Herbei: Exact inference in functional regression: Estimating hydrological controls on ecosystem dynamics in an Antarctic lake

Abstract: Many of the modern-day statistical inference problems address the issue of estimating an infinite dimensional parameter (a function or a surface). Given that one can only store a finite representation of these objects on a computer, the typical approach is to employ some dimension-reduction strategy and proceed with a statistical inference procedure in a multivariate setting. We introduce an exact inference procedure for functional parameters in a Bayesian regression setting. By "exact" we mean that the MCMC sampler used to explore the posterior distribution over the functional parameter is unaffected by the fact that only finite dimensional objects are used during the simulation procedure. We use techniques based on randomized acceptance probabilities and Bernoulli factories to ensure that the sampler targets the correct distribution. We apply our method to the problem of estimating the association between stream discharge and physical, chemical, and biological processes within an Antarctic lake system.

• Recent advances in Gaussian process computations and theory: Yun Yang (U of Illinois), Joseph Futoma (Harvard U), Michael Zhang (Princeton U).
• Yun Yang: Frequentist coverage and sup-norm convergence rate in Gaussian process regression

Abstract: GP regression is a powerful interpolation technique due to its flexibility in capturing non-linearity.
In this talk, we provide a general framework for understanding the frequentist coverage of point-wise and simultaneous Bayesian credible sets in random design GP regression. Identifying both the mean and covariance function of the posterior distribution of the Gaussian process as regularized M-estimators, we show that the sampling distribution of the posterior mean function and the centered posterior distribution can be respectively approximated by two population level GPs. By developing a comparison inequality between two GPs, we provide exact characterization of frequentist coverage probabilities of Bayesian pointwise credible intervals and simultaneous credible bands of the regression function. Our results show that inference based on GP regression tends to be conservative; when the prior is under-smoothed, the resulting credible intervals and bands have minimax-optimal sizes, with their frequentist coverage converging to a non-degenerate value between their nominal level and one. As a byproduct of our theory, we show that GP regression also yields minimax-optimal posterior contraction rate relative to the supremum norm, which provides positive evidence to the long-standing problem on optimal supremum norm contraction rate in GP regression.

• Joseph Futoma: Learning to Detect Sepsis with a Multi-output Gaussian Process RNN Classifier (in the Real World!)

Abstract: Sepsis is a poorly understood and potentially life-threatening complication that can occur as a result of infection. Early detection and treatment improve patient outcomes, and as such it poses an important challenge in medicine. In this work, we develop a flexible classifier that leverages streaming lab results, vitals, and medications to predict sepsis before it occurs. We model patient clinical time series with multi-output Gaussian processes, maintaining uncertainty about the physiological state of a patient while also imputing missing values.
Latent function values from the Gaussian process are then fed into a deep recurrent neural network to classify patient encounters as septic or not, and the overall model is trained end-to-end using back-propagation. We train and validate our model on a large retrospective dataset of 18 months of heterogeneous inpatient stays from the Duke University Health System, and develop a new “real-time” validation scheme for simulating the performance of our model as it will actually be used. We conclude by showing how this model is saving lives as a part of SepsisWatch, an application currently being used at Duke Hospital to screen, monitor, and coordinate treatment of septic patients.

• Michael Zhang: Embarrassingly parallel inference for Gaussian processes

Abstract: Gaussian process-based models typically involve an $O(N^3)$ computational bottleneck due to inverting the covariance matrix. Popular methods for overcoming this matrix inversion problem cannot adequately model all types of latent functions and are often not parallelizable. However, judicious choice of model structure can ameliorate this problem. A mixture-of-experts model that uses a mixture of $K$ Gaussian processes offers modeling flexibility and opportunities for scalable inference. Our embarrassingly parallel algorithm combines low-dimensional matrix inversions with importance sampling to yield a flexible, scalable mixture-of-experts model that offers comparable performance to Gaussian process regression at a much lower computational cost.

• Posterior inference with misspecified models: Judith Rousseau (U of Oxford), Ryan Martin (North Carolina State U), Jonathan Huggins (Harvard U)
• Judith Rousseau: TBA

Abstract: TBA

• Ryan Martin: Construction, concentration, and calibration of Gibbs posteriors

Abstract: A Bayesian approach, which bases inference on a posterior distribution, has certain advantages, but at the expense of requiring specification of a full statistical model.
A Gibbs approach, on the other hand, provides a posterior distribution based on a loss function instead of a likelihood, which has its own advantages, including robustness and computational savings. While the concentration properties of suitably constructed Gibbs posteriors are fairly well understood, the mis- or under-specification affects the spread of the Gibbs posterior in subtle ways. In particular, it is not clear how to scale the Gibbs posterior so that the corresponding credible regions are calibrated in the sense that they achieve the nominal coverage probability. In this talk, I will present some generalities about the construction, concentration, and calibration of Gibbs posteriors along with applications, including an image boundary detection problem.

• Jonathan Huggins: Robust Bayesian Inference using BayesBag

Abstract: Standard Bayesian inference is known to be sensitive to misspecification, leading to improper uncertainty calibration and poor predictive performance. Since models are almost inevitably approximations to reality, developing inference methods that are robust to model misspecification is crucial to statistical practice. However, finding generally applicable and computationally feasible methods for robust Bayesian inference under misspecification has proven to be a difficult challenge. An intriguing approach is to use bagging on the Bayesian posterior (“BayesBag”) -- that is, to average over bootstrapped posterior distributions. While the BayesBag approach has appeared occasionally in the literature, it has never been thoroughly investigated. In this talk, we develop a comprehensive asymptotic theory for BayesBag in misspecified models. We show that BayesBag provides superior uncertainty calibration in the case of parameter estimation and stable model probabilities in the case of model selection, particularly when multiple models provide similar quality fits for the data.
We validate our theory on synthetic and real-world data in a wide range of models, including linear feature selection, sparse logistic regression, phylogenetic tree reconstruction, and regression with Bayesian additive regression trees. Overall, we find that in the presence of significant misspecification, BayesBag produces more stable inferences, has better predictive accuracy, and selects correct models more often than the standard Bayesian posterior; meanwhile, when the model is correctly specified, BayesBag is more conservative or produces similar results to the standard posterior. • Convergence of MCMC in theory and in practice: Christina Knudson (U of St. Thomas, MN), Rui Jin (U of Iowa), Xin Wang (Miami U, OH) • Christina Knudson: Revisiting the Gelman-Rubin Diagnostic Abstract: Gelman and Rubin's (1992) convergence diagnostic is one of the most popular methods for terminating a Markov chain Monte Carlo (MCMC) sampler. Since the seminal paper, researchers have developed sophisticated methods of variance estimation for Monte Carlo averages. We show that this class of estimators finds immediate use in the Gelman-Rubin statistic, a connection not established in the literature before. We incorporate these estimators to upgrade both the univariate and multivariate Gelman-Rubin statistics, leading to increased stability in MCMC termination time. An immediate advantage is that our new Gelman-Rubin statistic can be calculated for a single chain. In addition, we establish a relationship between the Gelman-Rubin statistic and effective sample size. Leveraging this relationship, we develop a principled cutoff criterion for the Gelman-Rubin statistic. Finally, we demonstrate the utility of our improved diagnostic via an example. 
• Rui Jin: Fast MCMC for high dimensional Bayesian regression models with shrinkage priors Abstract: In the past decade, many Bayesian shrinkage models have been developed for linear regression problems where the number of covariates, p, is large. Computing the intractable posterior is often done with three-block Gibbs samplers (3BG), based on representing the shrinkage priors as scale mixtures of Normal distributions. An alternative computing tool is a state-of-the-art Hamiltonian Monte Carlo (HMC) method, which can be easily implemented in the Stan software. However, we found both existing methods to be inefficient and often impractical for large p problems. Following the general idea of Rajaratnam et al. (2018), we propose two-block Gibbs samplers (2BG) for three commonly used shrinkage models, namely, the Bayesian group lasso, the Bayesian sparse group lasso and the Bayesian fused lasso models. We demonstrate with simulated and real data examples that the Markov chains underlying 2BG's converge much faster than those of 3BG's, and no slower than those of HMC. At the same time, the computing costs of 2BG's per iteration are as low as those of 3BG's, and can be several orders of magnitude lower than those of HMC. As a result, the newly proposed 2BG is the only practical computing solution to do Bayesian shrinkage analysis for datasets with large p. Further, we provide theoretical justifications for the superior performance of 2BG's. First, we establish geometric ergodicity (GE) of Markov chains associated with the 2BG for each of the three Bayesian shrinkage models, and derive quantitative upper bounds for their geometric convergence rates. Secondly, we show that the Markov operators corresponding to the 2BG's of the Bayesian group lasso and the Bayesian sparse group lasso are trace class, whereas those of the corresponding 3BG's are not even Hilbert-Schmidt. 
• Xin Wang: Geometric ergodicity of Polya-Gamma Gibbs sampler for Bayesian logistic regression with a flat prior Abstract: The logistic regression model is the most popular model for analyzing binary data. In the absence of any prior information, an improper flat prior is often used for the regression coefficients in Bayesian logistic regression models. The resulting intractable posterior density can be explored by running Polson, Scott and Windle’s (2013) data augmentation (DA) algorithm. In this paper, we establish that the Markov chain underlying Polson, Scott and Windle’s (2013) DA algorithm is geometrically ergodic. Proving this theoretical result is practically important as it ensures the existence of central limit theorems (CLTs) for sample averages under a finite second moment condition. The CLT in turn allows users of the DA algorithm to calculate standard errors for posterior estimates. • Robust Markov chain Monte Carlo methods: Kengo Kamatani (Osaka U), Krzysztof Łatuszynski (Warwick U), Björn Sprungk (Göttingen U) • Kengo Kamatani: Robust Markov chain Monte Carlo methodologies with respect to tail properties Abstract: In this talk, we will discuss Markov chain Monte Carlo (MCMC) methods with heavy-tailed invariant probability distributions. When the invariant distribution is heavy-tailed the algorithm has difficulty reaching the tail area. We study the ergodic properties of some MCMC methods with position dependent proposal kernels and apply them to heavy-tailed target distributions. • Krzysztof Łatuszynski: A framework for adaptive MCMC targeting multimodal distributions Abstract: We propose a new Monte Carlo method for sampling from multimodal distributions. The idea of this technique is based on splitting the task into two: finding the modes of a target distribution and sampling, given the knowledge of the locations of the modes. 
The sampling algorithm relies on steps of two types: local ones, preserving the mode; and jumps to regions associated with different modes. Besides, the method learns the optimal parameters of the algorithm while it runs, without requiring user intervention. Our technique should be considered as a flexible framework, in which the design of moves can follow various strategies known from the broad MCMC literature. In order to control the jumps, we introduce an auxiliary variable representing each mode and we define a new target distribution on an augmented state space. As the adaptive algorithm runs and updates its parameters, the target distribution also keeps being modified. This motivates a new class of algorithms, Auxiliary Variable Adaptive MCMC. We provide general ergodic results for the whole class before specialising to the case of our algorithm. The performance of the algorithm is illustrated with several multimodal examples. (This is joint work with Chris Holmes and Emilia Pompe.) • Björn Sprungk: Noise level-robust Metropolis-Hastings algorithms for Bayesian inference with concentrated posteriors Abstract: We consider Metropolis-Hastings algorithms for Markov chain Monte Carlo integration w.r.t. a concentrated posterior measure which results from Bayesian inference with a small additive observational noise. Proposal kernels based only on prior information show a deteriorating efficiency for a decaying noise. We propose to use informed proposal kernels, i.e., random walk proposals with a covariance close to the posterior covariance. Here, we use the a-priori computable covariance of the Laplace approximation of the posterior. Besides some numerical evidence we prove that the resulting informed Metropolis-Hastings shows a non-degenerating mean acceptance rate and lag-one autocorrelation as the noise decays. Thus, it performs robustly w.r.t. a small noise-level in the Bayesian inference problem. 
The theoretical results are based on the recently established convergence of the Laplace approximation to the posterior measure in total variation norm. • Approximate Markov chain Monte Carlo methods: Bamdad Hosseini (California Institute of Technology), James Johndrow (Stanford U), Daniel Rudolf (Göttingen U) • Bamdad Hosseini: Perturbation theory for a function space MCMC algorithm with non-Gaussian priors Abstract: In recent years a number of function space MCMC algorithms have been introduced in the literature. The goal here is to design an algorithm that is well-defined on an infinite-dimensional Banach space with the hope that it will be discretization invariant and overcome some issues that are encountered by standard MCMC algorithms in high dimensions. However, most of the focus in the literature has been on algorithms that rely on the assumption that the prior measure is Gaussian or at least absolutely continuous with respect to a Gaussian measure. In this talk we introduce a new class of prior-aware Metropolis-Hastings algorithms for non-Gaussian priors and discuss their convergence and perturbation properties such as dimension-independent spectral gaps and various types of approximations beyond standard approximation by discretization or projections. • James Johndrow: Metropolizing approximate Gibbs samplers Abstract: There has been much recent work on “approximate” MCMC algorithms, such as Metropolis-Hastings algorithms that rely on minibatches of data, resulting in bias in the invariant measure. Less studied are the various ways in which approximate Gibbs samplers can be designed. We describe a general strategy for using approximate Gibbs samplers as Metropolis-Hastings proposals. 
Because it is typically less costly to compute the unnormalized posterior density than to take one step of exact Gibbs, and because the Hastings ratio in these algorithms requires only computation of the approximating kernel at pairs of points, one can often achieve reductions in computational complexity per step with no bias in the invariant measure by using approximate Gibbs as a Metropolis-Hastings proposal. We demonstrate the approach with an application to high-dimensional regression. • Daniel Rudolf: Time-inhomogeneous approximate Markov chain Monte Carlo Abstract: We discuss the approximation of a time-homogeneous Markov chain by a time-inhomogeneous one. An upper bound on the expected absolute difference between the stationary mean, w.r.t. the Markov chain of interest, and the ergodic average based on the approximating Markov chain will be presented. In addition, we provide explicit estimates of the Wasserstein distance between the distributions of the two Markov chains after n steps. • Sampling Techniques for High-Dimensional Bayesian Inverse Problems: Qiang Liu (U of Texas), Tan Bui-Thanh (U of Texas), Alex Thiery (National U of Singapore) • Qiang Liu: Stein Variational Gradient Descent: Algorithm, Theory, Applications Abstract: Approximate probabilistic inference is a key computational task in modern machine learning, which allows us to reason with complex, structured, hierarchical (deep) probabilistic models to extract information and quantify uncertainty. Traditionally, approximate inference is performed by either Markov chain Monte Carlo (MCMC) or variational inference (VI), both of which, however, have their own critical weaknesses: MCMC is accurate and asymptotically consistent but suffers from slow convergence; VI is typically faster by formulating the inference problem as gradient-based optimization, but introduces deterministic errors and lacks theoretical guarantees. 
Stein variational gradient descent (SVGD) is a new tool for approximate inference that combines the accuracy and flexibility of MCMC with the practical speed of VI and gradient-based optimization. The key idea of SVGD is to directly optimize a non-parametric particle-based representation to fit intractable distributions with fast deterministic gradient-based updates, which is made possible by integrating and generalizing key mathematical tools from Stein's method, optimal transport, and interacting particle systems. SVGD has been found to be a powerful tool in various challenging settings, including Bayesian deep learning and deep generative models, reinforcement learning, and meta learning. This talk will introduce the basic ideas and theories of SVGD, and cover some examples of application. • Tan Bui-Thanh: A data-consistent approach to statistical inverse problems Abstract: Given a hierarchy of reduced-order models to solve the inverse problems for quantities of interest, each model with varying levels of fidelity and computational cost, a machine learning framework is proposed to improve the models by learning the errors between successive levels. Each reduced-order model is a statistical model generating rapid and reasonably accurate solutions to new parameters, and is typically formed using expensive forward solves to find the reduced subspace. These approximate reduced-order models speed up computational time but they introduce additional uncertainty to the solution. By statistically modeling errors of reduced-order models and using training data involving forward solves of the reduced-order models and the higher fidelity model, we train a deep neural network to learn the error between successive levels of the hierarchy of reduced-order models, thereby improving their error bounds. The training of the deep neural network occurs during the offline phase and the error bounds can be improved online as new training data is observed. 
Once the deep-learning-enhanced reduced model is constructed, it is amenable to any sampling method as its cost is a fraction of the cost of the original model. • Alex Thiery: Exploiting geometry for walking larger steps in Bayesian inverse problems Abstract: Consider the observation $y = F(x) + \xi$ of a quantity of interest $x$ -- the random variable $\xi \sim \mathcal{N}(0, \sigma^2 I)$ is a vector of additive noise in the observation. In Bayesian inverse problems, the vector $x$ typically represents the high-dimensional discretization of a continuous and unobserved field while the evaluations of the forward operator $F(\cdot)$ involve solving a system of partial differential equations. In the low-noise regime, i.e. $\sigma \to 0$, the posterior distribution concentrates in the neighbourhood of a nonlinear manifold. As a result, the efficiency of standard MCMC algorithms deteriorates due to the need to take increasingly smaller steps. In this work, we present a constrained HMC algorithm that is robust to small $\sigma$ values, i.e. low noise. Taking the observations generated by the model to be constraints on the prior, we define a manifold on which the constrained HMC algorithm generates samples. By exploiting the geometry of the manifold, our algorithm is able to take larger step sizes than more standard MCMC methods, resulting in a more efficient sampler. If time permits, we will describe how similar ideas can be leveraged within other non-reversible samplers.
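
For readers unfamiliar with the diagnostic discussed in Christina Knudson's abstract above, the following is a minimal Python sketch of the classical multi-chain Gelman-Rubin statistic (the potential scale reduction factor). It is not the improved single-chain version described in the talk. The function names gelman_rubin and simulate_chains, the chain lengths, and the toy Gaussian "chains" are all assumptions made for illustration only.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Classical potential scale reduction factor for equal-length chains.

    `chains` is a list of m lists, each holding n draws of one scalar quantity.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances.
    w = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: sample variance of the chain means (between-chain component).
    b_over_n = statistics.variance(chain_means)
    # Pooled estimate of the target variance.
    var_plus = (n - 1) / n * w + b_over_n
    return math.sqrt(var_plus / w)


def simulate_chains(offsets, n=2000, seed=0):
    """Hypothetical stand-in for MCMC output: independent N(offset, 1) draws."""
    rng = random.Random(seed)
    return [[off + rng.gauss(0.0, 1.0) for _ in range(n)] for off in offsets]


# Four chains exploring the same region versus four chains split across two regions.
r_mixed = gelman_rubin(simulate_chains([0.0, 0.0, 0.0, 0.0]))
r_stuck = gelman_rubin(simulate_chains([0.0, 0.0, 3.0, 3.0]))
print(f"well-mixed chains: R-hat = {r_mixed:.3f}")
print(f"separated chains:  R-hat = {r_stuck:.3f}")
```

Values close to 1 suggest that the between-chain and within-chain variability agree, while values well above 1 indicate that the chains have not yet explored the same distribution. The toy draws here are independent, so the sketch says nothing about autocorrelation, which is where the variance estimators mentioned in the abstract come in.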

Short Courses/Tutorials/Practice Labs

The conference will begin with four Short Courses/Tutorials/Practice Labs on Tuesday (January 7, 2020). There will be two parallel Short Courses starting at 10:30am, and a second pair starting at 2:00pm.
• Introduction to Stan (10:30am-1:30pm)
• Outline: This half-day workshop will introduce you to the probabilistic programming language, Stan, and its Hamiltonian Monte Carlo algorithm. With Stan, many Bayesian models can be fitted to data more quickly, and with less sensitivity to priors and initial values, than with Gibbs sampler software such as BUGS and JAGS. You will get some hands-on experience of coding for Stan, extracting results and checking for computational problems. This is a very interactive, hands-on workshop and we will use examples of Stan code throughout to give you practical experience.

Trainer: Robert Grant is a medical statistician of 21 years' experience, and a professional trainer and coach for people working in data analysis. He developed and maintains the Stata interface for Stan and frequently teaches introductory courses on Bayesian statistics and data visualization. His personal website is robertgrantstats.co.uk and his company's is bayescamp.com.

Pre-requisites: Participants should know the basics of model fitting by MCMC simulation. There is no need for experience of Hamiltonian Monte Carlo or Stan but we will assume understanding of Bayesian analysis, model comparison and diagnosing MCMC problems such as non-convergence. Please bring a laptop with one of the Stan interfaces installed -- it doesn't matter which one as we will focus on the Stan code which is common to all.

Learning outcomes: (1) Know how to get started with Stan via the various interfaces, including the common functionality of checking your model code for errors, translating it to C++, compiling it, sampling from the posterior, summarising the output and exporting chains. (2) Understand the basics of coding regression models up to multilevel models. (3) Be aware of tricks for more efficient parameterisation. (4) Know how to obtain statistical and graphical diagnostic outputs, recognise problems and set about debugging. (5) Know how to add a new distribution as a Stan function, expose it to R/Python/Julia for debugging, and use it in the log-likelihood and posterior predictive checks.

• Developing, modifying, and sharing Bayesian algorithms (MCMC samplers, SMC, and more) using the NIMBLE platform in R (10:30am-1:30pm)
• Overview: Do you want to share an algorithm you've developed with other researchers without having to build an entire platform? Do you want to use methods such as MCMC and tailor them for your application without having to implement everything from scratch?

NIMBLE is a platform built on top of R that allows methodologists to write algorithms (and modify existing algorithms) in R-like syntax with automatic compilation for fast run-times via C++ that is auto-generated by the system. NIMBLE gives you access to a variety of tools for ease of implementation: querying of model graphical structure (e.g., parent and child nodes in the model graph), a wide range of mathematical functionality including linear algebra through the Eigen package, calculation of probability density values for nodes in the model graph, simulation of node values, automatic differentiation for gradients, optimization, and storage objects for samples from the model.

This tutorial will introduce you to how to develop algorithms in NIMBLE, including new MCMC samplers and entire new algorithms. We will discuss how developers can build upon NIMBLE's existing algorithms (including a variety of MCMC, Bayesian nonparametric, and SMC methods) to avoid having to reimplement standard methods. Users of methods developed in NIMBLE write their model code in syntax almost identical to BUGS and JAGS but can then apply a variety of algorithms (various MCMC samplers, choosing between samplers, parameter blocking, user-defined samplers, various SMC algorithms, etc.) to the same model. The tutorial will demonstrate how algorithms that you write using NIMBLE are then easily available to users, who can try them out at low cost and compare them to other algorithms available in NIMBLE.

Learning outcomes: The workshop will focus on live demos and hands-on coding. After the workshop, participants will understand (1) how to use NIMBLE to apply algorithms such as MCMC and SMC to fit hierarchical models, (2) how NIMBLE's built-in algorithms are implemented using nimbleFunctions, (3) how to use nimbleFunctions to extend NIMBLE's algorithms, and (4) how to develop algorithms in NIMBLE.

Pre-requisites: Participants should have a basic understanding of Bayesian/hierarchical models and of one or more algorithms such as MCMC or SMC. Some experience with R is also expected. Please bring a laptop; we'll give instructions in advance for installing NIMBLE.

Instructor: Chris Paciorek is one of the core developers of NIMBLE (code repository) and an adjunct professor of Statistics at UC Berkeley. He has presented a variety of workshops and courses on NIMBLE and more generally on statistical computing and Bayesian statistics.

• Practical Bayesian Computation Using SAS® (2:00-5:00pm)
• Overview: This half-day tutorial starts with a primer on two major Bayesian computation tools in the SAS/STAT® product: the MCMC procedure (for general-purpose modeling) and the BGLIMM procedure (for fitting generalized linear hierarchical models). The second half of the course takes a topic-driven approach in which the Bayesian treatment of a wide range of statistical models is illustrated using the two procedures and the SAS language.

PROC MCMC is a general procedure that provides Bayesian inference for a wide range of models. Users are given full control to specify details of any statistical models. Built-in features enable you to work with nonstandard prior or likelihood functions, incorporate your own sampling algorithms, fit multilevel hierarchical models with arbitrary depth and nested or non-nested structures, handle missing data by using a cohesive Bayesian approach, and much more.

PROC BGLIMM, on the other hand, is a specialized procedure for generalized linear hierarchical models. Its simplified syntax greatly reduces the programming burden on users (for example, the CLASS statement handles categorical variables; the REPEATED statement models balanced or unbalanced longitudinal data with repeated measurements). The procedure deploys optimal sampling algorithms that are parallelized for performance and provides convenient access to Bayesian analysis of complex mixed models.

This tutorial introduces you to these procedures and illustrates how you can use them to perform a variety of tasks, such as fitting multilevel hierarchical models, modeling missing data, model assessment, large-scale clinical trial simulation, and predictions.

Learning outcomes: Attendees will learn how to use SAS Bayesian procedures, PROC MCMC and PROC BGLIMM, to conduct Bayesian analysis. This tutorial focuses on the practical use of Bayesian computational methods, and the objective is to equip attendees with computational tools through a series of worked-out examples that demonstrate sound practices for a variety of statistical models and Bayesian concepts.

Prerequisites: A basic understanding of the Bayesian paradigm will be useful. Some knowledge of the SAS language will be helpful but not necessary. You do not need to bring a computer.

Instructor: Fang Chen designed and developed the MCMC procedure. He is a Director of Advanced Statistical Methods at SAS Institute Inc., where he oversees software development in several statistical areas, including Bayesian computation and mixed models. Fang has presented numerous tutorials and workshops on SAS software and various Bayesian applications.

• AutoStat® Workshop: A new software for Bayesian Analysis (2:00-5:00pm)
• Outline: AutoStat® is a fully automated, web based analytics application that allows analysts to enjoy automated model selection, specification and deployment. They can share projects and insights amongst colleagues and project teams. Analysts can undertake scenario analysis with ease and schedule routine tasks using the Pipeline. The proposed workshop will discuss a range of Bayesian models, with class exercises being completed using AutoStat®, which will be available to all participants as a free trial version. The Workshop will be hands-on, with models being explained in terms of at least one case study.

A brief overview of the use of the AutoStat® software will cover the following features:

• Data management, manipulation, filtering and creation of new variables.

• Visualisation options, both standard (for quick exploratory tools) and layered charts for publication purposes.

• Available models, prior specifications and algorithm options. These will include regression, univariate time series, space-time analysis, mixture models and multivariate models.

• The standard results will be illustrated, along with access to saved models to tailor your analysis.

• R integration for extended capability.

• The Document Builder for creating publications, reports and tutorials.

• Dashboard creation for increasing research impact and usability of results.

• The use of pipelines for scheduling tasks.

A key component of using AutoStat® for teaching statistical thinking is in alleviating the need for coding, which allows the instructors to focus on key concepts, questions and outcomes. In this course we will briefly touch on key features of AutoStat®, such as its parallel approach to Bayesian and classical statistics on the GUI, which encourages educators to teach both paradigms within the same course. We will illustrate the project sharing facilities, the calculator tool for “on the fly” demonstrations, tutorial builders and bespoke output creation.

Presenters: Dr Chris Strickland & Dr Clair Alston-Knox

Chris and Clair both work at the AutoStat® Institute (Melbourne, Australia). They have previously worked together in Professor Kerrie Mengersen’s Bayesian Research Group (QUT, Australia).

Their combined work experience involves research positions in both academia and industry, having worked at NSW Agriculture, Bank of Queensland, Monash University, Queensland University of Technology, University of Queensland, Griffith University, University of NSW, Newcastle University, Predictive Analytics Group, Soil Conservation Service and NSW Sport and Recreation.

Code of Conduct

ISBA takes very seriously any form of misconduct, including but not limited to sexual harassment and bullying. All meeting participants are expected to adhere strictly to the official ISBA Code of Conduct. Following the safeISBA motto, we want ISBA meetings to be safe and to be fun. We encourage participants to report any concerns or perceived misconduct to the meeting organizers, Jim Hobert and Christian Robert. Further suggestions can be sent to safeisba@bayesian.org.