publications
Publications by year in reverse chronological order.
2023
- Thesis: Advancing Variational Inference via Thermodynamic Integration. Vaden Masrani. University of British Columbia, 2023.
Variational inference (VI) is a popular method used within statistics and machine learning to approximate intractable probability distributions via optimization. Central to VI is the Evidence Lower Bound (ELBO), a variational objective function which lower bounds the log marginal likelihood and can be used to jointly perform maximum likelihood parameter estimation and approximate posterior inference using stochastic gradient ascent. The core contribution of this thesis is the Thermodynamic Variational Objective (TVO), a novel variational objective derived from a key connection we make between variational inference and thermodynamic integration. The TVO both tightens and generalizes the ubiquitous ELBO, and empirically leads to improvements in model and inference network learning in both discrete and continuous deep generative models. Using a novel exponential family interpretation of the geometric mixture curve underlying the TVO, we characterize the bound gap left by the TVO as a sum of KL divergences between adjacent distributions, with the forward and reverse KLs corresponding to the lower- and upper-bound TVO variants. To enable the TVO to be used in gradient-based optimization algorithms, we provide two computationally efficient gradient estimators, based on the score function and on doubly reparameterized gradients, as well as two adaptive "schedulers" which choose the discretization locations of a one-dimensional Riemann integral approximation, a key hyperparameter of the TVO. Additionally, we show that the objective functions used in Variational Inference, Variational AutoEncoders, Wake-Sleep, Inference Compilation, and Rényi Divergence Variational Inference are all special cases of the TVO.
Finally, we evaluate the TVO in two real-world settings: stochastic control flow models with discrete latent variables, and multi-agent trajectory prediction with continuous latent variables built on top of a differentiable driving simulator; we find that the TVO improves upon baseline objectives in both cases.
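The Riemann-sum construction behind the TVO can be sketched in a few lines. The following is a minimal numpy illustration, not the thesis code: `log_w` stands for a batch of log importance weights log p(x, z) − log q(z | x) with z drawn from q, and expectations under each tilted distribution π_β are estimated by self-normalized importance sampling.

```python
import numpy as np

def tvo_lower_bound(log_w, betas):
    """Left-Riemann-sum TVO estimate from a batch of log importance
    weights log_w = log p(x, z) - log q(z | x), z ~ q(z | x).

    The weight of each sample under the tilted distribution pi_beta is
    proportional to exp(beta * log_w) (self-normalized importance
    sampling), so E_{pi_beta}[log w] is a softmax-weighted average.
    """
    log_w = np.asarray(log_w, dtype=float)
    betas = np.asarray(betas, dtype=float)
    total = 0.0
    for b_lo, b_hi in zip(betas[:-1], betas[1:]):
        tilt = b_lo * log_w
        tilt -= tilt.max()                        # stabilize the softmax
        snis = np.exp(tilt) / np.exp(tilt).sum()
        total += (b_hi - b_lo) * (snis @ log_w)   # width * E_{pi_beta}[log w]
    return total
```

With `betas = [0.0, 1.0]` the single left-endpoint term reduces to the usual single-sample-average ELBO estimate; refining the schedule can only increase the estimate, since E_{π_β}[log w] is nondecreasing in β.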
2022
- NeurIPS: Flexible Diffusion Modeling of Long Videos. Neural Information Processing Systems, 2022.
We present a framework for video modeling based on denoising diffusion probabilistic models that produces long-duration video completions in a variety of realistic environments. We introduce a generative model that can, at test time, sample any subset of video frames conditioned on any other subset, and present an architecture adapted for this purpose. Doing so allows us to efficiently compare and optimize a variety of schedules for the order in which frames in a long video are sampled and use selective sparse and long-range conditioning on previously sampled frames. We demonstrate improved video modeling over prior work on a number of datasets and sample temporally coherent videos over 25 minutes in length. We additionally release a new video modeling dataset and semantically meaningful metrics based on videos generated in the CARLA self-driving car simulator.
- TPAMI: Beyond Simple Meta-Learning: Multi-purpose Models for Multi-Domain, Active and Continual Few-Shot Learning. Peyman Bateni, Jarred Barber, Raghav Goyal, and 4 more authors. In IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) Special Issue on Learning with Fewer Labels in Computer Vision (In Submission), 2022.
Modern deep learning requires large-scale extensively labelled datasets for training. Few-shot learning aims to alleviate this issue by learning effectively from few labelled examples. In previously proposed few-shot visual classifiers, it is assumed that the feature manifold, where classifier decisions are made, has uncorrelated feature dimensions and uniform feature variance. In this work, we focus on addressing the limitations arising from this assumption by proposing a variance-sensitive class of models that operates in a low-label regime. The first method, Simple CNAPS, employs a hierarchically regularized Mahalanobis-distance based classifier combined with a state of the art neural adaptive feature extractor to achieve strong performance on Meta-Dataset, mini-ImageNet and tiered-ImageNet benchmarks. We further extend this approach to a transductive learning setting, proposing Transductive CNAPS. This transductive method combines a soft k-means parameter refinement procedure with a two-step task encoder to achieve improved test-time classification accuracy using unlabelled data. Transductive CNAPS achieves state of the art performance on Meta-Dataset. Finally, we explore the use of our methods (Simple and Transductive) for "out of the box" continual and active learning. Extensive experiments on large scale benchmarks illustrate the robustness and versatility of this relatively simple class of models. All trained model checkpoints and corresponding source code have been made publicly available.
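As a rough illustration of the variance-sensitive idea (a simplified sketch, not the released Simple CNAPS code), a class-covariance Mahalanobis classifier can shrink each class covariance toward the task-level covariance, with the blend weight growing with class support size and a ridge term keeping the estimate invertible in the few-shot regime:

```python
import numpy as np

def mahalanobis_classify(support_x, support_y, query_x, eps=1.0):
    """Nearest-class classification under a Mahalanobis metric built
    from regularized class covariances.

    Each class covariance is blended with the task-level covariance
    using weight n_k / (n_k + 1), so small classes lean more on the
    pooled estimate; eps * I guards against singular matrices.
    """
    classes = np.unique(support_y)
    dim = support_x.shape[1]
    task_cov = np.cov(support_x, rowvar=False)
    scores = []
    for k in classes:
        xk = support_x[support_y == k]
        mu = xk.mean(axis=0)
        lam = len(xk) / (len(xk) + 1.0)
        class_cov = np.cov(xk, rowvar=False) if len(xk) > 1 else np.zeros((dim, dim))
        q = lam * class_cov + (1.0 - lam) * task_cov + eps * np.eye(dim)
        diff = query_x - mu
        # squared Mahalanobis distance of each query to class k
        scores.append(np.einsum("nd,de,ne->n", diff, np.linalg.inv(q), diff))
    return classes[np.argmin(np.stack(scores, axis=1), axis=1)]
```

In the actual papers the features come from a neural adaptive feature extractor; here `support_x` and `query_x` are simply assumed to be pre-extracted feature vectors.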
- Frontiers: Planning as Inference in Epidemiological Dynamics Models. Frontiers in Artificial Intelligence, 2022.
In this work we demonstrate how to automate parts of the infectious disease-control policy-making process by performing inference in existing epidemiological models. The inference tasks undertaken include computing the posterior distribution over simulation model parameters that are controllable via direct policy-making choices and that give rise to acceptable disease progression outcomes. Among other things, we illustrate the use of a probabilistic programming language that automates inference in existing simulators. Neither the full capabilities of this tool for automating inference nor its utility for planning is widely disseminated at the current time. Timely gains in understanding of how such simulation-based models and inference automation tools can be applied in support of policy-making could lead to less economically damaging policy prescriptions, particularly during the current COVID-19 pandemic.
2021
- UAI: q-Paths: Generalizing the Geometric Annealing Path using Power Means. In 37th Conference on Uncertainty in Artificial Intelligence, 2021.
Many common machine learning methods involve the geometric annealing path, a sequence of intermediate densities between two distributions of interest constructed using the geometric average. While alternatives such as the moment-averaging path have demonstrated performance gains in some settings, their practical applicability remains limited by exponential family endpoint assumptions and a lack of closed form energy function. In this work, we introduce q-paths, a family of paths which is derived from a generalized notion of the mean, includes the geometric and arithmetic mixtures as special cases, and admits a simple closed form involving the deformed logarithm function from nonextensive thermodynamics. Following previous analysis of the geometric path, we interpret our q-paths as corresponding to a q-exponential family of distributions, and provide a variational representation of intermediate densities as minimizing a mixture of α-divergences to the endpoints. We show that small deviations away from the geometric path yield empirical gains for Bayesian inference using Sequential Monte Carlo and generative model evaluation using Annealed Importance Sampling.
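The closed form referred to above is a power mean of order 1 − q over the endpoint densities. A small log-space sketch (illustrative only; it assumes 0 < beta < 1 on the non-geometric branch, and takes q → 1 as the geometric path):

```python
import numpy as np

def q_path_log_density(log_p0, log_p1, beta, q):
    """Unnormalized log density on the q-path between p0 and p1.

    The path is the power mean [(1-beta) p0^(1-q) + beta p1^(1-q)]^(1/(1-q)).
    q = 1 recovers the geometric path (1-beta)*log_p0 + beta*log_p1;
    q = 0 recovers the arithmetic mixture.
    """
    if np.isclose(q, 1.0):
        return (1.0 - beta) * log_p0 + beta * log_p1
    t = 1.0 - q
    a = np.log1p(-beta) + t * log_p0    # log((1-beta) * p0^t)
    b = np.log(beta) + t * log_p1       # log(beta * p1^t)
    m = np.maximum(a, b)                # log-sum-exp for stability
    return (m + np.log(np.exp(a - m) + np.exp(b - m))) / t
```

Taking q slightly below 1 gives the "small deviations away from the geometric path" regime where the paper reports empirical gains.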
2020
- NeurIPS: Annealed Importance Sampling with q-Paths. R. Brekelmans*, V. Masrani*, B. Thang, and 4 more authors. In NeurIPS Workshop on Deep Learning through Information Geometry (Best Paper Award), 2020.
Annealed Importance Sampling (AIS) is the gold standard for estimating partition functions or marginal likelihoods, corresponding to importance sampling over a path of distributions between a tractable base and an unnormalized target. While AIS yields an unbiased estimator for any path, existing literature has been limited to the geometric mixture or moment-averaged paths associated with the exponential family and KL divergence. We explore AIS using q-paths, which include the geometric path as a special case and are related to the homogeneous power mean, deformed exponential family, and α-divergence.
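To make the AIS setup concrete, here is a self-contained toy sketch (not the paper's code) that anneals along the ordinary geometric path between a standard normal base and an unnormalized Gaussian target with known log partition function:

```python
import numpy as np

def ais_log_evidence(num_chains=500, num_temps=64, step=0.6, seed=1):
    """AIS along the geometric path from a standard normal base to the
    unnormalized target f(z) = exp(-0.5 * (z - 2)**2), whose true
    log partition function is 0.5 * log(2 * pi).
    """
    rng = np.random.default_rng(seed)
    log_base = lambda z: -0.5 * z**2 - 0.5 * np.log(2 * np.pi)
    log_f = lambda z: -0.5 * (z - 2.0) ** 2
    betas = np.linspace(0.0, 1.0, num_temps + 1)
    z = rng.normal(size=num_chains)          # exact draws from the base
    log_w = np.zeros(num_chains)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # incremental importance weight for the move beta_prev -> beta
        log_w += (b - b_prev) * (log_f(z) - log_base(z))
        # a couple of Metropolis-Hastings moves targeting pi_beta
        for _ in range(2):
            log_pi = lambda x, b=b: (1.0 - b) * log_base(x) + b * log_f(x)
            prop = z + step * rng.normal(size=num_chains)
            accept = np.log(rng.uniform(size=num_chains)) < log_pi(prop) - log_pi(z)
            z = np.where(accept, prop, z)
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))  # log of the mean weight
```

Swapping the geometric intermediate densities for the q-path densities of the paper changes only the annealing line; the unbiasedness of the weight construction is unaffected.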
- NeurIPS: Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective. Neural Information Processing Systems, 2020.
Achieving the full promise of the Thermodynamic Variational Objective (TVO), a recently proposed variational lower bound on the log evidence involving a one-dimensional Riemann integral approximation, requires choosing a "schedule" of sorted discretization points. This paper introduces a bespoke Gaussian process bandit optimization method for automatically choosing these points. Our approach not only automates their one-time selection, but also dynamically adapts their positions over the course of optimization, leading to improved model learning and inference. We provide theoretical guarantees that our bandit optimization converges to the regret-minimizing choice of integration points. Empirical validation of our algorithm is provided in terms of improved learning and inference in Variational Autoencoders and Sigmoid Belief Networks.
- ICML: All in the Exponential Family: Bregman Duality in Thermodynamic Variational Inference. International Conference on Machine Learning, 2020.
The recently proposed Thermodynamic Variational Objective (TVO) leverages thermodynamic integration to provide a family of variational inference objectives, which both tighten and generalize the ubiquitous Evidence Lower Bound (ELBO). However, the tightness of TVO bounds was not previously known, an expensive grid search was used to choose a "schedule" of intermediate distributions, and model learning suffered with ostensibly tighter bounds. In this work, we propose an exponential family interpretation of the geometric mixture curve underlying the TVO and various path sampling methods, which allows us to characterize the gap in TVO likelihood bounds as a sum of KL divergences. We propose to choose intermediate distributions using equal spacing in the moment parameters of our exponential family, which matches grid search performance and allows the schedule to adaptively update over the course of training. Finally, we derive a doubly reparameterized gradient estimator which improves model learning and allows the TVO to benefit from more refined bounds. To further contextualize our contributions, we provide a unified framework for understanding thermodynamic integration and the TVO using Taylor series remainders.
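The moment-parameter spacing idea can be illustrated with a short sketch (a simplification under stated assumptions, not the paper's implementation): estimate the moment parameter η(β) = E_{π_β}[log w] on a fine β grid from one batch of log importance weights, then invert it by interpolation to get a schedule equally spaced in η:

```python
import numpy as np

def moment_spaced_schedule(log_w, num_points, grid_size=1000):
    """Choose a TVO beta schedule equally spaced in the moment parameter
    eta(beta) = E_{pi_beta}[log w], estimated from one batch of log
    importance weights by self-normalized importance sampling.
    """
    log_w = np.asarray(log_w, dtype=float)
    betas = np.linspace(0.0, 1.0, grid_size)
    tilt = betas[:, None] * log_w[None, :]
    tilt -= tilt.max(axis=1, keepdims=True)   # stabilize each softmax row
    w = np.exp(tilt)
    w /= w.sum(axis=1, keepdims=True)
    eta = w @ log_w                           # nondecreasing in beta
    targets = np.linspace(eta[0], eta[-1], num_points)
    return np.interp(targets, eta, betas)     # invert eta(beta) by interpolation
```

Because η(β) can be re-estimated from each training batch, a schedule of this kind can adapt over the course of training, as the abstract describes.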
- Planning as Inference in Epidemiological Models. arXiv preprint, 2020.
In this work we demonstrate how existing software tools can be used to automate parts of infectious disease-control policy-making by performing inference in existing epidemiological dynamics models. The inference tasks undertaken include computing, for planning purposes, the posterior distribution over simulation model parameters that are putatively controllable via direct policy-making choices and that give rise to acceptable disease progression outcomes. Neither the full capabilities of such inference automation software tools nor their utility for planning is widely disseminated at the current time. Timely gains in understanding of these tools and how they can be used may lead to more fine-grained and less economically damaging policy prescriptions, particularly during the current COVID-19 pandemic.
2019
- CVPR: Improved Few-Shot Visual Classification. P. Bateni, R. Goyal, V. Masrani, and 2 more authors. Conference on Computer Vision and Pattern Recognition, 2019.
Few-shot learning is a fundamental task in computer vision that carries the promise of alleviating the need for exhaustively labeled data. Most few-shot learning approaches to date have focused on progressively more complex neural feature extractors and classifier adaptation strategies, as well as the refinement of the task definition itself. In this paper, we explore the hypothesis that a simple class-covariance-based distance metric, namely the Mahalanobis distance, adopted into a state of the art few-shot learning approach (CNAPS) can, in and of itself, lead to a significant performance improvement. We also discover that it is possible to learn adaptive feature extractors that allow useful estimation of the high dimensional feature covariances required by this metric from surprisingly few samples. The result of our work is a new "Simple CNAPS" architecture which has up to 9.2% fewer trainable parameters than CNAPS and performs up to 6.1% better than state of the art on the standard few-shot image classification benchmark dataset.
- NeurIPS: The Thermodynamic Variational Objective. V. Masrani, Tuan Anh Le, and F. Wood. Neural Information Processing Systems, 2019.
We introduce the thermodynamic variational objective (TVO) for learning in both continuous and discrete deep generative models. The TVO arises from a key connection between variational inference and thermodynamic integration that results in a tighter lower bound to the log marginal likelihood than the standard variational evidence lower bound (ELBO) while remaining as broadly applicable. We provide a computationally efficient gradient estimator for the TVO that applies to continuous, discrete, and non-reparameterizable distributions and show that the objective functions used in variational inference, variational autoencoders, wake sleep, and inference compilation are all special cases of the TVO. We use the TVO to learn both discrete and continuous deep generative models and empirically demonstrate state of the art model and inference network learning.
2018
- Thesis: Detecting dementia from written and spoken language. V. Masrani. University of British Columbia, 2018.
This thesis makes three main contributions to existing work on the automatic detection of dementia from language. First, we introduce a new set of biologically motivated spatial neglect features, and show their inclusion achieves a new state of the art in classifying Alzheimer’s disease (AD) from recordings of patients undergoing the Boston Diagnostic Aphasia Examination. Second, we demonstrate how a simple domain adaptation algorithm can be used to leverage AD data to improve classification of mild cognitive impairment (MCI), a condition characterized by a slight-but-noticeable decline in cognition that does not meet the criteria for dementia, and for which reliable data is scarce. Third, we investigate whether dementia can be detected from written rather than spoken language, and show a range of classifiers achieve performance far above baseline. Additionally, we create a new corpus of blog posts written by authors with and without dementia and make it publicly available for future researchers.
2017
- CCAI: Domain adaptation for detecting mild cognitive impairment. In Canadian Conference on Artificial Intelligence, 2017.
Lexical and acoustic markers in spoken language can be used to detect mild cognitive impairment (MCI), a condition which is often a precursor to dementia and frequently causes some degree of dysphasia. Research to develop such a diagnostic tool for clinicians has been hindered by the scarcity of available data. This work uses domain adaptation to adapt Alzheimer’s data to improve classification accuracy of MCI. We evaluate two simple domain adaptation algorithms, AUGMENT and CORAL, and show that AUGMENT improves upon all baselines. Additionally we investigate the use of previously unconsidered discourse features and show they are not useful in distinguishing MCI from healthy controls.
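For context, the AUGMENT baseline is the "frustratingly easy" feature-augmentation trick: each feature vector is copied into shared, source-only, and target-only blocks so that a single linear classifier trained on both domains can learn domain-general and domain-specific weights. A minimal version (illustrative; `augment_features` is a hypothetical helper, not code from the paper):

```python
import numpy as np

def augment_features(x, domain):
    """Map a feature vector to (shared, source-only, target-only) blocks.

    Source examples become <x, x, 0>; target examples become <x, 0, x>.
    A linear model on the augmented space can then share weight across
    domains in the first block while specializing in the other two.
    """
    zeros = np.zeros_like(x)
    if domain == "source":
        return np.concatenate([x, x, zeros], axis=-1)
    return np.concatenate([x, zeros, x], axis=-1)
```

In the MCI setting of the paper, the source domain would be the larger AD dataset and the target domain the scarce MCI data; the augmented features from both are stacked and fed to one classifier.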
- ACL: Detecting dementia through retrospective analysis of routine blog posts by bloggers with dementia. In BioNLP 2017, 2017.
We investigate if writers with dementia can be automatically distinguished from those without by analyzing linguistic markers in written text, in the form of blog posts. We have built a corpus of several thousand blog posts, some by people with dementia and others by people with loved ones with dementia. We use this dataset to train and test several machine learning methods, and achieve prediction performance at a level far above the baseline.
- ACL: Generating and Evaluating Summaries for Partial Email Threads: Conversational Bayesian Surprise and Silver Standards. J. Johnson, V. Masrani, G. Carenini, and 1 more author. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, 2017.
We define and motivate the problem of summarizing partial email threads. This problem introduces the challenge of generating reference summaries for partial threads when human annotation is only available for the threads as a whole, particularly when the human-selected sentences are not uniformly distributed within the threads. We propose an oracular algorithm for generating these reference summaries with arbitrary length, and we are making the resulting dataset publicly available. In addition, we apply a recent unsupervised method based on Bayesian Surprise that incorporates background knowledge into partial thread summarization, extend it with conversational features, and modify the mechanism by which it handles redundancy.
- IEEE: A study of social interactions in open source component use. M. Palyart, G. Murphy, and V. Masrani. IEEE Transactions on Software Engineering, 2017.
All kinds of software projects, whether open or closed source, rely on open source components. Repositories that serve open source components to organizations, such as the Central Repository and npmjs.org, report billions of requests per year. Despite the widespread reliance of projects on open source components, little is known about the social interactions that occur between developers of a project using a component and developers of the component itself. In this paper, we investigate the social interactions that occur for 5,133 pairs of projects from two different communities (Java and Ruby), where each pair represents a user project that depends on a component project. We consider questions such as: How often do social interactions occur when a component is used? When do they occur? And why? Our results provide insight into how socio-technical interactions occur beyond the level of an individual or small group of projects previously studied by others and identify the need for a new model of socio-technical congruence for dependencies between, instead of within, projects.
- Improving Diagnostic Accuracy of Alzheimer’s Disease from Speech Analysis Using Markers of Hemispatial Neglect. Thalia S. Field, V. Masrani, G. Murray, and 1 more author. Alzheimer’s & Dementia: The Journal of the Alzheimer’s Association, 2017.
Machine learning has been previously used to distinguish patients with Alzheimer’s disease (AD) versus healthy controls using transcripts of descriptions of the “Cookie Theft” picture from the Boston Diagnostic Aphasia Exam. Previous work achieved a positive predictive value (PPV) of 0.83 (95% CI 0.79 – 0.87) and negative predictive value (NPV) of 0.81 (0.74 – 0.88) using lexical (e.g. word choice and complexity) and acoustic (e.g. pauses, prosody) features extracted from interviews. Given that language deficits may be associated with other dominant hemisphere findings, we evaluated the diagnostic utility in adding markers of hemispatial neglect to our previous baseline algorithm.