I am a third-year PhD student at The University of Texas at Austin, advised by Prof. Constantine Caramanis and Prof. Sanjay Shakkottai. My research focuses on the theoretical foundations of generative models (e.g. rectified flows, diffusion models, and GANs) and their applications in conditional sampling (e.g. inverse problems, image editing, and personalization).
I am currently working as a student researcher at Google Research.
Prior to UT Austin, I worked as a Scientist/Engineer-SD at the Indian Space Research Organisation, where I developed operational deep learning algorithms and analyzed their convergence properties.
We present Constrained Posterior Sampling (CPS), a scalable diffusion sampling process that generates realistic time series samples that belong to a constraint set. Without any additional training, CPS can handle a large number of constraints without sacrificing sample quality. We provide a detailed theoretical analysis of the effect of modifying the traditional diffusion sampling process with CPS.
We present an efficient inversion method for RF models, including Flux, that requires no additional training, latent optimization, prompt tuning, or complex attention processors. We develop a new vector field for RF inversion, interpolating between two competing objectives: consistency with a possibly corrupted input image, and consistency with the “true” distribution of clean images.
We introduce Reference-Based Modulation (RB-Modulation), a training-free plug-and-play solution for content and style personalization. By incorporating style features into the controller’s terminal cost, we modulate the drift field in diffusion models’ reverse dynamics, enabling training-free personalization. Further, we propose an Attention Feature Aggregation (AFA) module that decouples content from the reference style image.
We present an efficient second-order approximation using Tweedie's formula to mitigate the bias incurred in the widely used first-order samplers. With this method, we devise a surrogate loss function to refine the reverse process at every diffusion step to address inverse problems and perform high-fidelity text-guided image editing.
Solving inverse problems (e.g. inpainting/deblurring) for general domain images is hard. Magic Eraser and other commercial tools use separately trained models for each task. We introduce PSLD, a method that uses Stable Diffusion to solve all linear inverse problems without any extra training.
We provide a theoretical justification for sample recovery using diffusion-based image inpainting in a linear model setting. Unlike most inpainting algorithms, we prove that diffusion-based inpainting generalizes well to unseen masks without retraining. Motivated by our analysis, we propose a modified RePaint algorithm, which we call RePaint+, that provably recovers the underlying true sample and enjoys a linear rate of convergence.
We develop a technique that allows us to prove convergence rates for (L0, L1)-smooth functions without assuming uniform bounds on the noise support. The key innovation behind our results is a carefully constructed stopping time. This is simultaneously large on average and allows us to decorrelate the adaptive stepsizes from the gradients, which is a major challenge in many analyses.
While Optimal Transport (OT) cost serves as the loss for popular generative models, we demonstrate that the OT map can be used as the generative model itself. Previous analogous approaches consider OT maps as generative models only in the latent spaces due to their poor performance in the original high-dimensional ambient space. In contrast, we fit OT maps directly in the ambient space, e.g., a space of high-dimensional images.
A major concern with the Sliced Wasserstein (SW) distance is that it requires a large number of projections in high-dimensional settings. To address this concern, we derive projections from a small number of bottleneck projections. We introduce the Hierarchical Radon Transform (HRT), which recursively applies the Radon Transform (RT). We design the Hierarchical Sliced Wasserstein (HSW) distance to estimate the discrepancy between measures in high dimensions.
First, we prove that GANs with content or identity losses learn optimal transport (OT) maps between source and target measures in super-resolution tasks. Second, we empirically demonstrate that these learned OT maps are biased and provide an OT solver to recover an unbiased OT map. It provides nearly state-of-the-art performance on the unpaired AIM19 benchmark without having to use content or identity losses.
In this paper, we intend to demystify an interesting phenomenon: adversarial interaction in GANs creates a non-homogeneous equilibrium by inducing Turing instability in a Pseudo-Reaction-Diffusion (PRD) model. This is in contrast to supervised learning, where the identical model achieves a homogeneous equilibrium.
In this study, we observe that a system in which a generator and a discriminator adversarially interact with each other exhibits Turing-like patterns in the hidden layer and top layer of the generator.
Despite numerous attempts to provide empirical evidence of adversarial regularization outperforming sole supervision, the theoretical understanding of such phenomena remains elusive. In this study, we aim to resolve whether adversarial regularization indeed performs better than sole supervision at a fundamental level.
In this article, we devise a method, which we call ALERT, to tackle missing band reconstruction. The proposed method reconstructs the missing band with the sole supervision of spectral and spatial priors.
This paper seeks to address the synthesis of high-resolution multi-spectral satellite imagery using adversarial learning. Guided by the discovery of the attention mechanism, we regulate the process of band synthesis through spatio-spectral Laplacian attention.
In this study, we propose to parameterize action variables by matrices, and train our policy network using Monte-Carlo sampling. We study the implications of a parametric action space in a model-free environment from both theoretical and empirical perspectives.
This paper describes the techniques developed to enhance the Phobos image from MCC multi-frame acquisitions using image rectification and topographic data. After incorporating these techniques, the final Phobos image appears more representative, spatially enhanced, and has normalized radiometry to study its surface features.
Results of over ninety trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years.
Results of 81 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years.
Here, we propose a robust framework that offers the provision to incorporate illumination and rotation invariance in the standard Discriminative Correlation Filter (DCF) formulation. We also supervise the detection stage of DCF trackers by eliminating false positives in the convolution response map.
This paper discusses a novel approach to regress in the temporal domain, based on weighted aggregation of distinctive visual features and feature prioritization with entropy estimation in a recursive fashion.
Results of over eighty trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years.
In this paper, we study the need to capture various physical constraints through motion consistency, which has been demonstrated to improve accuracy, robustness and, more importantly, rotation adaptiveness.
The developed algorithm has been implemented to yield the physically significant chemiluminescence emission from hydroxyl radicals in flames from line-of-sight integrated images. The effectiveness of this algorithm is highlighted using exemplary OH chemiluminescence images captured from a standard swirl-stabilized research burner.
The present embodiment proposes an efficient Fast Fourier Transform (FFT) based hyper-spectral image compression technique to store multiple acquisitions over the same region of interest and thereby improve the Signal-to-Noise Ratio (SNR) of hyper-spectral images, which usually have coarse spatial resolution.
In this chapter, we propose a robust framework that offers the provision to incorporate illumination and rotation invariance in the standard Discriminative Correlation Filter (DCF) formulation. We also supervise the detection stage of DCF trackers by eliminating false positives in the convolution response map.
This chapter discusses a novel approach to regress in the temporal domain, based on weighted aggregation of distinctive visual features and feature prioritization with entropy estimation in a recursive fashion.