Research

Investigating the Foundations of Learning Algorithms

My research is centered on understanding the fundamental limits and capabilities of machine learning algorithms.


Core Research Topics

Central Limit Theorem for Stochastic Optimization

Stochastic optimization is a cornerstone of modern machine learning. Understanding its convergence behavior is key to both theory and practice. My research proves non-asymptotic Central Limit Theorems (CLTs) that describe how the output of stochastic algorithms fluctuates around the optimum. These results enable principled approaches to hyperparameter tuning and uncertainty quantification in learning systems.
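As a toy illustration of the regime these theorems describe, the sketch below runs Polyak-Ruppert averaged SGD on a one-dimensional quadratic with Gaussian gradient noise. The objective, step-size schedule, and constants are illustrative choices for this page, not taken from the papers below; the point is only that the rescaled averaged error behaves approximately like a Gaussian across independent runs.

```python
import math
import random
import statistics

def averaged_sgd(n_steps, step=0.5, noise=1.0, seed=0):
    """Run SGD on f(x) = x^2 / 2 with additive Gaussian gradient noise
    and return the Polyak-Ruppert average of the iterates."""
    rng = random.Random(seed)
    x, avg = 1.0, 0.0
    for t in range(1, n_steps + 1):
        grad = x + rng.gauss(0.0, noise)   # noisy gradient of x^2 / 2
        x -= (step / t ** 0.75) * grad     # polynomially decaying step size
        avg += (x - avg) / t               # running average of the iterates
    return avg

# The optimum is x* = 0.  Across independent runs, sqrt(n) times the
# averaged error should look approximately Gaussian (the CLT regime),
# here with mean near 0 and standard deviation near 1.
n = 10_000
errors = [math.sqrt(n) * averaged_sgd(n, seed=s) for s in range(200)]
print(round(statistics.mean(errors), 3), round(statistics.stdev(errors), 3))
```

A non-asymptotic CLT sharpens this picture by bounding how far the distribution of these rescaled errors is from its Gaussian limit at each finite n, rather than only in the limit.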

Relevant Publications

Nonasymptotic CLT and Error Bounds for Two-Time-Scale Stochastic Approximation

Seo Taek Kong, Sihan Zeng, Thinh T. Doan, R. Srikant

Under Review

This paper establishes the first non-asymptotic CLT for Polyak-Ruppert averaging in two-time-scale stochastic approximation, providing precise error characterizations.

Exact Error Exponents for Nonlinear Stochastic Approximation

In Progress

Applications

  • Stochastic Gradient Descent (SGD)

    Stochastic Gradient Descent (SGD) is a widely used algorithm for training machine learning models. Its iterative, noise-driven updates produce sample trajectories that fluctuate as they descend the loss landscape, converging to the optimal solution at rates that depend on initialization, noise, and hyperparameters. Non-asymptotic Central Limit Theorems provide a probabilistic description of these fluctuations, quantifying the likelihood of deviations from the optimum within finite time.

    Figure 1: Sample trajectory of SGD on a loss landscape. Randomness in the initialization and gradient estimate influences convergence behavior. The spread of such trajectories is characterized by a non-asymptotic Central Limit Theorem, which quantifies their probabilistic fluctuations around the optimal solution.
  • Reinforcement Learning

    In reinforcement learning, algorithms like Temporal Difference (TD) learning estimate value functions from noisy, sequential data. When combined with function approximation, these updates form a stochastic optimization process whose convergence behavior is often hard to analyze. The non-asymptotic Central Limit Theorem offers a way to understand the distribution of TD iterates around the true value function, enabling finite-sample guarantees and uncertainty-aware learning in complex environments.

    Figure 2: (Left) A simple grid world environment. An agent learns the value function by traversing paths under stochastic transitions. (Right) Histogram of the error in the estimated value function. The non-asymptotic CLT captures the shape and spread of this distribution, enabling quantitative analysis of learning variability and confidence in the estimates.
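As a concrete toy instance of TD learning, the sketch below runs tabular TD(0) on a small random-walk chain and compares the estimates against the exact values obtained by value iteration. The environment, step size, and discount factor are illustrative assumptions made for this page, not the setting analyzed in the publications above.

```python
import random

def transition(s, d):
    """One step on a 5-state chain: stepping off the right end pays +1,
    and stepping off either end resets the walk to the middle state."""
    s2 = s + d
    if s2 < 0 or s2 > 4:
        return (1.0 if s2 > 4 else 0.0), 2
    return 0.0, s2

def td0_chain(n_steps, alpha=0.05, gamma=0.9, seed=0):
    """Tabular TD(0) value estimation under the uniform random policy."""
    rng = random.Random(seed)
    V, s = [0.0] * 5, 2
    for _ in range(n_steps):
        r, s2 = transition(s, 1 if rng.random() < 0.5 else -1)
        V[s] += alpha * (r + gamma * V[s2] - V[s])  # TD(0) update
        s = s2
    return V

def exact_values(gamma=0.9, iters=500):
    """Exact state values via value iteration on the same chain."""
    V = [0.0] * 5
    for _ in range(iters):
        W = V[:]
        V = [0.5 * sum(r + gamma * W[s2]
                       for r, s2 in (transition(s, -1), transition(s, 1)))
             for s in range(5)]
    return V

V_hat, V_star = td0_chain(500_000), exact_values()
print([round(a - b, 3) for a, b in zip(V_hat, V_star)])
```

With a constant step size the TD iterates fluctuate in a neighborhood of the true values; rerunning with different seeds and collecting these errors produces exactly the kind of error histogram whose shape and spread a non-asymptotic CLT characterizes.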

Sampling Error Bounds for Diffusion Models

Diffusion models are used to sample from complex distributions, such as those in generative modeling. My research focuses on deriving error bounds for the sampling process of diffusion models. These bounds quantify the deviation of sampled outputs from the true distribution, enabling better understanding and control over the sampling quality.

Relevant Publications

Sampling Error and Score Matching for Diffusion Models

In Progress

This paper derives sharp error bounds for diffusion model sampling, quantifying the deviation of sampled outputs from the true distribution.

Applications

  • Generative Modeling

    Diffusion models are a class of generative models that learn to sample from complex data distributions. They iteratively refine random noise into structured outputs, such as images or text. Understanding the sampling error is crucial for ensuring high-quality outputs and controlling model behavior.

    Figure 3: Diffusion model sampling process. A deep learning model is trained to parameterize a stochastic differential equation that gradually transforms noise into a realistic image.
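To make the sampling process concrete, the sketch below reverses an Ornstein-Uhlenbeck forward process for a one-dimensional Gaussian target, substituting the analytically known score for a learned score network. The target N(2, 0.5^2), the horizon, and the step size are illustrative assumptions; in real diffusion models the score is learned and the discretization and score errors are exactly what sampling error bounds control.

```python
import math
import random

def reverse_diffusion_sample(m=2.0, s=0.5, T=4.0, dt=0.01, seed=0):
    """Sample from N(m, s^2) by Euler-Maruyama discretization of the
    reverse-time SDE of the OU forward process dx = -x dt + sqrt(2) dW.
    The exact Gaussian score stands in for a learned score network."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)            # start from the stationary prior N(0, 1)
    n_steps = int(round(T / dt))
    for k in range(n_steps):
        t = T - k * dt
        mu_t = m * math.exp(-t)                                    # marginal mean
        var_t = s * s * math.exp(-2 * t) + 1.0 - math.exp(-2 * t)  # marginal variance
        score = -(x - mu_t) / var_t          # exact score of N(mu_t, var_t)
        x += dt * (x + 2.0 * score) + math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0)
    return x

samples = [reverse_diffusion_sample(seed=i) for i in range(2000)]
mean = sum(samples) / len(samples)
std = (sum((v - mean) ** 2 for v in samples) / len(samples)) ** 0.5
print(round(mean, 3), round(std, 3))
```

The empirical mean and standard deviation should land close to the target's (2.0, 0.5); the residual gap comes from the finite horizon, the Euler discretization, and the mismatched prior, the same error sources the bounds quantify.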

Crowdsourcing and Label Aggregation

Crowdsourcing algorithms are used to aggregate labels from multiple workers to infer the true underlying labels. My research focuses on a method to cluster tasks by difficulty, which addresses the common problem where worker reliability changes depending on the task's complexity. This approach enables aggregation models like Dawid-Skene to better estimate worker reliability on a per-difficulty basis, improving the overall accuracy of the final labels.
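For intuition about how aggregation models estimate worker reliability, the sketch below implements EM for a one-coin simplification of the Dawid-Skene model, where each worker has a single accuracy parameter. The difficulty-clustering step from the paper is not shown, and the simulated worker accuracies are illustrative assumptions.

```python
import random

def one_coin_em(labels, n_iters=50):
    """EM for a one-coin Dawid-Skene model: worker j has accuracy acc[j]
    and tasks have binary true labels.  labels[i][j] is worker j's label
    on task i."""
    n_tasks, n_workers = len(labels), len(labels[0])
    acc = [0.7] * n_workers              # initial guess: every worker is decent
    post = [0.5] * n_tasks               # P(true label = 1) for each task
    for _ in range(n_iters):
        for i in range(n_tasks):         # E-step: posterior over true labels
            w1 = w0 = 1.0
            for j in range(n_workers):
                if labels[i][j] == 1:
                    w1 *= acc[j]; w0 *= 1.0 - acc[j]
                else:
                    w1 *= 1.0 - acc[j]; w0 *= acc[j]
            post[i] = w1 / (w1 + w0)
        for j in range(n_workers):       # M-step: re-estimate accuracies
            agree = sum(post[i] if labels[i][j] == 1 else 1.0 - post[i]
                        for i in range(n_tasks))
            acc[j] = agree / n_tasks
    return post, acc

# Simulate 300 tasks labeled by 5 workers of varying reliability.
rng = random.Random(1)
true_acc = [0.9, 0.85, 0.8, 0.6, 0.55]
truth = [rng.randint(0, 1) for _ in range(300)]
labels = [[t if rng.random() < a else 1 - t for a in true_acc] for t in truth]
post, est_acc = one_coin_em(labels)
pred = [1 if p > 0.5 else 0 for p in post]
print(sum(p == t for p, t in zip(pred, truth)) / len(truth))
```

Clustering tasks by difficulty before running such a model lets the accuracy parameters be estimated per difficulty level, which matters precisely because a single per-worker accuracy is a poor fit when reliability varies with task complexity.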

Relevant Publications

Spectral Clustering for Crowdsourcing with Inherently Distinct Task Types

Saptarshi Mandal*, Seo Taek Kong*, Dimitrios Katselis, R. Srikant

Under Review

*Denotes equal contribution.

Applications

  • Noisy Labels in Machine Learning

    This research was motivated by my experience working with medical images, where the inherent difficulty of labeling radiology data often leads to noisy annotations from experts. In such applications, the proposed difficulty-clustering method can be used to generate a more reliable ground truth from multiple conflicting labels. This ultimately improves the quality of the data used for training and evaluating diagnostic AI models, enhancing their performance and robustness.


Industry & Applied Research

Amazon

During my 12 months as an Applied Scientist Intern, I developed and validated machine learning models for key business problems. My work spanned two areas: developing Large Language Models (LLMs) to improve semantic search ranking, and architecting predictive models from user behavioral signals to solve critical data sparsity issues in product quality analysis.

Description

Machine Learning Pipeline

I employ a versatile modeling strategy, using both custom deep learning (PyTorch, HuggingFace) and AutoML (AutoGluon), followed by rigorous offline benchmarking. This end-to-end process ensures that the final deliverables are not only powerful but also robustly validated against business objectives.

Figure: A representative large-scale deep learning workflow. This architecture demonstrates my approach to building scalable machine learning pipelines. The process begins with distributed querying on large-scale cloud data, followed by high-performance ETL using modern data frame libraries.
Topics: Large Language Models (LLMs) · Semantic Search · Predictive Modeling
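The offline-benchmarking step of such a pipeline can be sketched generically: split a dataset, fit each candidate model on the training portion, and compare them on the held-out portion. Everything below (the synthetic data, the majority-class baseline, the 1-nearest-neighbour rule) is an illustrative stand-in, since the actual models and data are proprietary.

```python
import random

def train_test_split(rows, test_frac=0.3, seed=0):
    """Shuffle and split (feature, label) rows into train and held-out sets."""
    rng = random.Random(seed)
    rows = rows[:]
    rng.shuffle(rows)
    cut = int(len(rows) * (1 - test_frac))
    return rows[:cut], rows[cut:]

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

def fit_majority(train):
    """Baseline: always predict the most common training label."""
    majority = max({0, 1}, key=lambda c: sum(y == c for _, y in train))
    return lambda x: majority

def fit_1nn(train):
    """1-nearest-neighbour rule on a single numeric feature."""
    def predict(x):
        _, y = min(train, key=lambda row: abs(row[0] - x))
        return y
    return predict

# Synthetic 1-D data: the label follows the feature's sign 90% of the time.
rng = random.Random(0)
data = []
for _ in range(400):
    v = rng.uniform(-1.0, 1.0)
    y = int(v > 0) if rng.random() < 0.9 else int(v <= 0)
    data.append((v, y))

train, test = train_test_split(data)
for name, fit in [("majority", fit_majority), ("1-nn", fit_1nn)]:
    print(name, round(accuracy(fit(train), test), 3))
```

Holding the split fixed across candidates is what makes the comparison fair; in practice the same harness would wrap PyTorch or AutoGluon models behind the same predict interface.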

VUNO Inc.

In my three years at VUNO, I evolved from a Researcher to a Research Team Lead, solving problems that arise when developing deep learning models for medical imaging. I led research on ML methodologies that resulted in first-author publications at premier AI venues (NeurIPS, AAAI) and contributed to the analysis of models across various modalities.

Description

Detecting abnormalities in Chest X-rays

Many diseases, such as tumors or infections, manifest as subtle visual patterns in chest X-ray images including localized shadows, nodules, or irregular textures. Deep learning models can be trained to recognize these abnormalities. By detecting and localizing such patterns, these models help scale diagnostic support in medical imaging.

Figure: Example chest X-ray from the NIH ChestX-ray14 dataset [1], with a bounding box indicating a labeled lung nodule.
[1] Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., & Summers, R. M. (2017). ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Relevant Publications

A Neural Pre-Conditioning Active Learning Algorithm to Reduce Label Complexity

Seo Taek Kong, Soomin Jeon, Dongbin Na, Jaewon Lee, Hong-Seok Lee, Kyu-Hwan Jung

NeurIPS 2022

Developed an active learning algorithm that selects which unlabeled data to annotate, reducing the number of labels needed to train a model.

Key Feature Replacement of In-Distribution Samples for Out-of-Distribution Detection

Jaeyoung Kim*, Seo Taek Kong*, Dongbin Na, Kyu-Hwan Jung

AAAI 2023

Proposed a training method that replaces key features of in-distribution samples to improve a neural network's out-of-distribution detection.

*Denotes equal contribution.

Self-supervised learning with electrocardiogram delineation for arrhythmia detection

Byeong Tak Lee*, Seo Taek Kong*, Youngjae Song, Yeha Lee

IEEE EMBC 2021

Developed a self-supervised learning method for arrhythmia detection using ECG data.

*Denotes equal contribution.

Topics: Medical AI · Computer Vision