Research

Investigating the Foundations of Learning Algorithms

My research is centered on understanding the fundamental limits and capabilities of machine learning algorithms.


Core Research Topics

Central Limit Theorem for Stochastic Optimization

Relevant Publications

Nonasymptotic CLT and Error Bounds for Two-Time-Scale Stochastic Approximation

Seo Taek Kong, Sihan Zeng, Thinh T. Doan, R. Srikant

Under Review

Published

This paper establishes the first non-asymptotic CLT for Polyak-Ruppert averaging in two-time-scale stochastic approximation, providing precise error characterizations.
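For readers unfamiliar with the setting, here is a minimal toy sketch of two-time-scale stochastic approximation with Polyak-Ruppert averaging. It is not the paper's algorithm or analysis: the scalar linear system, the target `b`, the step-size exponents, and the function name `two_time_scale_sa` are all assumptions chosen for illustration only.

```python
import random


def two_time_scale_sa(n_steps=20000, b=2.0, seed=0):
    """Toy two-time-scale stochastic approximation (illustrative assumption).

    The fast iterate y tracks the slow iterate x, while x is driven toward
    the point where y equals b. Both updates see unit-variance Gaussian
    noise. Returns the Polyak-Ruppert average of the slow iterate.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    x_sum = 0.0
    for k in range(n_steps):
        alpha = 1.0 / (k + 1) ** 0.75  # slow step size
        beta = 1.0 / (k + 1) ** 0.6    # fast step size, so alpha / beta -> 0
        # Both updates use the iterates from the previous step.
        x_new = x + alpha * (b - y + rng.gauss(0.0, 1.0))
        y_new = y + beta * (x - y + rng.gauss(0.0, 1.0))
        x, y = x_new, y_new
        x_sum += x
    # Polyak-Ruppert averaging: report the running mean, not the last iterate.
    return x_sum / n_steps


if __name__ == "__main__":
    print(two_time_scale_sa())
```

In this toy problem the averaged iterate should land close to `b`, while the last iterate alone fluctuates more; a central limit theorem describes the distribution of that averaged error.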

Sampling Error Bounds for Diffusion Models

Relevant Publications

Sampling Error and Score Matching for Diffusion Models

In Progress


This paper derives sharp error bounds for diffusion model sampling, quantifying the deviation of sampled outputs from the true distribution.

Crowdsourcing and Label Aggregation

Relevant Publications

Spectral Clustering for Crowdsourcing with Inherently Distinct Task Types

Saptarshi Mandal*, Seo Taek Kong*, Dimitrios Katselis, R. Srikant

Under Review

Published

*Denotes equal contribution.

Industry & Applied Research

Amazon

  • Search Relevance (Palo Alto, CA): Fine-tuned a listwise LLM for search reranking using PyTorch and HuggingFace on AWS SageMaker (multi-GPU).
      • Impact: Significantly improved ranking quality as measured by nDCG@1.
  • Product Quality (Seattle, WA): Developed a predictive GBDT/NN ensemble model of pre-purchase customer satisfaction using Scala Spark, Polars, and AutoGluon.
      • Impact: Generated dense defect-probability scores to address log sparsity, enabling downstream applications in search ranking and fault attribution.
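For context on the nDCG@1 metric mentioned above, the following is a minimal sketch of how nDCG@k is commonly computed. It assumes the exponential-gain formulation with a log2 position discount, which is one common variant; the source does not say which variant was used, and the function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (exponential gain)."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """nDCG@k: DCG of the given ranking divided by DCG of the ideal ranking.

    `relevances` lists graded relevance labels in the order the ranker
    returned the items. Returns 0.0 when no item is relevant.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # The most relevant item is ranked second, so nDCG@1 is penalized.
    print(ndcg_at_k([1, 3, 0], k=1))
```

With k = 1 the metric depends only on the top-ranked item, so it rewards a reranker for placing the single best result first.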


Large Language Models (LLMs), Semantic Search, Predictive Modeling

VUNO Inc.

  • Developed deep learning architectures in PyTorch for analyzing chest X-rays and other medical images.
  • Designed an active learning algorithm that significantly reduced labeling costs (published at NeurIPS 2022).
  • Implemented out-of-distribution (OOD) detection modules for production diagnostic software to prevent silent failures on anomalous data (published at AAAI 2023).

Relevant Publications

A Neural Pre-Conditioning Active Learning Algorithm to Reduce Label Complexity

Seo Taek Kong, Soomin Jeon, Dongbin Na, Jaewon Lee, Hong-Seok Lee, Kyu-Hwan Jung

NeurIPS 2022

Developed an active learning algorithm for choosing which data to label.

Key Feature Replacement of In-Distribution Samples for Out-of-Distribution Detection

Jaeyoung Kim*, Seo Taek Kong*, Dongbin Na, Kyu-Hwan Jung

AAAI 2023

Proposed a method to train a neural network for out-of-distribution detection.

*Denotes equal contribution.

Self-supervised learning with electrocardiogram delineation for arrhythmia detection

Byeong Tak Lee*, Seo Taek Kong*, Youngjae Song, Yeha Lee

IEEE EMBC 2021

Developed a self-supervised learning method for arrhythmia detection using ECG data.

*Denotes equal contribution.

Medical AI, Computer Vision
