About Me

Greetings! I am a Master’s student in Data Science at UC San Diego and an aspiring Machine Learning Research Engineer. My passion lies in building efficient, scalable, and high-performance AI systems, with a focus on large-scale training, generative AI, and distributed inference.

I currently hold several roles that reflect this focus:

  • 🚀 Open-Source Research Engineer, FastVideo: Leading contributions to a distributed video diffusion training library (3K+ stars), optimizing multi-node H100 clusters and implementing novel LoRA extraction pipelines for 5B+ parameter models.
  • 🤖 GenAI Specialist, Scale AI: Collaborating with the research division to design chain-of-thought evaluation frameworks and analyze RLHF trajectories for frontier reasoning models.
  • 🌎 Graduate Researcher, Climate Analytics Lab (UCSD): Developing interpretable machine learning pipelines (PySR, SINDy) to uncover physical relationships in climate data.

Technical Skills

  • ML/DL Frameworks: PyTorch, CUDA, TensorFlow, Transformers (HuggingFace), DeepSpeed, FSDP, Ray, vLLM, SGLang, FlashAttention, Diffusion, Multi-modal Models, JAX
  • LLMs & GenAI: Fine-tuning (LoRA, PEFT, SFT), RLHF, DPO, Post-training Alignment, Model Compression, Distillation, Quantization, MoE, Multi-GPU Training, 3D Parallelism
  • Programming & Tools: Python, C/C++, SQL, Git, Docker, Kubernetes, AWS, Weights & Biases
  • Data Science: Causal Inference, Statistical Modeling, Spark, Dask, ETL Pipelines

Education

University of California San Diego (Sept 2024 – Mar 2026)
M.S. Data Science, GPA: 3.9/4.0

Indian Institute of Technology Madras (May 2023 – Apr 2024)
Diploma in Data Science, GPA: 4.0/4.0

VIT University, Chennai (Sept 2020 – Jul 2024)
B.Tech. Computer Science & Engineering, GPA: 3.96/4.0


Selected Projects

  • FastVideo Distributed Training: Unified 10B+ parameter video diffusion models with distributed training infrastructure; optimized inference, achieving a 2.5x speedup on H100 clusters.
  • BirdCLEF+ 2025: Developed per-species XGBoost classifiers using spectrogram statistics and metadata; achieved >0.90 AUC across species.
  • Symbolic Regression for Cloud-Aerosol Interactions: Benchmarked interpretable models vs. deep neural nets, balancing R² performance with explainability.
  • NeuroFraudGAN: Synthesized financial transaction data with GANs, reducing class imbalance by 40% and achieving 96% fraud detection accuracy.

I’m motivated by challenges at the intersection of AI systems and research. Whether optimizing video diffusion models or shaping the future of frontier LLMs, I aim to bridge theoretical research and scalable application with rigor and engineering excellence.