About Me
Greetings! I am a Master's student in Data Science at UC San Diego and an aspiring Machine Learning Research Engineer. My passion lies in building efficient, scalable, and high-performance AI systems, with a focus on large-scale training, generative AI, and distributed inference.
I currently hold multiple roles that reflect this technical depth:
- Open-Source Research Engineer, FastVideo: Leading contributions to a distributed video diffusion training library (3K+ stars), optimizing multi-node H100 clusters and implementing novel LoRA extraction pipelines for 5B+ parameter models.
- GenAI Specialist, Scale AI: Collaborating with the research division to design chain-of-thought evaluation frameworks and analyze RLHF trajectories for frontier reasoning models.
- Graduate Researcher, Climate Analytics Lab (UCSD): Developing interpretable machine learning pipelines (PySR, SINDy) to uncover physical relationships in climate data.
Technical Skills
- ML/DL Frameworks: PyTorch, CUDA, TensorFlow, Transformers (HuggingFace), DeepSpeed, FSDP, Ray, vLLM, SGLang, FlashAttention, Diffusion, Multi-modal Models, JAX
- LLMs & GenAI: Fine-tuning (LoRA, PEFT, SFT), RLHF, DPO, Post-training Alignment, Model Compression, Distillation, Quantization, MoE, Multi-GPU Training, 3D Parallelism
- Programming & Tools: Python, C/C++, SQL, Git, Docker, Kubernetes, AWS, Weights & Biases
- Data Science: Causal Inference, Statistical Modeling, Spark, Dask, ETL Pipelines
Education
University of California San Diego (Sept 2024 – Mar 2026)
M.S. Data Science, GPA: 3.9/4.0
Indian Institute of Technology Madras (May 2023 – Apr 2024)
Diploma in Data Science, GPA: 4.0/4.0
VIT University, Chennai (Sept 2020 – Jul 2024)
B.Tech. Computer Science & Engineering, GPA: 3.96/4.0
Selected Projects
- FastVideo Distributed Training: Unified 10B+ parameter video diffusion models with distributed training infrastructure; optimized inference achieving 2.5x speedup on H100 clusters.
- BirdCLEF+ 2025: Developed per-species XGBoost classifiers using spectrogram statistics and metadata; achieved >0.90 AUC across species.
- Symbolic Regression for Cloud-Aerosol Interactions: Benchmarked interpretable models vs. deep neural nets, balancing R² performance with explainability.
- NeuroFraudGAN: Synthesized financial transaction data with GANs, reducing class imbalance by 40% and achieving 96% fraud detection accuracy.
I'm motivated by challenges at the intersection of AI systems and research. Whether optimizing video diffusion models or shaping the future of frontier LLMs, I aim to bridge theoretical research and scalable application with rigor and engineering excellence.
