Enhanced Text-to-Speech models for Indian and regional Indian accents, improving user experience. Developed a framework for objective evaluation of generated audio and increased consistency of fine-tuned models.
Engineered fine-tuning pipelines for the OpenAI Whisper model, implementing modular layer freezing, knowledge distillation, and PEFT using PyTorch Lightning and HuggingFace Transformers
Achieved Word Error Rate (WER) performance within 4% of state-of-the-art models for telephonic Indian audio transcription through custom fine-tuning approaches
Implemented real-time streaming of OpenAI Whisper over WebSocket using NVIDIA TensorRT, achieving a 200 ms average response time with 10 concurrent client connections on a single-GPU setup
Developed end-to-end data collection and processing pipelines incorporating Label Studio for systematic data labeling of speech-to-text and text-to-speech training datasets
Data Science Intern
Oriserve, Noida
Mar 2024 - Sep 2024
Fine-tuned wav2vec2 models for regional multilingual automatic speech recognition, achieving a Word Error Rate (WER) of 30%
Implemented context-aware RAG-based question-answering chatbots with streaming capabilities
Engineered a high-performance, streaming-enabled pipeline using various generative AI models
Conducted comprehensive evaluations of open-source technologies from various providers
Associate Data Scientist
Actyv.ai, Bengaluru
Jun 2023 - Aug 2023
Fine-tuned a large language model with 700 million parameters using QLoRA and PEFT
Led efforts to improve the model's human alignment and reduce toxicity using RLHF
Used GCP Vertex AI Workbench for model hosting and training
Created ensemble neural network regressors achieving an MSE of 0.13
Utilized Vision Transformers for document classification with 97% accuracy
Data Science Intern
Actyv.ai, Bengaluru
Nov 2022 - Jun 2023
Developed scalable RESTful API solutions using Flask and Python
Managed deployment of ML projects using Docker, Jenkins, and AWS
Led a team in creating an NLP classification model with a 90% F1 score
PROJECT EXPERIENCE
Cricket Elo Rating
Dec 2023
Implemented a dynamic system to update cricket team ratings post-match
Analyzed IPL teams' Elo ratings and performance metrics
Created visual representations of Elo rating progressions
Stock Price Recommendation
Oct 2022
Performed time-series analysis with the Augmented Dickey-Fuller (ADF) test
Developed LSTM-based RNN models for feature extraction
Achieved 0.8% Mean Percentage Error in the multivariate model