Benchmarking SQL‑Generating LLMs (NL→SQL)
Built a reproducible evaluation harness for text‑to‑SQL using exact‑match and execution‑accuracy metrics. Compared Mistral‑7B‑Instruct and CodeLLaMA‑7B across plain, schema‑aware, and RAG prompting; added LoRA fine‑tuning of CodeLLaMA on the Chinook database.
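A minimal sketch of the dual-metric idea, assuming a SQLite backend (the function and table names here are illustrative, not the harness's actual API): exact match compares normalized SQL text, while execution accuracy compares the result sets of the predicted and gold queries.

```python
import sqlite3


def normalize(sql: str) -> str:
    # Crude exact-match normalization: lowercase and collapse whitespace.
    return " ".join(sql.lower().split())


def execute(conn: sqlite3.Connection, sql: str):
    # Return sorted rows so row-order differences don't cause spurious
    # mismatches; None signals a query that failed to execute.
    try:
        return sorted(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def evaluate(conn: sqlite3.Connection, predicted: str, gold: str) -> dict:
    pred_rows, gold_rows = execute(conn, predicted), execute(conn, gold)
    return {
        "exact_match": normalize(predicted) == normalize(gold),
        "execution_match": pred_rows is not None and pred_rows == gold_rows,
    }


# Demo on an in-memory table: the predicted query differs textually from
# the gold query but returns the same rows, so it fails exact match while
# passing execution accuracy.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO artists VALUES (?, ?)", [(1, "AC/DC"), (2, "Accept")])
print(evaluate(conn, "SELECT name FROM artists WHERE id = 1",
               "SELECT name FROM artists WHERE id=1"))
# → {'exact_match': False, 'execution_match': True}
```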
- Dual metrics separate failure modes: execution accuracy catches queries that run without error but return wrong results, while exact match flags rewrites of the gold SQL.
- Prompt builder retrieves schema/examples per query to ground generations.
- All runs logged for apples‑to‑apples comparisons.
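The schema-aware prompt builder could be sketched as follows; this is a simplified illustration, not the harness's actual code, and the function name and few-shot format are assumptions. It reads live table DDL from SQLite's `sqlite_master` catalog and prepends it, plus any retrieved question/SQL example pairs, to the user question.

```python
import sqlite3


def build_prompt(conn: sqlite3.Connection, question: str, examples=()) -> str:
    # Pull CREATE TABLE statements from sqlite_master so the model sees
    # the live schema rather than a hand-maintained copy.
    ddl = [row[0] for row in conn.execute(
        "SELECT sql FROM sqlite_master WHERE type='table' AND sql IS NOT NULL")]
    parts = ["-- Database schema:", *ddl]
    for q, sql in examples:  # optional retrieved few-shot pairs (RAG)
        parts += [f"-- Q: {q}", sql]
    parts += [f"-- Q: {question}", "SELECT"]  # seed generation with SELECT
    return "\n".join(parts)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (id INTEGER, title TEXT, album_id INTEGER)")
prompt = build_prompt(conn, "How many tracks are there?",
                      examples=[("List all track titles.",
                                 "SELECT title FROM tracks;")])
print(prompt)
```

Grounding the prompt in the current schema and retrieved examples is what lets the same builder serve the plain, schema-aware, and RAG conditions: the plain setting simply omits the DDL and examples.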