Engineered and own a production-grade LLM pipeline on AWS Bedrock (Claude) monitoring 200+ global companies across 91 locations, ingesting 800–1,000 articles/day via the EventRegistry API with full audit logging.
Built structured classification and QC tagging workflows achieving 96% decision accuracy (manually validated), using prompt engineering and automated quality checks.
Implemented semantic deduplication using Amazon Titan Embeddings and cosine similarity, eliminating cross-company and cross-day duplicate articles at scale.
Delivered concurrent LLM processing, ad-free semantic HTML content generation, and a PostgreSQL batch ingestion pipeline, supporting reliable daily production delivery.
SG Analytics
Data Science Intern | Jan 2025 - Jul 2025
Developed a smart web crawler and data pipeline using Python, Scrapy, FastAPI, and AWS S3, automating the extraction of 500+ URLs/minute with 95% accuracy and visualizing results via an interactive Streamlit dashboard.
Automated an SFDR-compliant CIM system to extract ESG data and KPIs from unstructured corporate documents, generating structured Excel reports to streamline financial analysis and reporting.
Designed an intelligent SWOT Analysis System using Streamlit, RAG, and Amazon Bedrock, integrating SEC 10-K and annual reports with web data to produce factually accurate, auto-generated business reports.
Implemented a secure document Q&A platform using OpenWebUI and Amazon Bedrock for enterprise-grade information retrieval from uploaded corporate files.
Chegg
Subject Matter Expert | Nov 2023 - Sep 2025
Delivered 400+ optimized solutions in C++ and Python covering data structures, algorithms, and optimization.
Provided comprehensive explanations on a wide range of computer science topics, maintaining an average rating of 4.5+.
It's easy to lie with statistics. It's hard to tell the truth without statistics.