Welcome to my portfolio! Here, you’ll find a collection of projects that highlight my expertise in data analytics, machine learning, data visualization, and big data. Each project demonstrates my ability to tackle complex problems, leverage cutting-edge technologies, and deliver actionable insights.
1. House Price Prediction
- Description: Developed a machine learning model to predict house prices in San Francisco using advanced regression techniques.
- Technologies Used: Python (NumPy, Pandas, Scikit-learn), Linear Regression, Random Forest, XGBoost, Neural Networks.
- Key Achievements:
  - Compared the accuracy of all four model types to identify the best predictor.
  - Achieved high prediction accuracy through hyperparameter tuning and feature engineering.
- Impact: This project showcased my ability to apply machine learning to real-world problems and deliver accurate predictions.
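The model comparison described above can be illustrated with k-fold cross-validation. This is a minimal sketch using only the Python standard library, not the project's actual code. The synthetic data, the single `sqft` feature, and the two candidate models (a mean baseline and a one-feature least-squares fit) are all assumptions chosen to keep the example short; the real project compared Linear Regression, Random Forest, XGBoost, and Neural Networks.

```python
import random
import statistics


def make_synthetic_data(n=200, seed=0):
    """Hypothetical data: price grows linearly with square footage, plus noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        sqft = rng.uniform(500, 3000)
        price = 150_000 + 400 * sqft + rng.gauss(0, 50_000)
        rows.append((sqft, price))
    return rows


def fit_mean_baseline(train):
    """Baseline: always predict the mean price seen in training."""
    mean_price = statistics.fmean([y for _, y in train])
    return lambda x: mean_price


def fit_simple_linear(train):
    """Ordinary least squares with a single feature."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in train)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def rmse(model, test):
    """Root mean squared error of a fitted model on held-out rows."""
    return (sum((model(x) - y) ** 2 for x, y in test) / len(test)) ** 0.5


def cross_validated_rmse(fit, rows, k=5):
    """Average RMSE over k folds; every row is held out exactly once."""
    scores = []
    for i in range(k):
        test = rows[i::k]
        train = [row for j, row in enumerate(rows) if j % k != i]
        scores.append(rmse(fit(train), test))
    return statistics.fmean(scores)


rows = make_synthetic_data()
results = {
    "mean baseline": cross_validated_rmse(fit_mean_baseline, rows),
    "linear regression": cross_validated_rmse(fit_simple_linear, rows),
}
best_model = min(results, key=results.get)  # lower RMSE is better

for name, score in results.items():
    print(f"{name}: cross-validated RMSE = {score:,.0f}")
print("best:", best_model)
```

Scoring every candidate on the same folds keeps the comparison fair, and including a trivial baseline shows how much of the accuracy actually comes from the model.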
2. Reddit User Pattern Analysis
- Description: Analyzed user behavior across multiple subreddits to classify users as genuine, bots, or trolls.
- Technologies Used: Python (Pandas, Matplotlib, Scikit-learn), Natural Language Processing (NLP), Clustering Algorithms.
- Key Achievements:
  - Built a classification model to identify suspicious user activity, improving community moderation.
  - Uncovered trends and anomalies in user engagement patterns.
- Impact: This project demonstrated my expertise in NLP and user behavior analysis, providing actionable insights for community managers.
3. Fraud Detection in Insurance Claims
- Description: Developed AI/ML models to predict policyholder behavior and identify potential fraud in the insurance domain.
- Technologies Used: Python (Scikit-learn, Keras, TensorFlow), R, SAS, Power BI.
- Key Achievements:
  - Reduced fraudulent claims by 25% through accurate prediction and risk assessment.
  - Created interactive dashboards in Power BI to visualize fraud trends and insights.
- Impact: This project significantly improved risk assessment and saved costs for the organization.
4. Power BI Dashboards for Pharmaceutical Manufacturing
- Description: Designed and implemented 100+ Power BI dashboards to manage pharmaceutical manufacturing plants.
- Technologies Used: Power BI, SQL, SSIS, SSAS.
- Key Achievements:
  - Automated data extraction, transformation, and reporting, reducing turnaround time by 30%.
  - Provided real-time insights into production efficiency, inventory management, and quality control.
- Impact: These dashboards enabled data-driven decision-making and improved operational efficiency.
5. Fake Job Posting Prediction
- Description: Built a machine learning model to detect fake job postings using data from Kaggle.
- Technologies Used: Python (Pandas, Scikit-learn), Logistic Regression, Random Forest, KNN.
- Key Achievements:
  - Conducted exploratory data analysis (EDA) to identify patterns and features indicative of fake postings.
  - Achieved high accuracy in classifying job postings as genuine or fake.
- Impact: This project showcased my ability to apply machine learning to solve real-world problems and protect job seekers.
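When fake postings are a small minority of the data, accuracy alone can look high even for a model that flags nothing, so precision and recall on the "fake" class are worth reporting alongside it. The sketch below illustrates this with hand-made labels; the label vectors and the `binary_metrics` helper are hypothetical and are not results from this project.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 1 = fake posting, 0 = genuine; 5 fake out of 100.
y_true = [1] * 5 + [0] * 95

# A "model" that never flags anything as fake.
always_genuine = [0] * 100

# A detector that catches 3 of 5 fakes and raises 2 false alarms.
detector = [1, 1, 1, 0, 0] + [1, 1] + [0] * 93

baseline_report = binary_metrics(y_true, always_genuine)
detector_report = binary_metrics(y_true, detector)

for name, report in [("always genuine", baseline_report),
                     ("detector", detector_report)]:
    print(name, {k: round(v, 2) for k, v in report.items()})
```

The two accuracy figures are nearly identical, while recall separates a useless model from one that actually finds fake postings.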
6. ETL Workflow Optimization
- Description: Designed and optimized ETL workflows to integrate data from multiple sources into a centralized SQL Server data warehouse.
- Technologies Used: SSIS, SSAS, SQL, Power BI.
- Key Achievements:
  - Reduced runtime by 80% by converting SQL queries to DAX and optimizing the SSAS cube.
  - Automated the month-end data validation process, eliminating manual effort.
- Impact: This project improved data processing efficiency and enabled comprehensive business intelligence reporting.
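An automated month-end validation step typically reconciles what was loaded into the warehouse against the source. The sketch below shows one way such checks could look in plain Python. The original work was done in SSIS and SQL, so the function name `validate_month_end`, the `id` and `amount` fields, and the sample rows are all assumptions made for illustration.

```python
from decimal import Decimal


def validate_month_end(source_rows, warehouse_rows, id_key="id", amount_key="amount"):
    """Return a list of human-readable issues; an empty list means all checks passed."""
    issues = []

    # Check 1: row counts must match.
    if len(source_rows) != len(warehouse_rows):
        issues.append(
            f"row count mismatch: source={len(source_rows)}, "
            f"warehouse={len(warehouse_rows)}"
        )

    # Check 2: totals must match exactly (Decimal avoids float rounding).
    source_total = sum(Decimal(str(r[amount_key])) for r in source_rows)
    warehouse_total = sum(Decimal(str(r[amount_key])) for r in warehouse_rows)
    if source_total != warehouse_total:
        issues.append(
            f"amount total mismatch: source={source_total}, "
            f"warehouse={warehouse_total}"
        )

    # Check 3: every source id must be present in the warehouse.
    missing = sorted(
        {r[id_key] for r in source_rows} - {r[id_key] for r in warehouse_rows}
    )
    if missing:
        issues.append(f"ids missing from warehouse: {missing}")

    return issues


# Hypothetical month-end extract and a warehouse load that dropped one row.
source = [
    {"id": 1, "amount": "100.10"},
    {"id": 2, "amount": "250.00"},
    {"id": 3, "amount": "75.25"},
]
warehouse = source[:2]

issues = validate_month_end(source, warehouse)
for issue in issues:
    print("FAILED:", issue)
```

Returning a list of issues instead of raising on the first failure lets a scheduled job report every discrepancy in a single run.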
7. Custom Machine Learning Model for Missing Zip Code Prediction
- Description: Built a custom ML model to predict missing zip codes for credit/debit card transaction data.
- Technologies Used: Python (Scikit-learn, Pandas), SQL.
- Key Achievements:
  - Improved data completeness and accuracy by predicting missing zip codes.
  - Reduced manual effort by 70% through automation.
- Impact: This project enhanced data quality and streamlined transaction processing.
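A simple reference point for this kind of imputation is a majority-vote rule: fill a missing zip code with the most common zip code already seen for the same merchant. The sketch below shows that baseline only and is not the custom model described above. The `merchant_id` field, the `zip_imputed` flag, and the sample transactions are hypothetical.

```python
from collections import Counter, defaultdict


def fit_majority_zip(rows):
    """Learn the most frequent known zip code for each merchant."""
    counts = defaultdict(Counter)
    for row in rows:
        if row["zip"] is not None:
            counts[row["merchant_id"]][row["zip"]] += 1
    return {merchant: c.most_common(1)[0][0] for merchant, c in counts.items()}


def impute_zip(rows, lookup):
    """Return copies of the rows with missing zips filled where a match exists."""
    filled = []
    for row in rows:
        was_missing = row["zip"] is None
        zip_code = lookup.get(row["merchant_id"]) if was_missing else row["zip"]
        filled.append({
            **row,
            "zip": zip_code,
            # Flag imputed values so downstream users can tell them apart.
            "zip_imputed": was_missing and zip_code is not None,
        })
    return filled


# Hypothetical card transactions; None marks a missing zip code.
transactions = [
    {"merchant_id": "M1", "zip": "94103"},
    {"merchant_id": "M1", "zip": "94103"},
    {"merchant_id": "M1", "zip": "94107"},
    {"merchant_id": "M2", "zip": "10001"},
    {"merchant_id": "M1", "zip": None},
    {"merchant_id": "M2", "zip": None},
    {"merchant_id": "M3", "zip": None},  # merchant never seen with a zip
]

lookup = fit_majority_zip(transactions)
filled = impute_zip(transactions, lookup)
for row in filled:
    print(row)
```

Rows for merchants with no known zip code stay empty rather than receiving a guess, and a learned model only needs to beat this rule on held-out transactions to justify its added complexity.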