Projects

Welcome to my portfolio! Here, you’ll find a collection of projects that highlight my expertise in data analytics, machine learning, data visualization, and big data. Each project demonstrates my ability to tackle complex problems, leverage cutting-edge technologies, and deliver actionable insights.


1. House Price Prediction

  • Description: Developed a machine learning model to predict house prices in San Francisco using advanced regression techniques.
  • Technologies Used: Python (NumPy, Pandas, Scikit-learn), Linear Regression, Random Forest, XGBoost, Neural Networks.
  • Key Achievements:
    • Compared the accuracy of multiple models (Linear Regression, Random Forest, XGBoost, Neural Networks) to identify the best predictor.
    • Achieved high prediction accuracy through hyperparameter tuning and feature engineering.
  • Impact: This project showcased my ability to apply machine learning to real-world problems and deliver accurate predictions.
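
A minimal sketch of how a model comparison like the one above might be set up, assuming scikit-learn and cross-validated RMSE as the metric. The data is a synthetic stand-in, not the San Francisco dataset, and the `compare_models` helper, feature layout, and model settings are illustrative assumptions rather than the project's actual code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score


def compare_models(X, y, models, n_splits=5, seed=0):
    """Return {name: mean cross-validated RMSE} for each candidate model."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        # scikit-learn reports negated RMSE so that higher is always better.
        scores = cross_val_score(
            model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
        )
        results[name] = float(-scores.mean())
    return results


# Synthetic stand-in data (hypothetical): price is linear in two features plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
y = 300_000 + 500_000 * X[:, 0] + 100_000 * X[:, 1] + rng.normal(0, 10_000, size=200)

models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
results = compare_models(X, y, models)
best = min(results, key=results.get)  # lowest RMSE is the best predictor
print(results, best)
```

Scoring every model on the same shuffled folds keeps the comparison fair; other candidates such as XGBoost or a neural network could be added to the `models` dictionary as long as they follow the scikit-learn estimator interface.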

2. Reddit User Pattern Analysis

  • Description: Analyzed user behavior across multiple subreddits to classify users as genuine, bots, or trolls.
  • Technologies Used: Python (Pandas, Matplotlib, Scikit-learn), Natural Language Processing (NLP), Clustering Algorithms.
  • Key Achievements:
    • Built a classification model to identify suspicious user activity, improving community moderation.
    • Uncovered trends and anomalies in user engagement patterns.
  • Impact: This project demonstrated my expertise in NLP and user behavior analysis, providing actionable insights for community managers.

3. Fraud Detection in Insurance Claims

  • Description: Developed AI/ML models to predict policyholder behavior and identify potential fraud in the insurance domain.
  • Technologies Used: Python (Scikit-learn, Keras, TensorFlow), R, SAS, Power BI.
  • Key Achievements:
    • Reduced fraudulent claims by 25% through accurate prediction and risk assessment.
    • Created interactive dashboards in Power BI to visualize fraud trends and insights.
  • Impact: This project significantly improved risk assessment and saved costs for the organization.

4. Power BI Dashboards for Pharmaceutical Manufacturing

  • Description: Designed and implemented 100+ Power BI dashboards to manage pharmaceutical manufacturing plants.
  • Technologies Used: Power BI, SQL, SSIS, SSAS.
  • Key Achievements:
    • Automated data extraction, transformation, and reporting, reducing turnaround time by 30%.
    • Provided real-time insights into production efficiency, inventory management, and quality control.
  • Impact: These dashboards enabled data-driven decision-making and improved operational efficiency.

5. Fake Job Posting Prediction

  • Description: Built a machine learning model to detect fake job postings using data from Kaggle.
  • Technologies Used: Python (Pandas, Scikit-learn), Logistic Regression, Random Forest, KNN.
  • Key Achievements:
    • Conducted exploratory data analysis (EDA) to identify patterns and features indicative of fake postings.
    • Achieved high accuracy in classifying job postings as genuine or fake.
  • Impact: This project showcased my ability to apply machine learning to solve real-world problems and protect job seekers.

6. ETL Workflow Optimization

  • Description: Designed and optimized ETL workflows to integrate data from multiple sources into a centralized SQL Server data warehouse.
  • Technologies Used: SSIS, SSAS, SQL, Power BI.
  • Key Achievements:
    • Reduced runtime by 80% by converting SQL queries to DAX and optimizing the SSAS cube.
    • Automated the month-end data validation process, eliminating manual effort.
  • Impact: This project improved data processing efficiency and enabled comprehensive business intelligence reporting.

7. Custom Machine Learning Model for Missing Zip Code Prediction

  • Description: Built a custom ML model to predict missing zip codes for credit/debit card transaction data.
  • Technologies Used: Python (Scikit-learn, Pandas), SQL.
  • Key Achievements:
    • Improved data completeness and accuracy by predicting missing zip codes.
    • Reduced manual effort by 70% through automation.
  • Impact: This project enhanced data quality and streamlined transaction processing.