Data-Analyst-Portfolio

Projects

Overview of my projects

Table of Contents


Project 1: Exploratory Data Analysis Using AWS Services

Data Exploration Steps

Project Description:

This project aims to perform an exploratory data analysis (EDA) of the University’s Enrollment and Withdrawal dataset using AWS services to uncover insights related to student enrollment patterns, program distribution, and retention rates. By leveraging AWS tools such as S3, AWS Glue, and Athena, we explore trends based on features like gender, academic programs, and enrollment status. This project demonstrates how cloud-based services streamline large-scale data analysis and visualization.

Dataset:

The dataset contains student information from the University of Canada West (UCW), with details such as:

Objective:

The objective is to analyze student enrollment and withdrawal patterns using AWS services, gaining insights into student demographics, program distribution, and factors influencing graduation and retention rates.

Methodology:

  1. Data Collection and Preparation:
    • Data Storage: Upload the dataset to an Amazon S3 bucket for secure and scalable storage.
    • Data Cleaning and Transformation:
      • Use AWS Glue to create an ETL pipeline to clean the dataset, handle missing values, and format date columns (DateOfBirth, EnrollmentDate, ExpectedGraduationDate) for analysis.
      • Convert the data into a queryable format (e.g., Parquet) for optimized performance.
    • AWS Glue Catalog: Create a data catalog to manage metadata and make data easily discoverable for querying.
  2. Descriptive Statistics:
    • Use AWS Athena to run SQL queries on the dataset stored in S3, generating summary statistics:
      • Calculate the average age of students based on DateOfBirth.
      • Count students per program and per enrollment status (Enrolled, Graduated).
      • Calculate the distribution of enrollment dates to observe academic cycles.
  3. Data Visualization:
    • Amazon QuickSight for interactive data visualization:
      • Bar Charts: Display student counts across programs like MBA, MA, BA, and BCom, and visualize the enrollment status breakdown (Enrolled vs. Graduated).
      • Pie Charts: Visualize the gender distribution of students.
      • Line Charts: Plot enrollment trends over time, showing how many students enrolled in each semester.
      • Stacked Bar Charts: Explore program-wise gender distribution and the corresponding enrollment status to determine retention trends.
  4. Enrollment and Graduation Trends Analysis:
    • Use Athena to compare enrollment and graduation rates across programs and genders:
      • Program Trends: Identify which programs (MBA, MA, BA) show higher graduation rates.
      • Gender Analysis: Analyze whether male or female students tend to graduate at higher rates.
      • Year-over-Year Analysis: Track how enrollment has changed over time, identifying any trends or anomalies.
  5. Insights and Findings:
    • Summarize key insights based on the data exploration:
      • Program-specific Analysis: MBA and Leadership programs show high enrollment, while some programs may have lower graduation rates.
      • Gender-based Patterns: Enrollment and graduation trends may differ by gender.
      • Enrollment Growth: Year-over-year enrollment analysis may highlight academic cycles and potential areas for university outreach.
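
The summary queries in step 2 can be sketched locally with SQLite standing in for Athena. The table layout and sample rows below are assumptions that mirror the dataset fields described above; on AWS, the same SQL would run against the Glue Catalog table.

```python
import sqlite3

# Local stand-in for Athena: table and column names mirror the dataset fields
# described above (StudentID, DateOfBirth, Program, Status) and are assumptions;
# the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students (
        StudentID   INTEGER PRIMARY KEY,
        DateOfBirth TEXT,
        Program     TEXT,
        Status      TEXT
    )
""")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [
        (1, "1998-04-12", "MBA",  "Enrolled"),
        (2, "1996-09-30", "MBA",  "Graduated"),
        (3, "1999-01-05", "BCom", "Enrolled"),
    ],
)

# Average age of students based on DateOfBirth (year difference).
avg_age = conn.execute(
    "SELECT AVG(strftime('%Y', 'now') - strftime('%Y', DateOfBirth)) FROM students"
).fetchone()[0]

# Student counts per program and enrollment status.
counts = conn.execute(
    "SELECT Program, Status, COUNT(*) FROM students "
    "GROUP BY Program, Status ORDER BY Program, Status"
).fetchall()

print(avg_age, counts)
```
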

Tools and Technologies:

Deliverables:


Project 2: Descriptive Analysis Using AWS Services

Project Title: Descriptive Analysis of City of Vancouver Issued Building Permits in the Downtown Region (2023-2024) Using AWS

Descriptive Analysis Steps

Project Overview:

This project focuses on analyzing building permits issued in Vancouver’s downtown region during the years 2023 and 2024. The goal is to gain insights into the types of construction projects, timelines for permit issuance, project values, and trends influencing urban development in the city. The project leverages AWS services such as S3 for data storage, Glue for ETL processes, Athena for querying, and QuickSight for data visualization.
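
As a minimal local sketch of the permit-timeline and project-value metrics described above: the field names (PermitCreatedDate, IssueDate, ProjectValue) mirror the City of Vancouver dataset, but the records are made up for illustration.

```python
from datetime import date
from statistics import mean

# Illustrative permit records; field names follow the dataset, values are made up.
permits = [
    {"PermitCreatedDate": date(2023, 1, 10), "IssueDate": date(2023, 2, 14), "ProjectValue": 250_000},
    {"PermitCreatedDate": date(2023, 3, 1),  "IssueDate": date(2023, 3, 22), "ProjectValue": 1_200_000},
    {"PermitCreatedDate": date(2024, 5, 5),  "IssueDate": date(2024, 7, 1),  "ProjectValue": 80_000},
]

# Permit processing time: days between permit creation and issuance.
processing_days = [(p["IssueDate"] - p["PermitCreatedDate"]).days for p in permits]
avg_processing = mean(processing_days)
avg_value = mean(p["ProjectValue"] for p in permits)

print(processing_days, avg_processing, avg_value)
```
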

Key Features of the Dataset:

AWS Architecture:

1. Data Ingestion:
2. Data Cleaning and Transformation:

Data Lineage Diagram

3. Data Querying:
4. Data Visualization:

Insights & Findings:

Tools and Technologies:

Deliverables:


Project 3: Diagnostic Analysis Using AWS Services

Project Title: Diagnostic Analysis of City of Vancouver Issued Building Permits in the Downtown Region (2023-2024) Using AWS

Diagnostic Analysis Steps

Project Overview:

This project focuses on the diagnostic analysis of building permits issued in the downtown region of Vancouver during 2023 and 2024. The objective is to uncover insights into the types of work being performed, permit timelines, project values, and other key factors affecting construction trends in the city. The analysis is powered by AWS services like S3 for data storage, Glue for data cleaning and transformation, Athena for querying, and QuickSight for visualization.

Objective:

The primary goal of this project is to conduct a diagnostic analysis of building permits issued in the downtown region of Vancouver during 2023-2024. By analyzing various data points, including project values, permit processing timelines, types of construction work, and geographic distribution, we aim to uncover key trends and factors influencing construction activity. This analysis will provide actionable insights for urban planners and city officials to optimize permit processing, identify high-value projects, and support data-driven decisions for future urban development and planning in Vancouver.

Dataset Key Features:

Methodology:

  1. Data Collection and Preparation:
    • Consolidation of Data:
      • Building permits data for downtown Vancouver is sourced from multiple years (2023-2024) and stored in Amazon S3.
      • AWS Glue is used for data cleaning and transformation, ensuring all date fields (PermitCreatedDate, IssueDate) are properly formatted and missing values are handled.
    • Normalization:
      • Data is normalized to ensure consistent formats for fields like property types, work categories, and permit categories, making it suitable for analysis.
  2. Trend Analysis:
    • Permit Trends Analysis:
      • Data Querying with Amazon Athena:
        • Secure Query Execution:
          • Use IAM policies to control who can run queries on Athena.
          • Ensure that query results are stored in encrypted S3 buckets.
      • Analysis Tasks:
        • Query the data using Amazon Athena to analyze trends over time (month/year) for key metrics like:
          • Average Project Value by property type (Dwelling, Retail, Office).
          • Permit Processing Time trends based on categories like Type of Work (Addition, Alteration).
          • Geographic Trends by mapping permit concentration across downtown regions using geo_point_2d coordinates.
        • AWS QuickSight visualizes these trends using time-series graphs and geographic heatmaps.
  3. Correlation Analysis:
    • Processing Delays and Variables: Analyze relationships between permit processing delays and variables such as:
      • ProjectValue to determine if higher value projects take longer to process.
      • Geographic area (GeoLocalArea) and the specific TypeOfWork to identify any regions or work types associated with delays.
      • Use statistical methods like regression analysis to quantify the strength of these relationships.
  4. Root Cause Analysis:
    • Investigating Permit Processing Delays:
      • Conduct discussions or interviews (if possible) with city officials and contractors to gain qualitative insights into potential causes of delays.
      • Use structured techniques like the “5 Whys” or Fishbone Diagram to systematically explore potential root causes for processing delays and identify areas for improvement in efficiency.
  5. Segmentation Analysis:
    • Segmentation by Project Type: Segment the data by project characteristics, such as:
      • Property use type (Dwelling, Retail, Office).
      • Work type (Addition, Alteration).
      • This segmentation will allow for a deeper understanding of how different types of projects are impacted by processing times or geographic location.
  6. Synthesis of Findings:
    • Quantitative and Qualitative Integration:
      • Integrate findings from both quantitative (trend and correlation analysis) and qualitative (root cause) research to form a cohesive narrative on the key drivers of permit processing times and construction trends.
      • Highlight patterns and themes in the data that provide actionable insights for urban planners, city officials, and stakeholders.

Monitoring and Logging

AWS CloudWatch:

AWS CloudTrail:

Challenges and Solutions:

Tools and Technologies:

Deliverables:

Timeline:


Project 4: Data Wrangling for University Enrollment Analysis Using AWS Services

Project Title: Data Wrangling of University Student Enrollment Data Using AWS

Data Wrangling Steps

Project Overview:

This project involves performing data wrangling on a dataset of student enrollments at the University. The goal is to clean, transform, and consolidate the student data for enhanced analysis and reporting. The analysis uses AWS services such as S3 for data storage, Glue for ETL, and Athena for querying the data. The wrangled dataset will be used to gain insights into student enrollment trends, graduation timelines, and program distribution, aiding the university in decision-making processes.

Objective:

The primary objective of this project is to perform comprehensive data wrangling on the university’s enrollment dataset. By cleaning and transforming the data, the goal is to improve its usability and accuracy, enabling further descriptive analysis of enrollment patterns, graduation rates, and student demographics. It aims to transform raw data into actionable insights that can inform university administration about key academic metrics.

Dataset Key Features:

AWS Architecture:

Data Ingestion:
Data Cleaning and Transformation:
Data Querying:
Data Visualization:

Methodology:

  1. Discover
    • Understanding the Data: Conducted an initial assessment to comprehend the structure, data types, and content of the dataset. Key features include StudentID, Name, DateOfBirth, Gender, Program, EnrollmentDate, GraduationDate, and Status.
    • Exploratory Data Analysis (EDA): Identified missing values, anomalies, and outliers to understand data quality and distribution.
  2. Structure
    • Organize Data: Defined schemas and data types for each field in the dataset, ensuring that dates, numerical fields, and categorical variables were properly formatted.
    • Format Data: Standardized formats for dates and categorical variables, particularly focusing on consistency in fields such as EnrollmentDate, GraduationDate, and Program.
  3. Clean
    • Handle Missing Values: Used imputation techniques for non-essential fields or removed records with missing essential fields.
    • Remove Duplicates: Eliminated duplicate entries based on StudentID.
    • Correct Errors: Corrected inconsistencies in data entries, including typos in Program names and inaccurate Status fields.
  4. Enrich
    • Feature Engineering:
      • Calculated EnrollmentDuration by finding the difference between EnrollmentDate and GraduationDate.
      • Derived Age from DateOfBirth.
      • Standardized categorical variables by normalizing entries in Gender, Status, and Program.
  5. Validate
    • Data Consistency Checks: Ensured logical consistency, such as validating that EnrollmentDate precedes GraduationDate.
    • Quality Assurance: Conducted quality checks to verify that the data transformations preserved integrity and accuracy.
  6. Publish
    • Curated Dataset: Saved the cleaned and enriched dataset back to AWS S3 in a designated curated zone for further analysis.
    • Documentation: Recorded all data wrangling steps, decisions, and insights in a structured format for transparency and reproducibility.
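
The Clean, Enrich, and Validate steps above can be sketched in plain Python. The field names follow the dataset description; the sample records and the fixed reference date used for the Age derivation are assumptions.

```python
from datetime import date

AS_OF = date(2025, 1, 1)  # fixed reference date so the Age derivation is reproducible

# Illustrative raw records; field names follow the dataset description above.
raw = [
    {"StudentID": 1, "DateOfBirth": date(1998, 4, 12),
     "EnrollmentDate": date(2022, 9, 1), "GraduationDate": date(2024, 6, 15),
     "Program": "mba", "Status": "Graduated"},
    {"StudentID": 1, "DateOfBirth": date(1998, 4, 12),   # duplicate StudentID
     "EnrollmentDate": date(2022, 9, 1), "GraduationDate": date(2024, 6, 15),
     "Program": "MBA", "Status": "Graduated"},
    {"StudentID": 2, "DateOfBirth": date(2000, 1, 20),
     "EnrollmentDate": date(2023, 1, 9), "GraduationDate": date(2024, 12, 20),
     "Program": "BCom", "Status": "Enrolled"},
]

# Clean: drop duplicate StudentIDs (first occurrence wins).
seen, students = set(), []
for rec in raw:
    if rec["StudentID"] not in seen:
        seen.add(rec["StudentID"])
        students.append(rec)

# Enrich: normalize Program, derive Age and EnrollmentDuration (in days).
for rec in students:
    rec["Program"] = rec["Program"].upper()
    dob = rec["DateOfBirth"]
    rec["Age"] = AS_OF.year - dob.year - ((AS_OF.month, AS_OF.day) < (dob.month, dob.day))
    rec["EnrollmentDuration"] = (rec["GraduationDate"] - rec["EnrollmentDate"]).days

# Validate: EnrollmentDate must precede GraduationDate.
assert all(r["EnrollmentDate"] < r["GraduationDate"] for r in students)
print(len(students), students[0]["Age"], students[0]["EnrollmentDuration"])
```
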

Tools and Technologies:

Deliverables:

Timeline:

This data wrangling project ensures the preparation of a high-quality dataset that enables the university to conduct detailed analysis of enrollment patterns, helping improve student retention strategies, optimize academic offerings, and enhance overall operational efficiency.


Project 5: Data Quality Control Analysis for City of Vancouver Building Permits

Project Title: Data Quality Control Analysis of Vancouver Downtown Building Permits Using AWS Services

Data Quality Control Steps

Project Overview:

This project involves analyzing building permits issued in Vancouver’s downtown region from 2023 to 2024. The dataset includes information such as project value, type of work, contractor information, permit processing time, and geographic coordinates. The goal is to gain insights to support urban development strategies and improve permit processing efficiency.

Objectives:

1. Data Ingestion:

Dataset Fields:

2. Data Transformation (ETL):

AWS Glue ETL Pipeline:

3. Data Analysis:

Queries in AWS Athena were designed to explore different dimensions of the dataset, such as:

4. Data Monitoring & Controlling:

1. AWS CloudWatch:
2. AWS CloudTrail:

5. Data Visualization:

1. Permit Processing Time by Project Type:
2. Geographic Distribution of Permits:
3. Top 10 Projects by Value:
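
The "Top 10 Projects by Value" chart can be prototyped locally before building it in QuickSight; the permit numbers and project values below are made up for illustration.

```python
import heapq

# Illustrative permit records (ProjectValue in CAD); field names follow the dataset.
permits = [
    {"PermitNumber": f"BP-{i:04d}", "ProjectValue": v}
    for i, v in enumerate([120_000, 4_500_000, 85_000, 950_000, 2_300_000,
                           610_000, 7_800_000, 45_000, 1_100_000, 330_000,
                           5_250_000, 999_000])
]

# Top 10 projects by value, as charted in QuickSight.
top10 = heapq.nlargest(10, permits, key=lambda p: p["ProjectValue"])
print([(p["PermitNumber"], p["ProjectValue"]) for p in top10])
```
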

Tools and Technologies:

Deliverables:

Timeline: