Nick Celotto
  • Home
  • Projects
  • Photography
Home Projects Photography

Data Analytics

Projects

End-to-end analytical work — the problem, the process, and what it produced.

Tableau SQL Excel

NYC Rental Market
Dashboard

Analyzing NYC rental market trends from 2015 to 2026 — uncovering the full impact of COVID-19 on long-term rent levels using Zillow data, PostgreSQL, and Tableau.

View on Tableau Public → GitHub Repo
+52.9% Total rent growth from 2015 to 2026
+16.5% Single-year jump in 2022 — largest on record
+27.6% Post-COVID premium vs pre-COVID average
Live Dashboard Interactive — hover, filter, and explore directly below
The Full Story

The Problem

NYC rent data is publicly available but rarely analyzed in a way that tells a coherent story. The Zillow ZORI dataset comes as a wide-format CSV with 139 columns — one per month — making trends invisible without transformation. The goal was to clean, reshape, and analyze the data end-to-end to answer one question: what actually happened to NYC rents before, during, and after COVID?

The Approach

I used Python to unpivot the raw Zillow data from wide to long format and classify each month into one of three periods: Pre-COVID, COVID, and Post-COVID. I loaded the cleaned data into PostgreSQL and wrote five SQL queries using window functions — including LAG() for year-over-year change and a trailing 12-month rolling average using ROWS BETWEEN. The findings were then visualized in Tableau with four KPI cards and three charts.

What It Found

The COVID rent dip was real but short-lived — rents fell 2.3% in 2020 and bottomed out in mid-2021. What followed was extraordinary: 2022 saw a 16.5% single-year jump, the largest in the dataset by a wide margin. By February 2026 the average NYC metro rent reached $3,258 — 52.9% above 2015 levels. The market stabilized at this new elevated level, with post-COVID rents averaging 27.6% above pre-COVID.

"Rents didn't just spike after COVID — they stabilized at a permanently higher level. That's the real story."
↗ NYC rents rose 52.9% from 2015 to 2026 — from $2,153 to $3,258/month
↗ 2022 saw a +16.5% single-year jump — the largest in the dataset — as post-COVID demand surged back
↗ The COVID dip was temporary — rents fully recovered within 18 months and never looked back
↗ Post-COVID average rent is 27.6% higher than pre-COVID — a permanent structural shift in the market
Tableau · PostgreSQL · Python · Excel Zillow ZORI · 2015–2026 Live
Python pandas Seaborn

Sales Performance Analysis

View Notebook → GitHub
Customer Churn Analysis

The Problem

Customer churn is one of the most expensive problems in subscription businesses — it's far cheaper to retain a customer than acquire a new one. This project used IBM's public Telco dataset to identify which customer characteristics and behaviors most strongly predict churn, surfacing actionable segments for a retention team to target.

Approach

Exploratory data analysis in pandas across customer attributes — after cleaning (nulls, categorical encoding, TotalCharges dtype fix), I used Matplotlib and Seaborn to visualize churn rates across contract type, tenure, payment method, and service bundle. The goal was a communicable story, not a black-box model.

Key Findings

  • Month-to-month customers churned at 3× the rate of annual subscribers — by far the strongest predictor.
  • Churn dropped sharply after 12 months of tenure; early-stage customers are the highest-priority retention target.
  • Electronic check payers churned at nearly 2× the rate of auto-pay customers.
  • Service bundles reduced churn; premium add-ons without a contract did not.

Impact

Clear prioritization framework: focus retention on month-to-month customers in year one paying by electronic check — 22% of the base, disproportionate share of churn.

ToolsPython, pandas, Matplotlib, Seaborn, Jupyter
DatasetIBM Telco Customer Churn (~7k rows)
TechniquesEDA, churn segmentation, cohort comparison
Timeline2 weeks
StatusComplete
SQL PostgreSQL

Personal Finance Spending Analysis

View on GitHub →
E-Commerce SQL Analysis

The Problem

E-commerce businesses generate enormous transactional data but often lack the query infrastructure to answer strategic questions: who are the most valuable customers, are they being retained, and which products belong together? This project built that analytical layer from scratch.

Approach

Three analysis areas in PostgreSQL: CLV segmentation, cohort-based retention, and product affinity. Every query is documented with a plain-English explanation. No ORMs, no Python wrappers — demonstrating that complex business questions can be answered cleanly in SQL alone.

Key Findings

  • Top 20% of customers drove 68% of revenue — stronger Pareto than typical benchmarks.
  • Month-1 retention 38%; stabilized at ~22% by month 3, minimal drop-off beyond.
  • Highest product affinity pair: 4.2× lift above random — strong bundling candidate.
  • Promotion-acquired customers had lower CLV despite higher initial order values.

Impact

Reusable query library — drop in any transactional dataset and the CLV and cohort analyses run with minimal modification.

ToolsPostgreSQL, DBeaver
DatasetSynthetic e-commerce schema
TechniquesCTEs, window functions, cohort analysis, market basket / lift
Timeline2 weeks
StatusComplete
Excel Power Query

Budget vs. Actuals Tracker

Download File →

The Problem

Finance teams in mid-size organizations often track budget vs. actuals in fragmented, manually updated spreadsheets that are slow to reconcile and easy to break. This project built a clean, dynamic Excel model that automates variance calculations and surfaces issues at a glance.

Approach

Strict separation of data input, calculation, and presentation layers. Power Query for ingestion; structured tables with named ranges feed a variance calculation sheet; conditional formatting flags over-budget items automatically in the dashboard layer.

Key Features

  • Automated variance flags — no manual highlighting.
  • Dynamic pivot summary with department and period slicers.
  • Rolling year-end forecast from actuals entered to date.

Impact

Demonstrates that well-structured Excel is still a powerful analytical tool in environments where BI platforms aren't available — and that the model is maintainable by anyone, not just its original builder.

ToolsExcel, Power Query, Pivot Tables
TechniquesNamed ranges, dynamic arrays, conditional formatting, rolling forecasts
Timeline1 week
StatusComplete

Nick Celotto

© 2026 · nickcelotto.com

LinkedIn GitHub Tableau