How It's Built

A look at how roster PDFs become an interactive salary dashboard.

Tech Stack

Next.js + React: frontend framework with API routes
Supabase: PostgreSQL with row-level security
Vercel: hosting and deployment
CapWages API: NHL salary and contract data
Claude: PDF roster extraction
Python: data pipeline and API integration

Architecture

IIHF roster PDFs (iihf.com) → Claude (PDF to CSV) → Python + CapWages (salary enrichment) → Supabase (PostgreSQL) → Next.js (hosted on Vercel)

Data Pipeline

1. Roster extraction

The starting point is the IIHF website, which publishes roster PDFs for all 12 nations competing in the 2026 Olympic men's hockey tournament. Each PDF lists player names, positions, jersey numbers, and current club affiliations.
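Each player line in a roster PDF maps onto a small record type. The sketch below is a minimal illustration. It assumes hypothetical field names, three-letter country codes, canonical position codes G/D/F, and a 1 to 99 jersey range. None of these come from the IIHF documents themselves.

```python
from dataclasses import dataclass

# Hypothetical canonical position codes; the labels printed in the IIHF PDFs may differ.
VALID_POSITIONS = {"G", "D", "F"}


@dataclass(frozen=True)
class RosterEntry:
    """One player line from a national-team roster PDF (hypothetical shape)."""

    country_code: str   # e.g. a three-letter code such as "CAN" (assumed format)
    name: str
    position: str       # one of VALID_POSITIONS
    jersey_number: int
    club: str           # current club affiliation

    def __post_init__(self) -> None:
        # Reject rows that clearly did not survive extraction intact.
        if not self.name.strip():
            raise ValueError("empty player name")
        if self.position not in VALID_POSITIONS:
            raise ValueError(f"unknown position {self.position!r} for {self.name}")
        # The 1-99 range is an assumption, not an IIHF rule quoted in this write-up.
        if not 1 <= self.jersey_number <= 99:
            raise ValueError(f"jersey number out of range for {self.name}")
```

Freezing the dataclass keeps an extracted row from being mutated partway through the pipeline.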

2. AI-assisted parsing

Rather than writing a custom PDF parser, I uploaded all 12 roster PDFs into Claude to extract the data into structured CSVs. This took care of the tricky parts in one pass: inconsistent table layouts, names with diacritics, and position formatting differences across countries.
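Claude handled these issues during extraction. A light validation pass over its CSV output can still catch any residual inconsistencies before the data moves on. The following is a sketch under assumptions. It assumes each CSV has hypothetical `name`, `position`, `jersey_number`, and `club` columns. The alias table is also illustrative and not exhaustive.

```python
import csv
import io
import unicodedata

# Hypothetical mapping from position spellings seen across countries to one canonical code.
POSITION_ALIASES = {
    "G": "G", "GK": "G", "GOALIE": "G", "GOALKEEPER": "G",
    "D": "D", "DEF": "D", "DEFENCE": "D", "DEFENSE": "D",
    "F": "F", "FWD": "F", "FORWARD": "F", "C": "F", "LW": "F", "RW": "F",
}


def normalize_rows(csv_text: str) -> list[dict]:
    """Parse one extracted roster CSV and normalize names and positions."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # NFC keeps diacritics but stores them in a single canonical form, so an
        # "Å" written as one code point or as "A" plus a combining ring compares equal.
        name = unicodedata.normalize("NFC", row["name"].strip())
        raw_pos = row["position"].strip().upper()
        if raw_pos not in POSITION_ALIASES:
            raise ValueError(f"unrecognized position {raw_pos!r} for {name}")
        rows.append({
            "name": name,
            "position": POSITION_ALIASES[raw_pos],
            "jersey_number": int(row["jersey_number"]),
            "club": row["club"].strip(),
        })
    return rows
```

Raising an error on unknown positions surfaces anything the extraction missed. The alternative, silently guessing a position, would hide those cases.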

3. Salary enrichment

A Python script reads the extracted CSVs and queries the CapWages API to pull each NHL player's 2025-26 contract details: cap hit, AAV, base salary, signing bonuses, trade clauses, and expiry status.
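The enrichment step can be pictured as a loop that derives a lookup key for each player and merges in whatever contract data comes back. This is a minimal sketch, not the actual script, and it makes two assumptions:

- **Slug format.** The schema stores a `capwages_slug`, but how CapWages forms slugs is assumed here.
- **API client.** `fetch_contract` is a hypothetical callable standing in for the real CapWages client, whose endpoints are not described in this write-up.

```python
import re
import unicodedata
from typing import Callable, Optional


def to_slug(name: str) -> str:
    """Turn a player name into a URL-style slug, e.g. 'Åke Nilsson' -> 'ake-nilsson'.

    The slug format CapWages actually uses is an assumption here.
    """
    # NFKD splits accented letters into base letter + accent; the ASCII encode drops
    # the accents. Letters with no decomposition (such as ø or ß) are dropped too,
    # so a production version would need an explicit transliteration table.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def enrich(rows: list[dict], fetch_contract: Callable[[str], Optional[dict]]) -> list[dict]:
    """Attach 2025-26 contract details to each roster row.

    fetch_contract is a hypothetical stand-in for a CapWages client: it takes a slug and
    returns a dict of contract fields (cap hit, AAV, ...) or None for non-NHL players.
    """
    enriched = []
    for row in rows:
        slug = to_slug(row["name"])
        contract = fetch_contract(slug)
        enriched.append({**row, "capwages_slug": slug, "contract": contract})
    return enriched
```

Injecting the fetcher as a parameter lets the merge logic be exercised with a plain dictionary. For example, passing `{"ake-nilsson": {...}}.get` tests the merge without touching the network.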

4. Database load

That same script loads everything into Supabase (PostgreSQL) across 7 normalized tables. Two database views handle the salary aggregation and roster joins on the backend, keeping the frontend queries simple.
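A salary-aggregation view can be illustrated with SQLite standing in for PostgreSQL, since SQLite ships with Python's standard library. This is a sketch under several assumptions:

- Only a subset of the real columns is modeled.
- The view name `country_cap_totals` is hypothetical.
- The sample rows are made up.
- The real Supabase views may aggregate differently.

```python
import sqlite3

# SQLite stands in for Supabase/PostgreSQL; columns are a subset of the real schema.
SCHEMA = """
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT, code TEXT);
CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT,
                      country_id INTEGER REFERENCES countries(id));
CREATE TABLE olympic_rosters (id INTEGER PRIMARY KEY,
                              player_id INTEGER REFERENCES players(id),
                              year_id INTEGER,
                              country_id INTEGER REFERENCES countries(id));
CREATE TABLE player_contracts (id INTEGER PRIMARY KEY,
                               player_id INTEGER REFERENCES players(id),
                               is_current INTEGER);
CREATE TABLE contract_seasons (id INTEGER PRIMARY KEY,
                               contract_id INTEGER REFERENCES player_contracts(id),
                               season TEXT, cap_hit INTEGER);

-- Sum each roster's current cap hits per season: one row per country.
CREATE VIEW country_cap_totals AS
SELECT r.year_id, c.code AS country, s.season,
       SUM(s.cap_hit) AS total_cap_hit,
       COUNT(DISTINCT r.player_id) AS nhl_players
FROM olympic_rosters r
JOIN countries c ON c.id = r.country_id
JOIN player_contracts pc ON pc.player_id = r.player_id AND pc.is_current = 1
JOIN contract_seasons s ON s.contract_id = pc.id
GROUP BY r.year_id, c.code, s.season;
"""


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up sample rows for illustration only.
    conn.executescript("""
    INSERT INTO countries VALUES (1, 'Country A', 'AAA'), (2, 'Country B', 'BBB');
    INSERT INTO players VALUES (1, 'Player 1', 1), (2, 'Player 2', 1), (3, 'Player 3', 2);
    INSERT INTO olympic_rosters VALUES (1, 1, 1, 1), (2, 2, 1, 1), (3, 3, 1, 2);
    INSERT INTO player_contracts VALUES (1, 1, 1), (2, 2, 1), (3, 3, 1);
    INSERT INTO contract_seasons VALUES
        (1, 1, '2025-26', 12500000), (2, 2, '2025-26', 900000), (3, 3, '2025-26', 4000000);
    """)
    return conn


if __name__ == "__main__":
    conn = build_demo()
    for row in conn.execute("SELECT * FROM country_cap_totals ORDER BY total_cap_hit DESC"):
        print(row)
```

Putting the joins inside a view lets the frontend issue a single select against one relation. It no longer has to rebuild the four-way join on every request.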

Extending to Past Olympics

The database was designed with multiple Olympic years in mind from the start. Roster entries are tied to a specific Games through year_id, and contract data is stored per season, so a player can appear at several Olympics without being duplicated. Adding a past Olympics is mostly a matter of populating the same tables with new data.
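Year scoping can be demonstrated with a minimal SQLite stand-in. Adding an earlier Games creates a new `olympics_years` row and new roster rows, but reuses an existing player row. The column names follow the database schema. The helper names and sample players are hypothetical, and matching players by name is a simplification; a stable identifier such as `nhl_id` would be safer.

```python
import sqlite3

SCHEMA = """
CREATE TABLE olympics_years (id INTEGER PRIMARY KEY, year INTEGER UNIQUE, host_city TEXT);
CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE olympic_rosters (id INTEGER PRIMARY KEY,
                              player_id INTEGER REFERENCES players(id),
                              year_id INTEGER REFERENCES olympics_years(id),
                              jersey_number INTEGER);
"""


def add_roster_entry(conn: sqlite3.Connection, year: int, host_city: str,
                     player_name: str, jersey: int) -> None:
    """Add one roster row for a given Games, creating the year and player if needed."""
    conn.execute("INSERT OR IGNORE INTO olympics_years (year, host_city) VALUES (?, ?)",
                 (year, host_city))
    year_id = conn.execute("SELECT id FROM olympics_years WHERE year = ?",
                           (year,)).fetchone()[0]
    # Reuse the player if they already exist from another Games (name match is a simplification).
    found = conn.execute("SELECT id FROM players WHERE name = ?", (player_name,)).fetchone()
    if found:
        player_id = found[0]
    else:
        player_id = conn.execute("INSERT INTO players (name) VALUES (?)",
                                 (player_name,)).lastrowid
    conn.execute("INSERT INTO olympic_rosters (player_id, year_id, jersey_number) "
                 "VALUES (?, ?, ?)", (player_id, year_id, jersey))


def roster_for(conn: sqlite3.Connection, year: int) -> list[str]:
    """Names on the roster for one Olympic year, sorted alphabetically."""
    return [name for (name,) in conn.execute(
        """SELECT p.name FROM olympic_rosters r
           JOIN players p ON p.id = r.player_id
           JOIN olympics_years y ON y.id = r.year_id
           WHERE y.year = ? ORDER BY p.name""", (year,))]
```

Because the year lives on the roster row rather than the player row, one player appearing at two Games produces two roster rows and still only one player row.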

The hard part is sourcing that data. The 2026 pipeline worked cleanly because the IIHF publishes current roster PDFs and CapWages has up-to-date contract information. For older Olympics, there's no single source that covers both rosters and salary data, so each year will likely need its own ingestion script to pull from different places.
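One way to organize separate per-year ingestion scripts is a small registry. Each script is keyed by year, and all scripts must emit the same row shape so the shared tables stay uniform. This sketch is purely illustrative, not the project's design. The registry, the loader names, and the row fields are all hypothetical, and the loaders return placeholder data.

```python
from typing import Callable, Dict, List

# Hypothetical row shape shared by every year's ingestion script.
RosterRow = Dict[str, object]
Loader = Callable[[], List[RosterRow]]

# Maps an Olympic year to the function that knows where that year's data lives.
LOADERS: Dict[int, Loader] = {}


def register(year: int) -> Callable[[Loader], Loader]:
    """Decorator that records a per-year ingestion function."""
    def wrap(fn: Loader) -> Loader:
        if year in LOADERS:
            raise ValueError(f"loader for {year} already registered")
        LOADERS[year] = fn
        return fn
    return wrap


@register(2026)
def load_2026() -> List[RosterRow]:
    # Placeholder: the 2026 path reads the Claude-extracted CSVs and enriches them from CapWages.
    return [{"year": 2026, "name": "Player 1", "cap_hit": 1_000_000}]


@register(2022)
def load_2022() -> List[RosterRow]:
    # Placeholder for a past Games whose rosters and salaries come from separate sources.
    return [{"year": 2022, "name": "Player 1", "cap_hit": None}]


def ingest(year: int) -> List[RosterRow]:
    """Run the registered loader for a year and check its rows are tagged correctly."""
    if year not in LOADERS:
        raise KeyError(f"no ingestion script for {year}")
    rows = LOADERS[year]()
    for row in rows:
        if row["year"] != year:
            raise ValueError(f"loader for {year} returned a row tagged {row['year']}")
    return rows
```

Each year can then change its sources independently, while the database load step keeps a single entry point.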

Database Schema

olympics_years: id, year, host_city, start_date, end_date
countries: id, name, code, flag_emoji
nhl_teams: id, name, abbreviation, primary_color
olympic_rosters (junction table): id, player_id → players, year_id → olympics_years, country_id → countries, nhl_team_id → nhl_teams, olympic_position, jersey_number, club_name, league
players: id, name, country_id → countries, position, date_of_birth, height, weight, capwages_slug, nhl_id, draft_year, draft_round, draft_pick
player_contracts: id, player_id → players, contract_type, total_value, is_current, expiry_status, signing_date
contract_seasons: id, contract_id → player_contracts, season, cap_hit, aav, base_salary, bonuses, clause

Key: id is each table's primary key; "column → table" marks a foreign key referencing that table.
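The foreign-key arrows in the schema above also fix the order in which a load script must insert rows: referenced tables first, dependents after. A sketch of deriving that order with the standard library's `graphlib` (Python 3.9+) follows. The actual script may simply hardcode an order; this is an alternative, not a description of it.

```python
from graphlib import TopologicalSorter

# Foreign-key dependencies from the schema: each table maps to the tables it
# references, which must already be loaded before its rows can be inserted.
FOREIGN_KEYS = {
    "olympics_years": set(),
    "countries": set(),
    "nhl_teams": set(),
    "players": {"countries"},
    "olympic_rosters": {"players", "olympics_years", "countries", "nhl_teams"},
    "player_contracts": {"players"},
    "contract_seasons": {"player_contracts"},
}

# static_order() yields every table only after all of the tables it depends on,
# and raises graphlib.CycleError if the references ever form a cycle.
LOAD_ORDER = list(TopologicalSorter(FOREIGN_KEYS).static_order())

if __name__ == "__main__":
    print(LOAD_ORDER)
```

The three lookup tables have no dependencies, so any order among them is valid. contract_seasons always lands last, after player_contracts and players.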