Pollution data for the world's oceans exists, but it's scattered across a dozen agencies, in different formats, at different scales. Nobody had stitched it together into a single comparable score per water body.
This project does exactly that: six public datasets, cleaned and merged, weighted by scientific relevance, and turned into an index you can actually compare and forecast.
30 water bodies scored
6 data sources merged
2100 forecast horizon
The index covers the major oceans split by hemisphere, regional seas (Mediterranean, Black Sea, Red Sea, South China Sea), and large lakes (Great Lakes, Caspian Sea, Lake Victoria, Lake Baikal). Every region gets a score from 0 to 100 built from published measurements rather than modelled estimates, with any gap-filling documented per region.
Data Sources
What goes into the score
Every dataset is publicly available and free to use. Where regional data was missing from a source, values were filled using published peer-reviewed literature, with the source documented per region in the codebase.
Marine Microplastics
NOAA NCEI
22,530 measurements globally: the most direct measure of plastic pollution in the water column
Weight: 30%
River Plastic Input
Our World in Data / Meijer et al.
Plastic entering the ocean via rivers, by country: the primary land-to-ocean pollution pathway
Weight: 25%
World Port Index
NGA / Kaggle
3,824 ports worldwide with size classifications, used as a proxy for industrial coastal pressure and shipping activity
Weight: 20%
Global Coastal Characteristics
Copernicus / Zenodo
Population within 10 km of the shore, used as a proxy for waste generation and runoff pressure
Weight: 15%
Each of the five components is normalised to a 0–100 scale using min-max scaling across all 30 regions. The components are then combined using a weighted average:
Microplastic concentration: 30%
River plastic input: 25%
Port pressure score: 20%
Coastal population (10 km): 15%
Ocean pH deviation: 10%
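A minimal sketch of the scoring step described above, assuming each component arrives as a mapping from region name to a raw value where a higher value always means more pollution (for pH, that would be the size of the deviation). The function names, the dictionary layout and the choice to score a component with no spread as 0 everywhere are illustrative assumptions, not the project's actual code, and the three regions in the example use made-up numbers.

```python
# Weights from the table above; they sum to 1.0, so the composite stays on 0-100.
WEIGHTS = {
    "microplastics": 0.30,
    "river_plastic": 0.25,
    "port_pressure": 0.20,
    "coastal_population": 0.15,
    "ph_deviation": 0.10,
}


def min_max_scale(values: dict[str, float]) -> dict[str, float]:
    """Rescale one component across regions: lowest raw value -> 0, highest -> 100."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Assumption: a component with no spread contributes 0 to every region.
        return {region: 0.0 for region in values}
    return {region: 100 * (v - lo) / (hi - lo) for region, v in values.items()}


def pollution_index(raw: dict[str, dict[str, float]]) -> dict[str, float]:
    """Combine raw[component][region] values into a weighted 0-100 score per region."""
    scaled = {comp: min_max_scale(raw[comp]) for comp in WEIGHTS}
    # Assumption: every component covers the same set of regions.
    regions = list(raw["microplastics"])
    return {
        region: sum(weight * scaled[comp][region] for comp, weight in WEIGHTS.items())
        for region in regions
    }


if __name__ == "__main__":
    # Hypothetical raw values for three made-up regions (not real measurements).
    raw = {
        "microplastics": {"A": 1.0, "B": 3.0, "C": 2.0},
        "river_plastic": {"A": 10.0, "B": 20.0, "C": 30.0},
        "port_pressure": {"A": 5.0, "B": 15.0, "C": 25.0},
        "coastal_population": {"A": 0.1, "B": 0.2, "C": 0.3},
        "ph_deviation": {"A": 0.01, "B": 0.02, "C": 0.03},
    }
    scores = pollution_index(raw)
    print({region: round(score, 1) for region, score in scores.items()})
    # -> {'A': 0.0, 'B': 65.0, 'C': 85.0}
```

Because min-max scaling is relative, a region's score says how it ranks against the other 29 regions on each component, not how polluted it is in absolute terms; adding or removing a region can shift everyone else's scores.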
Forecasts (2030, 2040, 2050, 2100) apply a 3% annual compound growth rate, consistent with observed global plastic production trends over the past two decades. Scores exceeding 100 in the 2100 forecast are intentional: they represent trajectories that exceed the current worst-case benchmark, making the unsustainability of current trends legible.
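The compound-growth projection reduces to one multiplication per region. The sketch below assumes growth is applied directly to a region's current composite score and that the caller supplies the base year, since the page does not state which year current scores correspond to; `forecast` and `FORECAST_YEARS` are hypothetical names. At 3% a value roughly doubles about every 23 years, so over 75 years (a 2025 base to 2100) it grows by a factor of about 9.2, and any region starting above roughly 11 points ends up past 100.

```python
GROWTH_RATE = 0.03  # 3% per year, compounded annually
FORECAST_YEARS = (2030, 2040, 2050, 2100)


def forecast(score: float, base_year: int, target_year: int, rate: float = GROWTH_RATE) -> float:
    """Project a score forward: score * (1 + rate) ** (target_year - base_year).

    Results are deliberately not clipped at 100.
    """
    if target_year < base_year:
        raise ValueError("target_year must not precede base_year")
    return score * (1 + rate) ** (target_year - base_year)


if __name__ == "__main__":
    base_year = 2025  # assumption: the page does not state the base year
    for year in FORECAST_YEARS:
        print(year, round(forecast(40.0, base_year, year), 1))
```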
Missing data was handled in two ways: where published peer-reviewed measurements existed for a region, those values were used directly. Where no literature existed, values were imputed from comparable regions with documented reasoning. Every imputation is flagged in the source code.
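The two-step gap filling can be sketched for a single component as follows. This assumes each region has either a measured value or `None`, that literature values and lists of comparable regions are supplied as hand-curated dictionaries, and that an imputed value is the plain mean of its comparable regions. The page does not say how comparable regions were combined, so the averaging rule and every name here are hypothetical; the returned provenance map stands in for the per-region flags in the source code.

```python
from statistics import mean
from typing import Optional


def fill_component(
    measured: dict[str, Optional[float]],
    literature: dict[str, float],
    comparables: dict[str, list[str]],
) -> tuple[dict[str, float], dict[str, str]]:
    """Fill gaps in one component and record where every value came from."""
    filled: dict[str, float] = {}
    provenance: dict[str, str] = {}

    # Step 1: keep measured values; otherwise use a published literature value if one exists.
    for region, value in measured.items():
        if value is not None:
            filled[region], provenance[region] = value, "measured"
        elif region in literature:
            filled[region], provenance[region] = literature[region], "literature"

    # Step 2: impute what is still missing from comparable regions that have
    # measured or literature values (imputed values are never used as donors).
    known = dict(filled)
    for region in measured:
        if region in known:
            continue
        donors = [r for r in comparables.get(region, []) if r in known]
        if not donors:
            raise ValueError(f"no data, literature value or usable comparables for {region!r}")
        filled[region] = mean(known[r] for r in donors)
        provenance[region] = "imputed from " + ", ".join(donors)

    return filled, provenance


if __name__ == "__main__":
    # Made-up values for illustration only.
    values, flags = fill_component(
        measured={"region_a": 4.0, "region_b": None, "region_c": 2.0, "region_d": None},
        literature={"region_b": 3.0},
        comparables={"region_d": ["region_a", "region_c"]},
    )
    print(values)  # -> {'region_a': 4.0, 'region_b': 3.0, 'region_c': 2.0, 'region_d': 3.0}
    print(flags)
```

Keeping imputed values out of the donor pool means the result does not depend on the order in which regions are processed.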
Limitations:
Microplastic units vary across studies (items/m³, items/kg, items/km²), so normalisation is approximate across different measurement methods
Copernicus biogeochemistry data is a single-day snapshot (June 2026), not a multi-year average
The 3% compound growth forecast assumes no policy intervention: it is a trajectory, not a prediction
AIS shipping density data is not yet integrated; it is planned for a future version
The Builder
Who made this
Sercan Emiroglu
BSc Computer Science · City St George's, University of London
I'm a Computer Science graduate based in London with a focus on data science, machine learning, and building things that are actually useful. This project started as a portfolio piece and turned into something I genuinely care about: there's real data here telling a real story about where the world's oceans are headed.
I'm currently looking for roles in data analytics and data science. If you're working on something interesting, I'd like to hear about it.