Notebook 2.1 - Do the DWD Stations Actually Cover the Districts?

Notebook 2.1 asks the question I needed answered before doing weather feature engineering: where do the DWD stations actually land once I put them inside NUTS-3 districts?

The goal was to catch weak spatial coverage early, because station gaps are much cheaper to explain before they have been baked into a model output.

Technical lane: Data Evaluation
Business lane: Product & Delivery

Validation intent.

Coverage checks happen before feature engineering so weak districts are identified early and can be handled explicitly in the assumptions, not discovered later as a mysterious model mood swing.

Data source: DWD CDC station metadata
Spatial join: Point-in-polygon assignment to NUTS-3 districts
Output class: District coverage quality labels and counts
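To make the point-in-polygon step concrete, here is a minimal sketch in plain Python. It is not the notebook's code: a real run would more likely use a GIS library for the spatial join, and the ray-casting test below is a stand-in so the example stays self-contained. The district IDs, station IDs, and square "districts" are invented toy data, and the coordinates are treated as flat x/y values.

```python
# Toy point-in-polygon assignment (ray casting), assuming simple polygons
# without holes. All IDs and coordinates are hypothetical.

def point_in_polygon(lon, lat, ring):
    """Return True if (lon, lat) lies inside the ring of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_stations(stations, districts):
    """Count stations per district; also return stations that hit no district."""
    counts = {district_id: 0 for district_id in districts}
    unassigned = []
    for station_id, lon, lat in stations:
        for district_id, ring in districts.items():
            if point_in_polygon(lon, lat, ring):
                counts[district_id] += 1
                break
        else:
            unassigned.append(station_id)
    return counts, unassigned


# Three adjacent unit-height squares standing in for NUTS-3 polygons.
districts = {
    "DISTRICT_A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "DISTRICT_B": [(2, 0), (4, 0), (4, 2), (2, 2)],
    "DISTRICT_C": [(4, 0), (6, 0), (6, 2), (4, 2)],
}
stations = [
    ("S1", 0.5, 0.5),
    ("S2", 1.5, 1.0),
    ("S3", 3.0, 1.0),
    ("S4", 9.0, 9.0),  # falls outside every district
]

counts, unassigned = assign_stations(stations, districts)
print(counts)
print(unassigned)
```

Starting every district at zero is the design choice that matters here: a district with no stations still shows up in the counts instead of silently dropping out of the join.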

2.1 Evaluate DWD Stations at NUTS-3

Spatially join station points to NUTS-3 polygons and quantify districts with zero or low station coverage.

DWD CDC · Point-in-polygon · Coverage audit

Core notebook sequence completed: 33%

Key output

The notebook produces a district-level station coverage table and gives the station-based weather path a proper quality check. Sparse districts stay visible as named coverage issues.
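As a rough illustration of what such a coverage table could look like, the sketch below turns per-district station counts into labelled rows. The label names (`none`, `low`, `ok`), the `low_threshold` cut-off, and the counts themselves are assumptions made for the example, not values taken from the notebook.

```python
# Hypothetical labelling of district station counts. Thresholds and label
# names are illustrative assumptions.

def label_coverage(n_stations, low_threshold=1):
    """Map a station count to a coverage quality label."""
    if n_stations == 0:
        return "none"
    if n_stations <= low_threshold:
        return "low"
    return "ok"


def build_coverage_table(counts, low_threshold=1):
    """Return one row per district, weakest coverage first."""
    rows = [
        {
            "district_id": district_id,
            "n_stations": n,
            "coverage": label_coverage(n, low_threshold),
        }
        for district_id, n in counts.items()
    ]
    rows.sort(key=lambda row: (row["n_stations"], row["district_id"]))
    return rows


# Toy counts; in practice these would come from the spatial join.
toy_counts = {"DISTRICT_A": 2, "DISTRICT_B": 1, "DISTRICT_C": 0}

coverage_table = build_coverage_table(toy_counts)
flagged = [row["district_id"] for row in coverage_table if row["coverage"] != "ok"]

for row in coverage_table:
    print(row)
print("Flagged districts:", flagged)
```

Sorting the weakest districts to the top keeps the sparse cases in front of whoever reads the table, which is the point of naming them.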

If spatial support is weak, I want that uncertainty represented as data quality metadata that travels with the later model outputs.

- Quality gate principle

What I am watching

The important follow-up is how these coverage labels travel through the pipeline. They need to stay visible when weather features are aggregated and when final risk scores are interpreted.
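One way to keep the labels attached is to make them a field of every downstream record, so that an aggregated feature or a score cannot exist without its coverage label. The sketch below assumes hypothetical names throughout (`DistrictFeature`, `mean_temp_c`, the placeholder scoring function) and uses invented readings; it shows the shape of the idea, not the pipeline's actual schema.

```python
# Sketch: coverage labels travel with aggregated features and scores.
# All names, readings, and the scoring function are hypothetical.
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class DistrictFeature:
    district_id: str
    mean_temp_c: Optional[float]  # None when no station supports the district
    n_stations: int
    coverage: str                 # label produced by the coverage audit


def aggregate_with_coverage(
    readings: Dict[str, List[float]], coverage: Dict[str, str]
) -> List[DistrictFeature]:
    """Aggregate station readings per district and attach the coverage label."""
    features = []
    for district_id, label in sorted(coverage.items()):
        temps = readings.get(district_id, [])
        value = mean(temps) if temps else None
        features.append(DistrictFeature(district_id, value, len(temps), label))
    return features


def attach_scores(
    features: List[DistrictFeature], score_fn: Callable[[float], float]
) -> List[dict]:
    """Score each district and copy the coverage label into the output row."""
    rows = []
    for f in features:
        score = None if f.mean_temp_c is None else score_fn(f.mean_temp_c)
        rows.append(
            {"district_id": f.district_id, "score": score, "coverage": f.coverage}
        )
    return rows


coverage = {"DISTRICT_A": "ok", "DISTRICT_B": "low", "DISTRICT_C": "none"}
readings = {"DISTRICT_A": [10.0, 12.0], "DISTRICT_B": [8.0]}

features = aggregate_with_coverage(readings, coverage)
# Placeholder scoring function, used only to show the label surviving the step.
scored = attach_scores(features, score_fn=lambda temp_c: temp_c / 2)

for row in scored:
    print(row)
```

A district without stations comes out with a missing score and a `none` label rather than a silently imputed number, so the gap stays readable at the interpretation stage.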