Notebook 1 - Building the NUTS-3 DuckDB Data Lake

Published 4 April 2026

agri-weather-yield-drivers notebooks duckdb geospatial nuts3


Notebook 1 establishes the project backbone: a persistent DuckDB store with NUTS boundaries loaded and queryable with spatial functions.

Technical lane: Data Ingestion
Business lane: Product & Delivery

Decision relevance.

This notebook removes geospatial ambiguity early. Once district geometry is stable, downstream coverage, feature engineering, and risk scoring can be compared on one consistent spatial frame.

Notebook role: Foundational ingest and geospatial normalization
Primary artifact: DuckDB-backed NUTS region tables
Granularity: NUTS-0 to NUTS-3 hierarchy for later joins

Notebook 1: NUTS-3 DuckDB Data Lake

Load NUTS 0-3 polygons, validate geometry ingest, and prepare spatial joins for all downstream notebooks.

DuckDB spatial, GeoParquet, GISCO
Core notebook sequence completed: 17%
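
To make "validate geometry ingest" concrete, here is a minimal sketch of the kind of ring-level sanity flags such a step might compute. It is plain Python on coordinate lists, not the notebook's actual code: the function name ring_flags and the flag names are hypothetical. Inside DuckDB, the spatial extension's ST_IsValid would probably be the more natural tool.

```python
def ring_flags(ring):
    """Return basic sanity flags for one polygon ring.

    `ring` is a list of (x, y) tuples. The flag names are illustrative
    assumptions, not columns taken from the notebook.
    """
    # A well-formed ring repeats its first vertex as its last vertex.
    is_closed = len(ring) >= 2 and ring[0] == ring[-1]

    # A closed triangle needs 4 points (3 corners plus the repeated start).
    has_enough_points = len(ring) >= 4

    # Shoelace formula gives twice the signed area. The wrap-around edge
    # contributes zero when the ring is already closed.
    doubled_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1])
    )

    return {
        "is_closed": is_closed,
        "has_enough_points": has_enough_points,
        "has_area": doubled_area != 0,
    }


# Made-up rings for illustration only.
square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
open_ring = [(0, 0), (1, 0), (1, 1)]
collinear = [(0, 0), (1, 1), (2, 2), (0, 0)]

square_flags = ring_flags(square)
open_flags = ring_flags(open_ring)
collinear_flags = ring_flags(collinear)
```

Storing flags like these next to each region makes it possible to filter out suspicious geometries before they reach a spatial join, instead of discovering them through missing matches later.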

Key output

The notebook creates a reusable nuts_regions foundation table used throughout the project for point-in-polygon operations and district-level feature aggregation.

Practical takeaway: the same district geometries drive every later analytic step. This lowers reconciliation effort when comparing station coverage, weather features, and final risk scores.
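
As a rough illustration of what a point-in-polygon join followed by district-level aggregation does, the sketch below uses a ray-casting test in plain Python. It is not how DuckDB implements its spatial predicates (there the usual choice would be a function such as ST_Contains), and the district IDs, station IDs, and coordinates are all made up for the example.

```python
from collections import Counter


def point_in_ring(x, y, ring):
    """Ray-casting test: True if (x, y) lies strictly inside the ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be
        # crossed by a ray cast to the right of the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_district(x, y, districts):
    """Return the ID of the first district containing the point, else None."""
    for district_id, ring in districts.items():
        if point_in_ring(x, y, ring):
            return district_id
    return None


# Hypothetical districts (two adjacent squares) and station locations.
districts = {
    "XX001": [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)],
    "XX002": [(2, 0), (4, 0), (4, 2), (2, 2), (2, 0)],
}
stations = {
    "s1": (0.5, 0.5),
    "s2": (1.5, 1.0),
    "s3": (3.0, 1.0),
    "s4": (5.0, 5.0),  # falls outside every district
}

# Point-in-polygon step: one district (or None) per station.
assignment = {
    station_id: assign_district(x, y, districts)
    for station_id, (x, y) in stations.items()
}

# Aggregation step: number of matched stations per district.
stations_per_district = Counter(
    district_id for district_id in assignment.values() if district_id is not None
)
```

Because every later step reuses the same district rings, a station assigned to a district here stays assigned to it everywhere, which is the consistency the takeaway above describes.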

Layer | What is stored | Why it matters
Spatial boundaries | NUTS polygons from level 0 to 3 | Keeps all later joins on one official administrative hierarchy
Reference keys | Region IDs and hierarchy links | Enables deterministic aggregation and roll-up checks
Geometry validation flags | Basic geometry sanity checks | Prevents silent failures in downstream point-in-polygon operations
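
The hierarchy links and roll-up checks in the table can be sketched as follows. The example relies on the fact that a NUTS code is a two-letter country code followed by one character per level, so a parent code is a prefix of its children. The helper names and the numeric values are hypothetical; how the notebook actually stores its hierarchy links is not shown in this post.

```python
from collections import defaultdict


def nuts_level(nuts_id):
    """Level implied by code length: 2 characters is level 0, 5 is level 3."""
    level = len(nuts_id) - 2
    if not 0 <= level <= 3:
        raise ValueError(f"not a NUTS 0-3 code: {nuts_id!r}")
    return level


def nuts_parent(nuts_id):
    """Parent code one level up, or None for a country-level code."""
    return nuts_id[:-1] if nuts_level(nuts_id) > 0 else None


def nuts_ancestors(nuts_id):
    """All ancestor codes, from country level down to the direct parent."""
    nuts_level(nuts_id)  # validates the code length
    return [nuts_id[:n] for n in range(2, len(nuts_id))]


def roll_up(values_by_id):
    """Sum child values onto their direct parent codes."""
    totals = defaultdict(float)
    for nuts_id, value in values_by_id.items():
        totals[nuts_parent(nuts_id)] += value
    return dict(totals)


# Made-up NUTS-3 values, purely to exercise the roll-up.
nuts3_values = {"DE211": 10.0, "DE212": 5.0, "DE221": 7.0}

nuts2_totals = roll_up(nuts3_values)
ancestors = nuts_ancestors("DE212")
```

A roll-up check then compares totals like nuts2_totals against values reported directly at the parent level; any mismatch points to a missing or duplicated child region.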
