Cloud-native GRIB cropper — crop before you download
GFS today · HRRR, NAM, ECMWF open-data coming as the community grows
sharktopus is an open-source Python library that crops GRIB2 weather data
in the cloud — by bounding box, variables, and vertical levels — before it hits
your disk. It deploys a small serverless wgrib2 worker to AWS Lambda, Google
Cloud Run, or Azure Container Apps; each user runs on their own cloud account and pays
their own (typically near-zero) bill.
Today it ships with the NOAA Global Forecast System (GFS 0.25°) end-to-end. The internals are deliberately product-agnostic — batch orchestration, byte-range streaming, cropping, inventory, quotas — so adding a new product (HRRR, NAM, RAP, ECMWF open-data) is a matter of plugging in a URL resolver and a catalog, not rewriting the core. See docs/ADDING_A_PRODUCT.md.
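To make the "plug in a URL resolver" idea concrete, here is a minimal sketch of what an HRRR resolver could look like. The class name, method signature, and the idea of a resolver object are illustrative assumptions, not sharktopus's actual plugin interface (see docs/ADDING_A_PRODUCT.md for the real contract); the bucket and key layout shown are the real NOAA HRRR layout on the AWS Open Data registry.

```python
from dataclasses import dataclass


@dataclass
class HRRRResolver:
    """Hypothetical product plugin: maps a cycle timestamp to object-store URLs.

    Bucket and key structure follow the public NOAA HRRR archive on AWS.
    """

    bucket: str = "noaa-hrrr-bdp-pds"

    def object_url(self, timestamp: str, fhour: int) -> str:
        # timestamp like "2024012100" -> date "20240121", cycle hour "00"
        date, cycle = timestamp[:8], timestamp[8:10]
        key = f"hrrr.{date}/conus/hrrr.t{cycle}z.wrfsfcf{fhour:02d}.grib2"
        return f"https://{self.bucket}.s3.amazonaws.com/{key}"


print(HRRRResolver().object_url("2024012100", 6))
```

The point of the seam: the core batch/cropping machinery only ever asks "give me the URL for this cycle and forecast hour", so supporting a new product is a matter of answering that question plus describing the product's variables and levels in a catalog.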
sharktopus is consumer-agnostic: the output is a standard cropped GRIB2 file that any downstream consumer (WRF preprocessing, xarray/cfgrib, plain wgrib2) can read.
The typical win for a 72-hour regional domain: ~12 GB → ~200 MB of transfer, ~20 min → ~30 s wall time. Defaults ship WRF-canonical variable and level sets because that's the lineage — override with your own lists for any other consumer.
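Back-of-envelope arithmetic on those figures: roughly a 60x cut in bytes moved and a 40x cut in wall time.

```python
bytes_full, bytes_cropped = 12e9, 200e6  # ~12 GB vs ~200 MB transferred
mins_full, secs_cropped = 20, 30         # ~20 min vs ~30 s wall time

print(round(bytes_full / bytes_cropped))     # 60x less transfer
print(round(mins_full * 60 / secs_cropped))  # 40x less wall time
```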
sharktopus is a Python library with a one-command CLI and an optional local web UI on top. Use whichever matches your workflow — they all produce the same cropped GRIB2 output.
Good for shell pipelines, cron jobs, Makefiles. Install, then run:
pip install sharktopus
sharktopus \
--start 2024012100 --end 2024012112 --step 6 \
--lat-s -25 --lat-n -20 --lon-w -45 --lon-e -40 \
--priority gcloud_crop aws_crop nomads_filter \
--dest ./my-run/
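The same command drops straight into cron. A sketch of a crontab entry, with illustrative assumptions: the schedule presumes cron runs in UTC and that the 00Z cycle is available by 04:30 UTC, and the destination path is a placeholder.

```shell
# Hypothetical crontab entry: fetch the day's 00Z cycle at 04:30 UTC.
# In crontab syntax % starts a new line, so it must be escaped as \%.
30 4 * * * sharktopus --start $(date -u +\%Y\%m\%d)00 --end $(date -u +\%Y\%m\%d)00 --lat-s -25 --lat-n -20 --lon-w -45 --lon-e -40 --dest /data/gfs/
```

Note that crontab entries must stay on a single line; crontab has no backslash line continuation.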
Good for notebooks, ML pipelines, Airflow/Prefect tasks, or anywhere you already have Python doing the rest of the work:
from sharktopus import fetch_batch
fetch_batch(
timestamps=["2024012100", "2024012106"],
bbox=(-25, -20, -45, -40), # lat_s, lat_n, lon_w, lon_e
variables=["TMP", "UGRD", "VGRD", "HGT"],
levels=["500 mb", "850 mb", "surface"],
priority=["gcloud_crop", "aws_crop", "nomads_filter"],
dest="./my-run/",
)
See the Python API section of the README for xarray integration and batch-level parallelism controls.
Don't want to write Python? Run sharktopus --ui and drive the whole thing from
a local control panel: submit jobs, monitor free-tier quota, manage credentials, browse
inventory. The UI binds to 127.0.0.1 only (no auth, no network exposure), so it's
safe on any machine you already log into; for remote use, SSH-tunnel the port
(e.g. ssh -N -L <port>:127.0.0.1:<port> user@remote-host).
pip install 'sharktopus[ui]'
sharktopus --ui
The Submit page is the full CLI on a form — product picker, calendar-driven cycle selection, Leaflet map for the bounding box, variable / level cascade, source priority, directory browser for output paths.
sharktopus was originally developed to support the CONVECT project — “Convective Systems Forecasting: Integrated Analysis of Numerical Modeling, Radar and Satellites” (“Previsão de sistemas convectivos: análise integrada da modelagem numérica, radar e satélites”, CNPq Extreme Events Call 15/2023), coordinated by Dr. Tânia Ocimoto Oda. CONVECT is executed at IEAPM (Instituto de Estudos do Mar Almirante Paulo Moreira, Brazilian Navy) with partner institutions UENF (Universidade Estadual do Norte Fluminense Darcy Ribeiro) and UFPR (Universidade Federal do Paraná). sharktopus itself is maintained as an independent open-source project. Governance is merit-based and documented in GOVERNANCE.md. Contributors retain their own institutional affiliation — see AUTHORS.md.
sharktopus is not a product of, endorsed by, or representing the Brazilian Navy, CNPq, IEAPM, UENF, or UFPR. Institutional acknowledgement and project funding context are not institutional ownership.
Issues and pull requests:
github.com/sharktopus-project/sharktopus/issues
Project email: sharktopus.convect@gmail.com