CUSP Versioning and Exports Plan¶
Goal¶
Make every official CUSP data release:
- citable
- reproducible
- easy to find as the current latest
- easy to recover later as an archived historical snapshot
Current Release Model¶
CUSP now separates:
- the canonical released dataset
- optional released observation-level feature tables
- non-release workflows such as aggregation
That means users should cite and share versioned release files such as
cusp_v1.0.csv, while the repository can still keep separate working files for
building and checking the release.
Version Format¶
Dataset releases use vX.Y.
Examples:
- v1.0
- v1.1
- v2.0
This is intentionally simpler than a vX.Y.Z scheme.
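As an illustration of the format, here is a minimal sketch of a validator for vX.Y tags. The function name parse_version is hypothetical and is not part of the cusp package.

```python
from __future__ import annotations

import re

# Matches the vX.Y format described above; patch-level tags such as v1.0.1 are rejected.
_VERSION_RE = re.compile(r"v(\d+)\.(\d+)")


def parse_version(tag: str) -> tuple[int, int]:
    """Return (major, minor) for a tag such as 'v1.0'; raise ValueError otherwise."""
    match = _VERSION_RE.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a vX.Y dataset version: {tag!r}")
    return int(match.group(1)), int(match.group(2))


# Sorting by the parsed tuple avoids string-order surprises ('v1.10' vs 'v1.9').
tags = ["v2.0", "v1.10", "v1.9"]
print(sorted(tags, key=parse_version))  # ['v1.9', 'v1.10', 'v2.0']
```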
Version-Bump Policy¶
Major bump¶
Use a major bump when the public contract changes in a breaking way.
Examples:
- canonical observation schema changes incompatibly
- official release bundle structure changes in a way users must adapt to
- the meaning of core fields changes incompatibly
Minor bump¶
Use a minor bump when official data content or official exported products change meaningfully without breaking the public contract.
Examples:
- a new source is added to the canonical release
- an existing source is removed or deferred from the official release
- source-processing fixes change rows in cusp_vX.Y.csv
- the official observation-level feature table is regenerated with materially changed content
- release citation coverage changes because the included source set changed
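The bump policy can be sketched as a small helper. The name next_version is hypothetical, and resetting the minor number to 0 on a major bump is an assumption (it matches the v2.0 example, but this plan does not state it).

```python
def next_version(current: str, bump: str) -> str:
    """Return the next vX.Y tag for a 'major' or 'minor' bump."""
    if not current.startswith("v"):
        raise ValueError(f"expected a vX.Y tag, got {current!r}")
    major_text, minor_text = current[1:].split(".")
    major, minor = int(major_text), int(minor_text)
    if bump == "major":
        # Assumption: a breaking change restarts the minor counter at 0.
        return f"v{major + 1}.0"
    if bump == "minor":
        return f"v{major}.{minor + 1}"
    raise ValueError(f"bump must be 'major' or 'minor', got {bump!r}")


print(next_version("v1.0", "minor"))  # v1.1
print(next_version("v1.1", "major"))  # v2.0
```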
Official Export Layout¶
Use a real export tree inside the repo workspace:
exports/
  latest/
    cusp_v1.0.csv
    cusp_features_v1.0.csv
    cusp_sources_v1.0.bib
    RELEASE_INFO.md
  archived/
    v1.0/
      cusp_v1.0.csv
      cusp_features_v1.0.csv
      cusp_sources_v1.0.bib
      RELEASE_INFO.md
    v1.1/
      ...
Notes:
- the export bundle is intentionally flat
- cusp_features_vX.Y.csv is included only when an observation-level feature table keyed to cusp_obs_id is provided
- aggregation outputs are not part of the official versioned export bundle
Official Exported Files¶
The core exported filenames are:
- cusp_vX.Y.csv
- cusp_features_vX.Y.csv
- cusp_sources_vX.Y.bib
- RELEASE_INFO.md
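Because the bundle is flat and the filenames follow one pattern, a bundle directory can be checked mechanically. The sketch below is illustrative only; bundle_filenames and missing_files are hypothetical helpers, not functions of the cusp package.

```python
from __future__ import annotations

from pathlib import Path


def bundle_filenames(version: str, with_features: bool = True) -> list[str]:
    """List the filenames expected in a flat export bundle for a tag like 'v1.0'."""
    names = [f"cusp_{version}.csv"]
    if with_features:
        # The feature table is optional, so callers decide whether to expect it.
        names.append(f"cusp_features_{version}.csv")
    names += [f"cusp_sources_{version}.bib", "RELEASE_INFO.md"]
    return names


def missing_files(bundle_dir: Path, version: str, with_features: bool = True) -> list[str]:
    """Return the expected filenames that are not present in bundle_dir."""
    expected = bundle_filenames(version, with_features)
    return [name for name in expected if not (bundle_dir / name).is_file()]
```

For example, missing_files(Path("exports/latest"), "v1.0") would return an empty list for a complete v1.0 bundle.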
cusp_vX.Y.csv¶
This is the canonical public CUSP dataset:
- all accepted processed sources
- integrated into the CUSP release schema
- deduplicated
- QA/QC checked
In repository rebuilds, this file is exported from the working observation
table produced by python -m cusp.build.
cusp_features_vX.Y.csv¶
This artifact is optional. When present, it is an official part of the release.
Rules:
- it must be keyed to cusp_obs_id
- it must align exactly to the canonical observation release
- aggregation-keyed feature tables are not valid official release artifacts
Internally, this should be produced by sampling features against the main CUSP observation table, not against a spatial summary.
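One way to check the rules above before packaging is sketched here. It interprets "align exactly" as "the same set of cusp_obs_id values, each appearing once in the feature table", which is an assumption; row order is not checked. The helper names are hypothetical.

```python
from __future__ import annotations

import csv
from pathlib import Path


def read_ids(path: Path, key: str = "cusp_obs_id") -> list[str]:
    """Read the key column from a CSV file, failing if the column is absent."""
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or key not in reader.fieldnames:
            raise ValueError(f"{path.name} has no {key} column")
        return [row[key] for row in reader]


def check_alignment(observations: Path, features: Path) -> None:
    """Raise ValueError unless the feature table covers exactly the released IDs."""
    obs_ids = read_ids(observations)
    feature_ids = read_ids(features)
    if len(set(feature_ids)) != len(feature_ids):
        raise ValueError("feature table contains duplicate cusp_obs_id values")
    missing = set(obs_ids) - set(feature_ids)
    extra = set(feature_ids) - set(obs_ids)
    if missing or extra:
        raise ValueError(
            f"{len(missing)} observations lack features; "
            f"{len(extra)} feature rows match no observation"
        )
```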
cusp_sources_vX.Y.bib¶
This is the bibliography file for the specific sources included in the release.
It is a filtered subset of the repo’s master data/cusp_sources.bib, not a
copy of every possible source ever considered.
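Filtering the master file down to a release subset can be sketched as below. This is not the project's implementation: filter_bibtex is a hypothetical helper, and it assumes every entry starts at the beginning of a line in the form @type{key, which a full BibTeX parser would not require.

```python
from __future__ import annotations

import re

# Assumption: each entry begins at the start of a line as '@type{key,'.
ENTRY_START = re.compile(r"^@\w+\{([^,\s]+),", re.MULTILINE)


def filter_bibtex(bib_text: str, wanted: set[str]) -> str:
    """Keep only the entries whose key is in `wanted`, in their original order."""
    starts = list(ENTRY_START.finditer(bib_text))
    kept = []
    for index, match in enumerate(starts):
        # An entry runs until the next entry start, or to the end of the file.
        end = starts[index + 1].start() if index + 1 < len(starts) else len(bib_text)
        if match.group(1) in wanted:
            kept.append(bib_text[match.start():end].strip())
    return "\n\n".join(kept) + ("\n" if kept else "")
```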
RELEASE_INFO.md¶
This is the human-readable release record for the bundle.
It should include:
- dataset version
- code version
- git commit
- release date / generation time
- row count
- source count
- date range
- exported artifact list
- checksums
- a short “changes in this release” section
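The checksum entries could be produced along these lines. The plan does not name a hash algorithm, so SHA-256 is an assumption, as are the helper names and the choice to skip RELEASE_INFO.md itself (it cannot contain its own checksum).

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large CSV exports are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_lines(bundle_dir: Path) -> list[str]:
    """Return '<sha256>  <filename>' lines for every bundle file except the record."""
    return [
        f"{sha256_of(path)}  {path.name}"
        for path in sorted(bundle_dir.iterdir())
        if path.is_file() and path.name != "RELEASE_INFO.md"
    ]
```

The two-space separator matches the format that the sha256sum utility reads with its -c option, so readers can verify a downloaded bundle without Python.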
Citation Model¶
The public citation model is now intentionally simple:
- export one BibTeX file: cusp_sources_vX.Y.bib
- use source keys in the data table as BibTeX entry keys
- provide a helper command to extract only the needed entries from any filtered CUSP table
Supported helper:
python -m cusp.citations --input path/to/your_cusp_table.csv --output references.bib
This works with tables that contain either:
- source
- aggregated_sources
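To show the idea behind the helper, the sketch below collects the source keys used by a filtered table. It is not the code of cusp.citations. The function name is hypothetical, and the ";" separator for multiple keys in aggregated_sources is an assumption, since this plan does not specify how that column is encoded.

```python
from __future__ import annotations

import csv
from pathlib import Path

KEY_COLUMNS = ("source", "aggregated_sources")


def collect_source_keys(table_path: Path, separator: str = ";") -> set[str]:
    """Collect the distinct source keys referenced by a CUSP-style CSV table."""
    keys: set[str] = set()
    with table_path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        columns = [name for name in KEY_COLUMNS if name in (reader.fieldnames or [])]
        if not columns:
            raise ValueError("table has neither a source nor an aggregated_sources column")
        for row in reader:
            for column in columns:
                # Assumption: multiple keys in one cell are separated by `separator`.
                for part in (row[column] or "").split(separator):
                    if part.strip():
                        keys.add(part.strip())
    return keys
```

The resulting set holds the BibTeX entry keys to look up in cusp_sources_vX.Y.bib.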
Aggregation Status¶
The aggregation workflow remains important, but it is currently a reproducible example workflow rather than an official versioned release artifact.
That means:
- python -m cusp.aggregate remains available
- aggregated_30m.csv remains useful and documented
- aggregation outputs do not need to be rebuilt and archived for every CUSP dataset version unless the team later promotes them back into the official release bundle
Recommended Release Workflow¶
- Rebuild the canonical dataset with python -m cusp.build.
- If needed, generate an observation-level feature table keyed to cusp_obs_id.
- Decide the next dataset version, for example v1.0 or v1.1.
- Run the scripted release gate, including strict docs validation, with python -m cusp.release_gate --version 1.0 --gee-project <your-earth-engine-project>. For CI or environments without Earth Engine credentials, use --skip-gee-smoke and treat the live GEE smoke as a manual release check.
- Package the official bundle with python -m cusp.export.
- Review RELEASE_INFO.md.
- Publish the archived bundle and refresh exports/latest/.
The release gate writes test exports and aggregation outputs under
runs/release_gate/. Those files validate the workflow but are not official
release artifacts.
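The final publish step of the workflow above can be sketched as follows. This is an illustration under assumptions, not a description of what python -m cusp.export does: the publish function is hypothetical, and refusing to overwrite an existing archived version is a design choice made here so that historical snapshots stay immutable.

```python
import shutil
from pathlib import Path


def publish(bundle_dir: Path, exports_root: Path, version: str) -> None:
    """Archive a reviewed bundle under archived/<version>/ and refresh latest/."""
    archived = exports_root / "archived" / version
    if archived.exists():
        # Archived snapshots are historical records, so never replace one silently.
        raise FileExistsError(f"{archived} already exists; bump the version instead")
    shutil.copytree(bundle_dir, archived)

    latest = exports_root / "latest"
    if latest.exists():
        shutil.rmtree(latest)
    # latest/ always mirrors the most recently archived bundle.
    shutil.copytree(archived, latest)
```

Copying latest/ from the archived directory rather than from the working bundle keeps the two trees identical by construction.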
Practical Recommendation Right Now¶
For the first public release, the official bundle uses:
- cusp_v1.0.csv
- cusp_features_v1.0.csv
- cusp_sources_v1.0.bib
- RELEASE_INFO.md
The feature table is included because the full base_v1 observation-level
feature export has been sampled against the canonical dataset and aligns to
cusp_obs_id.