CUSP Reproducibility And Inclusion Exceptions¶
Date: 2026-04-08
Status: Initial exception register for Phase 3
Purpose¶
This register tracks sources that are currently excluded, incomplete, or otherwise outside the clean reproducibility path for the canonical v1 observation-level release.
This is a working project document. Entries here should be revised as sources are clarified, rebuilt, or explicitly deferred.
How To Read This Register¶
- Current status: how the source is treated in the repo today
- Reason: current rationale based on repo code, source scripts, and nearby documentation
- Confidence: how sure we are that the stated rationale is correct
- Release implication: what this means for the v1 release plan
- Next action: what needs to happen before the source can be considered resolved
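If the register is ever made machine-readable, each entry could be held in a small record whose fields mirror the headings above. The sketch below is illustrative only: the ExceptionEntry class, its field names, and the sample values (paraphrased from the Beer_etal_2013 entry) are assumptions, not something that exists in the CUSP repo.

```python
from dataclasses import dataclass, field


@dataclass
class ExceptionEntry:
    """One register entry; field names mirror the headings in this document."""

    source_key: str
    current_status: str
    reason: str
    confidence: str  # free text in the register, e.g. "high" or "medium"
    release_implication: str
    next_action: list[str] = field(default_factory=list)


# Hypothetical example, paraphrased from the Beer_etal_2013 entry.
beer = ExceptionEntry(
    source_key="Beer_etal_2013",
    current_status="excluded from the canonical observation-level combine step",
    reason="gridded/interpolated map product with no valid date field",
    confidence="high",
    release_implication="remain out of the canonical observation-level release",
    next_action=["keep excluded for v1", "keep on the deferred deletion list"],
)

print(beer.source_key, beer.confidence)
```

Holding entries this way would allow the running lists at the end of the register to be derived from the entries rather than maintained by hand.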
Current Exceptions¶
Chen_2015¶
- Current status: removed from data/ after duplicate review; retained as a bibliographic-only source for synthesis traceability
- Repo evidence:
- the removed processing script stated: "THIS DATASET IS INCLUDED IN THE SCHAEFER DATA, DO NOT INCLUDE"
- data/cusp_sources.bib and data/cusp_sources_bibtex.csv retain the Chen_2015 reference with a note that it should not be ingested separately
- Jafarov_2016 and Moore_et_al_2025 now carry DOI metadata for the included synthesis/related ABoVE sources
- Reason:
- this appears to be an intentional de-duplication decision rather than a reproducibility failure
- retaining the citation but removing the duplicate source directory avoids both duplicate observations and lost source provenance
- Confidence: high that the removal is an intentional de-duplication; medium on the exact parent-source mapping
- Release implication:
- this should remain out of the canonical observation release as a separate source
- the reference should remain in the master bibliography for traceability when CUSP ingests a synthesis that may include its observations
- Next action:
- no source-directory action remains
- revisit only if CUSP adds a formal many-to-one source-provenance table for synthesis datasets
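A many-to-one source-provenance table of the kind mentioned above could be as simple as one row per absorbed source. The sketch below is a possible shape, not an existing CUSP structure: the column order, the helper names, and the parent key Schaefer_synthesis are placeholders, and the parent mapping itself is only medium confidence.

```python
from collections import defaultdict

# Hypothetical rows: (absorbed_source, parent_synthesis, mapping_confidence).
# "Schaefer_synthesis" is a placeholder, not a confirmed CUSP source key.
PROVENANCE_ROWS = [
    ("Chen_2015", "Schaefer_synthesis", "medium"),
]


def children_by_parent(rows):
    """Group absorbed sources under the synthesis that carries their data."""
    grouped = defaultdict(list)
    for child, parent, _confidence in rows:
        grouped[parent].append(child)
    return dict(grouped)


def is_absorbed(source_key, rows):
    """True if the source should not be ingested as its own directory."""
    return any(child == source_key for child, _parent, _conf in rows)


print(children_by_parent(PROVENANCE_ROWS))
print(is_absorbed("Chen_2015", PROVENANCE_ROWS))
```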
Beer_etal_2013¶
- Current status: excluded from the canonical observation-level combine step; considered resolved as out of scope for the observation release
- Repo evidence:
- cusp/combine_data.py explicitly skips Beer_etal_2013
- the combine comment says it is interpolated map data with no dates
- data/Beer_etal_2013/process_beer_etal_2013.py creates rows with date = np.nan and adds a comment that the data represent the period 1960-1987
- Reason:
- this is a gridded/interpolated map product rather than a dated observation dataset
- it does not fit the current observation-level schema requirement for a valid date field
- Confidence: high
- Release implication:
- should remain out of the canonical observation-level release
- could potentially be treated later as a distinct auxiliary modeled/map product, but not as a standard CUSP observation source
- Next action:
- keep excluded for v1 unless the project decides to support undated historical map products in a separate release track
- keep on the deferred deletion list, but do not delete yet
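The date requirement behind this exclusion can be expressed as a simple row-level check. The following is a minimal sketch using only the standard library; the function names are hypothetical, the rows are made up, and the real pipeline may validate dates differently (for example on a pandas column).

```python
import math
from datetime import date


def has_valid_date(value):
    """Return True only for real calendar dates (not None or NaN)."""
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    return isinstance(value, date)


def undated_fraction(rows):
    """Fraction of rows whose 'date' field fails the check above."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if not has_valid_date(row.get("date")))
    return missing / len(rows)


# Made-up rows shaped like the Beer_etal_2013 output: every date is NaN.
map_rows = [{"date": float("nan")}, {"date": float("nan")}]
# Made-up rows from a partially dated source, for comparison.
dated_rows = [{"date": date(2010, 8, 15)}, {"date": float("nan")}]

print(undated_fraction(map_rows), undated_fraction(dated_rows))
```

A source whose undated fraction is 1.0, as here, cannot enter the observation-level table without a schema change.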
Brown_etal_2000_calm¶
- Current status: excluded from the canonical observation-level build after CALM supersession review
- Repo evidence:
- data/Brown_etal_2000_calm/process_brown_etal_2000_calm.py processes a manually reformatted CALM summary workbook
- the parser depends on fixed row ranges in CALM_Summary_table.xlsx
- comparison against the newer CALM GTN-P/PANGAEA export shows this source is CALM-derived and has source-level date/value alignment problems
- Reason:
- the newer CALM source is the canonical CALM/GTN-P annual ALT export
- keeping both would retain duplicate CALM-derived observations, while the Brown workbook path is less reproducible and less faithful to the source metadata
- Confidence: high that this should not remain as a separate canonical observation source once CALM is included
- Release implication:
- this source should stay excluded from the canonical observation-level release
- retain the source directory for now until the final deletion pass, because the project may still want the historical paper citation for provenance
- Next action:
- after CALM is promoted into the release build, delete or archive the Brown processed source and keep any needed citation/provenance note outside the canonical source list
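A supersession review of this kind can be reproduced by comparing the two tables on a shared key. The sketch below shows one way to do that; the column names (site_id, year, alt_cm), the site identifiers, and the values are invented for illustration and do not come from either source.

```python
KEY_FIELDS = ("site_id", "year")  # hypothetical join key


def _key(row):
    return tuple(row[name] for name in KEY_FIELDS)


def overlap_keys(older_rows, newer_rows):
    """Keys present in both tables, i.e. candidate duplicate observations."""
    return {_key(r) for r in older_rows} & {_key(r) for r in newer_rows}


def value_mismatches(older_rows, newer_rows, value_field="alt_cm"):
    """Shared keys whose values disagree: (key, older_value, newer_value)."""
    newer = {_key(r): r[value_field] for r in newer_rows}
    return [
        (_key(r), r[value_field], newer[_key(r)])
        for r in older_rows
        if _key(r) in newer and newer[_key(r)] != r[value_field]
    ]


# Made-up rows standing in for the Brown workbook and the CALM export.
brown = [
    {"site_id": "U1", "year": 1995, "alt_cm": 40},
    {"site_id": "U2", "year": 1996, "alt_cm": 55},
]
calm = [
    {"site_id": "U1", "year": 1995, "alt_cm": 41},
    {"site_id": "U3", "year": 1996, "alt_cm": 60},
]

print(overlap_keys(brown, calm))
print(value_mismatches(brown, calm))
```

A large overlap supports treating the older table as derived from the newer one, and the mismatch list gives a concrete record of any alignment problems.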
Sadeghi_etal_2023¶
- Current status: excluded from the canonical observation-level build pending source review
- Repo evidence:
- data/Sadeghi_etal_2023/process_sadeghi_etal_2023.py describes the source as an InSAR-derived thaw-depth estimate product
- the script assigns a representative date to a multi-year analysis window
- the processed source currently emits only an unsupported source-specific method label
- Reason:
- the canonical observation release is limited to direct observation workflows
- surface-displacement-derived thaw-depth products may be useful related data, but they are outside the current method vocabulary
- Confidence: high that the current processed output is not a direct field-observation table; source-level review may still clarify whether any directly observed validation data are present elsewhere in the source package
- Release implication:
- this source should remain out of the canonical observation release unless a direct-observation subset is identified and processed separately
- Next action:
- inspect the source package and notebook to determine whether any direct permafrost observations exist apart from the derived product
- if not, retain the source only as a related-data candidate outside the canonical observation table
Yi_etal_2020_ABoVE¶
- Current status: excluded, deferred, and not currently reproducible from the checked-in repo alone
- Repo evidence:
- cusp/combine_data.py explicitly skips Yi_etal_2020_ABoVE
- the combine comment says the source is too large to load directly and needs to be processed online
- data/Yi_etal_2020_ABoVE/process_yi_etal_2020_above.py now uses repo-relative paths and the canonical source key
- the raw file Alaska_active_layer_thickness_1km_2001-2015.nc4 is now available locally, but it is gitignored and treated as an external input
- the current netCDF would flatten to about 43,956,000 time-grid rows if exported directly
- Reason:
- the source still depends on an external/local raw input outside normal Git tracking
- the current flatten-to-CSV workflow is likely too large and needs redesign before this source is release-ready
- Confidence: high
- Release implication:
- this source is a true reproducibility exception for v1
- it should stay on the cleanup-later list unless we either:
- provide a documented external-download workflow, or
- host the required source data elsewhere and document access
- Next action:
- leave this on the investigate/cleanup-later list for now
- if later brought in scope, confirm that no hardcoded paths remain (the processing script now uses repo-relative paths) and document the external-data acquisition step
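One piece of a documented external-download workflow would be verifying that the locally supplied raw file is the expected one. The sketch below checks a file against a recorded SHA-256 digest; the function names and the idea of storing the digest in a tracked manifest are assumptions, and no such manifest is known to exist in the repo.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a large netCDF input is never read whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_external_input(path, expected_sha256):
    """Return a short status string for a gitignored external input."""
    path = Path(path)
    if not path.exists():
        return "missing"
    if sha256_of(path) != expected_sha256:
        return "checksum mismatch"
    return "ok"
```

The processing script could call check_external_input before reading the file and stop with a pointer to the download instructions whenever the status is not "ok".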
Wilcox_2015¶
- Current status: excluded and incomplete for the current observation-level pipeline; needs later investigation
- Repo evidence:
- cusp/combine_data.py explicitly skips Wilcox_2015
- the combine comment says there are no lat/lon data for observations
- source files are present under data/Wilcox_2015/, but there is no checked-in processed_wilcox_2015.csv
- Reason:
- the current observation-level release requires geolocated records, and this source apparently does not meet that requirement in its current form
- Confidence: medium
- the combine comment is clear, but the repo still needs a fuller note describing whether coordinates are fundamentally absent or just not yet recoverable
- Release implication:
- this should remain excluded for v1 unless geolocation can be reconstructed in a scientifically defensible way
- Next action:
- keep on the investigate-later list
- add a short source note describing whether this is permanently non-geolocatable or just not yet processed
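When this source is investigated, a row-level geolocation check would show how many records, if any, carry usable coordinates. The sketch below is one possible check; the column names latitude and longitude and the sample rows are assumptions, not the actual Wilcox_2015 layout.

```python
import math


def _is_finite_number(value):
    """True for finite ints and floats; bool, None, and NaN are rejected."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return math.isfinite(value)


def has_valid_coords(row):
    """True when both coordinates are finite and inside geographic bounds."""
    lat, lon = row.get("latitude"), row.get("longitude")
    if not (_is_finite_number(lat) and _is_finite_number(lon)):
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


def split_by_geolocation(rows):
    """Return (located, unlocated) lists without modifying the input."""
    located = [row for row in rows if has_valid_coords(row)]
    unlocated = [row for row in rows if not has_valid_coords(row)]
    return located, unlocated


# Made-up rows covering the main failure modes.
sample = [
    {"latitude": 68.6, "longitude": -149.6},
    {"latitude": None, "longitude": None},
    {"latitude": float("nan"), "longitude": 10.0},
    {"latitude": 95.0, "longitude": 0.0},
]
located, unlocated = split_by_geolocation(sample)
print(len(located), len(unlocated))
```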
Running Lists¶
Revisit For Possible Inclusion¶
- Sadeghi_etal_2023
Investigate Or Clean Up Later¶
- Yi_etal_2020_ABoVE
- Wilcox_2015
Deferred Deletion Candidates¶
These are not to be deleted now. Keep them skipped during ongoing development and revisit deletion near the end of release cleanup after documentation decisions are settled.
- Beer_etal_2013
- Brown_etal_2000_calm
Additional Non-Blocking Cleanup Items¶
These are not currently exclusion reasons, but they should be cleaned up before deeper automation:
- Yi_etal_2020_ABoVE: flattening the full netCDF directly to CSV is still too large for the canonical observation-level workflow
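The scale of the flattening problem can be checked with simple arithmetic. The sketch below assumes one time step per year for 2001-2015 (suggested by the file name but not confirmed here) and a guessed 40 bytes per CSV row; it also shows a generator-based pattern for skipping missing cells instead of materialising every row. The function names and the toy input are hypothetical.

```python
TOTAL_ROWS = 43_956_000  # row count quoted in this register

# Assumption: one time step per year, 2001 through 2015 inclusive.
N_TIME_STEPS = 2015 - 2001 + 1

cells_per_step, remainder = divmod(TOTAL_ROWS, N_TIME_STEPS)


def estimated_csv_bytes(n_rows, bytes_per_row=40):
    """Rough size estimate; bytes_per_row is a guess, not a measurement."""
    return n_rows * bytes_per_row


def iter_valid_rows(time_steps, cells):
    """Yield (time, cell_id, value) lazily, skipping missing cells.

    `cells` maps each time step to an iterable of (cell_id, value) pairs,
    with None standing in for a masked or fill value.
    """
    for t in time_steps:
        for cell_id, value in cells[t]:
            if value is not None:
                yield (t, cell_id, value)


print(cells_per_step, remainder, estimated_csv_bytes(TOTAL_ROWS))
```

Under these assumptions the grid has 2,930,400 cells per time step and a full export would be roughly 1.76 GB. The even division is consistent with 15 annual steps but does not prove it, so the dimension sizes should be read from the file before any redesign.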
Recommended Immediate Decisions¶
- Keep Chen_2015 as a bibliographic-only duplicate/absorbed source unless CUSP adds formal sub-source provenance for synthesis datasets.
- Treat Yi_etal_2020_ABoVE as a formal reproducibility exception unless and until the external-data workflow and the oversized flattening workflow are redesigned and documented.
- Leave Beer_etal_2013, Brown_etal_2000_calm, Sadeghi_etal_2023, and Wilcox_2015 excluded for v1 unless the release scope changes.