Formal QA Validation

This document describes the current QA layer for the canonical observation-level release bundle and records the results of the most recent audit of that layer.

QA Workflow

Hard-gate tests live in tests/test_qc_observations.py. They validate the working observation table before it is exported as cusp_vX.Y.csv:

exact canonical observation columns only
present and unique cusp_obs_id
valid binary pf_observed
supported direct-observation method values
no missing or out-of-range coordinates
parseable and in-range dates
no negative depth values
no obs_limit == 0
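
The checks listed above can be sketched in plain Python. This is a minimal illustration, not the contents of tests/test_qc_observations.py: only cusp_obs_id, pf_observed and obs_limit are names taken from this document. The remaining column names, the reduced column list, the supported method values and the earliest accepted date are all assumptions made for the example.

```python
from datetime import date

# Illustrative schema only: the real canonical table has 11 columns.
# Every name except cusp_obs_id, pf_observed and obs_limit is hypothetical.
EXPECTED_COLUMNS = ("cusp_obs_id", "pf_observed", "method", "lat", "lon",
                    "obs_date", "pf_depth", "thaw_depth", "obs_limit")
SUPPORTED_METHODS = {"probe", "borehole"}  # hypothetical values
EARLIEST_DATE = date(1900, 1, 1)           # hypothetical lower bound


def hard_gate_failures(rows, today):
    """Return (row_index, reason) pairs; an empty list means the gate passes."""
    failures = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Exact canonical columns only, in the expected order.
        if tuple(row.keys()) != EXPECTED_COLUMNS:
            failures.append((i, "unexpected columns"))
            continue
        obs_id = row["cusp_obs_id"]
        if obs_id in (None, "") or obs_id in seen_ids:
            failures.append((i, "missing or duplicate cusp_obs_id"))
        seen_ids.add(obs_id)
        if row["pf_observed"] not in (0, 1):
            failures.append((i, "invalid pf_observed"))
        if row["method"] not in SUPPORTED_METHODS:
            failures.append((i, "unsupported method"))
        lat, lon = row["lat"], row["lon"]
        if lat is None or lon is None or not (-90 <= lat <= 90 and -180 <= lon <= 180):
            failures.append((i, "missing or out-of-range coordinates"))
        try:
            observed = date.fromisoformat(row["obs_date"])
        except (TypeError, ValueError):
            failures.append((i, "unparseable date"))
        else:
            if not (EARLIEST_DATE <= observed <= today):
                failures.append((i, "date out of range"))
        # Missing depths are allowed here; negative values are not.
        for col in ("pf_depth", "thaw_depth", "obs_limit"):
            value = row[col]
            if value is not None and value < 0:
                failures.append((i, f"negative {col}"))
        if row["obs_limit"] == 0:
            failures.append((i, "obs_limit == 0"))
    return failures


good = dict(zip(EXPECTED_COLUMNS,
                ("A1", 1, "probe", 68.5, -133.7, "2015-07-01", 0.6, 0.6, 1.2)))
# Same id as `good`, latitude outside [-90, 90], and a zero obs_limit.
bad = dict(good, lat=133.7, obs_limit=0)
failures = hard_gate_failures([good, bad], today=date(2024, 1, 1))
```

In this sketch the second row fails three checks (duplicate id, coordinate range, zero obs_limit), while the first row passes all of them.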

Diagnostic audits are available through python -m cusp.qc audit-observations and the shared helpers in cusp/qc. The audit is intentionally read-only: it writes review outputs under outputs/qc_audit/ and does not mutate data.
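
As a rough sketch of what "writes review outputs and does not mutate data" means in practice, the example below counts issues over a list of rows, writes a qc_summary.json file, and leaves the input untouched. The function name write_audit_summary is hypothetical and is not part of cusp/qc. The way duplicates are counted here (every occurrence after the first) is also an assumption.

```python
import copy
import json
import tempfile
from collections import Counter
from pathlib import Path


def write_audit_summary(rows, out_dir):
    """Count issues without editing rows, then write qc_summary.json."""
    ids = [row.get("cusp_obs_id") for row in rows]
    id_counts = Counter(i for i in ids if i not in (None, ""))
    summary = {
        "n_missing_cusp_obs_id": sum(1 for i in ids if i in (None, "")),
        # Assumption: each occurrence after the first counts as one duplicate.
        "n_duplicate_cusp_obs_id": sum(c - 1 for c in id_counts.values() if c > 1),
    }
    out_path = Path(out_dir) / "qc_summary.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(summary, indent=2))
    return out_path


rows = [{"cusp_obs_id": "A1"}, {"cusp_obs_id": "A1"}, {"cusp_obs_id": None}]
snapshot = copy.deepcopy(rows)  # kept only to show the input is unchanged

# A temporary directory stands in for the project root in this example.
with tempfile.TemporaryDirectory() as tmp:
    path = write_audit_summary(rows, Path(tmp) / "outputs" / "qc_audit")
    summary = json.loads(path.read_text())
```

Because the function only reads from rows and writes to a separate output directory, running it repeatedly has no effect on the observation table itself.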

Current result

Latest run:

python -m cusp.qc validate-observations
python -m cusp.qc audit-observations
python -m unittest discover -s tests

Observed outcome:

all hard-gate tests passed
current canonical observation table size: 249,012 rows
current canonical observation table columns: 11
no hard-gate failures were written to outputs/qc_tests/

Current audit summary

From outputs/qc_audit/qc_summary.json:

n_missing_cusp_obs_id = 0
n_duplicate_cusp_obs_id = 0
n_date_unparseable = 0
n_date_future = 0
n_date_too_old = 0
n_missing_xy = 0
n_invalid_xy_range = 0
n_negative_pf_depth = 0
n_negative_thaw_depth = 0
n_negative_obs_limit = 0
n_zero_obs_limit = 0
n_invalid_pf_observed = 0
unsupported method rows: 0
n_thaw_gt_pf_diagnostic = 53
n_suspect_swapped_latlon = 0
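
The only non-zero counter above is n_thaw_gt_pf_diagnostic, which is reported even though all hard-gate tests passed, so it acts as a review flag rather than a blocker. The sketch below shows one plausible way to compute such a counter. The column names thaw_depth and pf_depth are inferred from the counter names, and the rule itself (thaw depth strictly greater than permafrost depth, with both present) is an assumption, not the definition used in cusp/qc.

```python
def count_thaw_gt_pf(rows):
    """Count rows where both depths are present and thaw_depth exceeds pf_depth."""
    return sum(
        1
        for row in rows
        if row.get("thaw_depth") is not None
        and row.get("pf_depth") is not None
        and row["thaw_depth"] > row["pf_depth"]
    )


rows = [
    {"thaw_depth": 0.8, "pf_depth": 0.5},   # flagged: thaw deeper than permafrost
    {"thaw_depth": 0.5, "pf_depth": 0.5},   # equal depths are not flagged
    {"thaw_depth": 0.9, "pf_depth": None},  # a missing depth is skipped
]
n_thaw_gt_pf_diagnostic = count_thaw_gt_pf(rows)
```

Flagged rows would then be listed for manual review rather than removed or rejected.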

Current pf_observed counts in the canonical observation table:

1: 230,539
0: 18,473
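
A quick consistency check on the figures reported in this document: the two pf_observed counts should sum to the table's row count. The variable names below are illustrative.

```python
# Figures copied from this document.
pf_observed_counts = {1: 230_539, 0: 18_473}
total_rows = 249_012

# Only the two binary values are present, and together they cover every row.
assert set(pf_observed_counts) <= {0, 1}
assert sum(pf_observed_counts.values()) == total_rows

# Share of observations with pf_observed == 1 (about 92.6%).
share_positive = pf_observed_counts[1] / total_rows
```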

Explicit non-blockers

The following are intentionally not part of the hard-gate observation QA:

missing site_id
duplicate-heavy source semantics that remain source-level review topics
below-Arctic-circle checks

These may still be reviewed manually or through source-specific triage, but they do not currently block the canonical release build.

Still deferred

These QA topics remain open for future refinement; none is implemented as a formal blocker today:

ocean / impossible-location screening
stronger intended-domain checks
source-specific duplicate semantics beyond current build-level exact-duplicate handling