Our Methodology

How Water Utility Report sources, normalizes, interprets, and publishes U.S. drinking water data — and where we draw hard lines on what we will and won't do.

Core Philosophy

Water Utility Report is built on a simple premise: the most useful thing we can do is take hard-to-navigate official public data and make it genuinely understandable.

We do not manufacture water quality claims. We do not republish third-party commercial databases. We do not publish pages that exist only to capture search traffic. Every page that goes live must answer a real user question with real, source-backed information.

What We Use

Stage 1 uses only official U.S. government datasets and public records where terms clearly allow normalization, summarization, and republication of derived facts.

EPA SDWIS (Safe Drinking Water Information System)

Core utility identity, system IDs, violation records, population served

EPA ECHO (Enforcement and Compliance History Online)

Compliance history, enforcement actions, detailed violation data

Consumer Confidence Reports (CCRs)

Annual utility-published water quality reports; source of contaminant level data

EPA Water Quality Portal

Supporting sampling and monitoring data from federal and state agencies

State drinking water program datasets

Where terms permit public use; used for service area and utility detail

EPA and CDC public guidance documents

Health-effect interpretations and treatment guidance references

U.S. Census Bureau

Population and geography data for service area normalization

What We Won't Use

Stage 1 explicitly prohibits the following sources unless written permission or a license is obtained.

WQA Member Directory

Licensed commercial data — requires explicit authorization

WQA / NSF Certified Product Datasets

Commercial certification databases — bulk reproduction prohibited

EWG Tap Water Database content

Nonprofit competitive database — bulk extraction not permitted

Competitor or third-party directories

No bulk scraping, copying, or republication of third-party databases

Logos, seals, and certification marks

Third-party trust marks — not reproduced without explicit license

How Pages Are Built

No page goes straight from an automated pipeline to the public. Every page passes through a review-and-publish workflow with human checkpoints.

01

Data ingested

Official datasets downloaded from EPA or state sources. Source URL, ingestion date, and dataset version recorded at row level.

02

Records normalized

Utility names, system IDs, geographic references, and contaminant names standardized. Duplicate and incomplete records filtered.

03

Draft page generated

Page templates populated from normalized data. AI-assisted plain-English summaries drafted for human review.

04

Human review

Reviewer checks factual accuracy, legal compliance flags, internal link logic, and content quality. No page is published directly from automated generation.

05

Approved and published

Page assigned publish status. Cohort controls allow staged rollout by state, city, or entity group.

06

Refresh cycle

Annual ingestion refresh. Utilities that publish new CCRs trigger re-review of affected pages.
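The six steps above can be read as a small state machine in which no path reaches publication without passing through human review. The following is a minimal Python sketch of that idea, not a description of our production system: the status names, the `advance` function, and the `reviewer` parameter are all hypothetical and chosen only for illustration.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    # Hypothetical status names mirroring steps 01 to 05 above.
    INGESTED = "ingested"
    NORMALIZED = "normalized"
    DRAFTED = "drafted"
    IN_REVIEW = "in_review"
    PUBLISHED = "published"


# Allowed transitions. There is deliberately no edge from DRAFTED
# straight to PUBLISHED: a draft can only reach the public via review.
TRANSITIONS = {
    Status.INGESTED: {Status.NORMALIZED},
    Status.NORMALIZED: {Status.DRAFTED},
    Status.DRAFTED: {Status.IN_REVIEW},
    Status.IN_REVIEW: {Status.PUBLISHED, Status.DRAFTED},
    # Step 06: a refresh or a new CCR sends a published page back to review.
    Status.PUBLISHED: {Status.IN_REVIEW},
}


def advance(current: Status, target: Status, reviewer: Optional[str] = None) -> Status:
    """Move a page to `target`, enforcing the human checkpoint before publishing."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    if target is Status.PUBLISHED and not reviewer:
        raise PermissionError("publishing requires a named human reviewer")
    return target
```

Encoding the rule in the transition table, rather than in a check scattered through page-generation code, means an attempt to skip review fails loudly instead of silently publishing.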

Confidence Levels

Every data record in our system carries a confidence score. Where a record's confidence is below High, the page displays the level so readers can see how certain the underlying data is.

High Confidence

Data sourced directly from EPA SDWIS, ECHO, or utility CCRs with verified ingestion date. Utility identity confirmed against official system ID.

Medium Confidence

Data sourced from state datasets or derived from official data through normalization steps. Core facts verified; some derived fields may carry uncertainty.

Low Confidence

Data modeled, inferred, or sourced from a third party that has not been fully verified. Flagged for review before publication. Pages with low-confidence data carry explicit warnings.
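The three tiers above amount to a short decision rule. The sketch below shows one way such a rule could be written in Python. It is illustrative only: the function names and boolean flags are hypothetical, the `source_type` values are the four named in our provenance fields, and the choice to treat an official record that has not been fully verified as Low is a conservative assumption of this sketch rather than a stated policy.

```python
def confidence_level(source_type: str, ingestion_verified: bool, id_confirmed: bool) -> str:
    """Map a record's provenance to 'high', 'medium', or 'low'."""
    # High: official source, verified ingestion date, identity confirmed
    # against the official system ID.
    if source_type == "official" and ingestion_verified and id_confirmed:
        return "high"
    # Medium: state datasets, or values derived from official data.
    if source_type in {"state", "derived"}:
        return "medium"
    # Low: modeled or inferred data, unverified third-party data, and
    # (assumption of this sketch) official data that failed a check.
    return "low"


def shows_confidence_notice(level: str) -> bool:
    """Pages display the level whenever it is below high."""
    return level != "high"


def needs_explicit_warning(level: str) -> bool:
    """Low-confidence data carries an explicit warning."""
    return level == "low"
```

For example, `confidence_level("state", True, True)` falls into the medium tier, so `shows_confidence_notice` is true for that record while `needs_explicit_warning` is false.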

Legal Safeguards

Likely match disclosure

Utility-to-address matching is disclosed as 'likely' where service area mapping relies on modeled boundaries. We never claim certainty we don't have.

Regulatory vs. health interpretation

We clearly separate what regulatory data shows from what health guidance recommends. These are often different — we do not conflate them.

No medical claims

Water quality information is informational only. We do not make medical, diagnostic, or treatment recommendations beyond linking to official health guidance.

Source-first

Every factual claim links to or identifies its source. Data without a source attribution is not published on entity pages.
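The source-first rule can be pictured as a gate applied before a page is assembled. Here is a minimal Python sketch, assuming a hypothetical `Claim` record with optional `source_url` and `source_name` attributes; neither the class nor the function name reflects an actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Claim:
    # Hypothetical shape of a factual claim destined for an entity page.
    text: str
    source_url: Optional[str] = None   # link to the source, if any
    source_name: Optional[str] = None  # named source, if no link exists


def split_by_attribution(claims: List[Claim]) -> Tuple[List[Claim], List[Claim]]:
    """Return (publishable, withheld): a claim needs a link or a named source."""
    publishable = [c for c in claims if c.source_url or c.source_name]
    withheld = [c for c in claims if not (c.source_url or c.source_name)]
    return publishable, withheld
```

Returning the withheld claims alongside the publishable ones, instead of dropping them silently, gives a reviewer a list of what still needs a source.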

Data Provenance Standards

Every ingested record in our system stores the following provenance fields at the row level:

Field                     Purpose
source_type               Identifies whether data is official, state, derived, or modeled
source_url                Direct URL to the source document or dataset
ingestion_date            Date this record was pulled from the source
last_verification_date    Last date a human or automated check confirmed the record
transform_version         Which version of our normalization pipeline processed this record
confidence_score          Numerical confidence score for derived or inferred values
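As an illustration, the six provenance fields could be carried on each row as a small typed record. The sketch below is in Python and makes several assumptions that are not part of our published standard: the Python types, the optional fields, the 0 to 1 range for `confidence_score`, and the placeholder URL are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# The four source types named in the field description.
SOURCE_TYPES = {"official", "state", "derived", "modeled"}


@dataclass(frozen=True)
class Provenance:
    source_type: str
    source_url: str
    ingestion_date: date
    last_verification_date: Optional[date]  # None until a check has run (assumption)
    transform_version: str
    confidence_score: Optional[float]       # assumed to lie in [0, 1]

    def __post_init__(self) -> None:
        if self.source_type not in SOURCE_TYPES:
            raise ValueError(f"unknown source_type: {self.source_type!r}")
        if not self.source_url:
            raise ValueError("source_url is required")
        score = self.confidence_score
        if score is not None and not 0.0 <= score <= 1.0:
            raise ValueError("confidence_score must be between 0 and 1")


# Usage with placeholder values (the URL and version string are made up).
row = Provenance(
    source_type="derived",
    source_url="https://example.com/dataset.csv",
    ingestion_date=date(2024, 1, 15),
    last_verification_date=None,
    transform_version="v1",
    confidence_score=0.8,
)
```

Making the record immutable (`frozen=True`) and validating at construction keeps a row from losing or changing its attribution after ingestion.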

Site-Wide Disclaimer

Water Utility Report publishes content derived from official U.S. government and public datasets. All content is for informational purposes only.

  • This site is not a substitute for professional water testing by a certified laboratory.
  • Utility-to-address matches are likely, not guaranteed, for every address. Confirm your utility with the provider or on your water bill.
  • Where data is modeled or derived, this is disclosed on the relevant page.
  • This site is not a substitute for medical advice. Consult a healthcare provider for health concerns related to water quality.
  • Contaminant data reflects the most recent Consumer Confidence Report or regulatory data available to us, which may not reflect real-time conditions.