This model defines how to handle CVR coding details that are easy to misread in raw data.

Why this exists

AI and humans should not have to guess where code lists live or how code formats change over time.

This page defines one consistent policy for:

  • municipality codes (kommunekode)
  • branch coding transitions (DB07 to DB25)
  • DB/NACE hierarchy handling
  • local codebook copies vs. source links

Operational Rules for Extraction

Apply these rules before joining, filtering, or aggregating CVR data:

  1. Normalize municipality codes to 4-digit strings before any match.
  2. Treat registrering* and virkning* as different time axes.
  3. Treat branch code systems as time-bound (DB07 and DB25 are not interchangeable).
  4. Treat CVRAdresse as text-based address components, not geometry.
  5. Do not geocode until address fields are normalized.
  6. For current-state extraction, filter registreringTil IS NULL and virkningTil IS NULL unless the question explicitly asks for history.

1) Municipality codes

Practical issue

The same municipality can appear as 101 or 0101.

Canonical rule

  • Canonical storage format: 4-digit zero-padded string
  • Example: Copenhagen = 0101

Query/interoperability rule

  • Input may be accepted as 101 or 0101
  • Normalize before joins and filtering
  • Publish normalized value in outputs

Failure mode to avoid

If one side of a join uses 101 and the other uses 0101, the join will silently fail or undercount. Always normalize first.
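A minimal sketch of the normalization step, assuming codes arrive as integers or strings. The function name normalize_kommunekode is illustrative and not part of any CVR API.

```python
def normalize_kommunekode(value) -> str:
    """Return a municipality code as a 4-digit zero-padded string.

    Accepts 101, "101" or "0101" and rejects anything that is not
    one to four ASCII digits (for example "101.0" or "abc").
    """
    text = str(value).strip()
    if not (text.isascii() and text.isdigit()) or len(text) > 4:
        raise ValueError(f"not a valid municipality code: {value!r}")
    return text.zfill(4)


# Normalize both sides before joining so 101 and "0101" compare equal.
left = normalize_kommunekode(101)
right = normalize_kommunekode("0101")
```

Rejecting unexpected input, rather than padding it silently, makes bad source values visible before they reach a join.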

2) Branche coding over time (DB07 and DB25)

Practical issue

A firm can have historical records coded in DB07 and newer records coded in DB25 (introduced with the 2025 updates). The switch can look like a change of activity even when the business meaning is stable.

Canonical rule

  • Treat branch code as time-bound classification
  • Do not merge DB07 and DB25 blindly
  • Always record:
    • code_system (DB07 or DB25)
    • code_value
    • time context (virkning/registrering scope)

Analysis rule

When combining periods across 2007-2024 and 2025+:

  1. keep native code system per record
  2. apply explicit mapping table between systems
  3. report mapping uncertainty where one-to-many or many-to-one mappings occur
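The three steps above could be sketched as follows. The crosswalk is assumed to be available as (DB07 code, DB25 code) pairs loaded from a local snapshot. The function names and the sample codes are invented for illustration and are not real classification codes.

```python
from collections import defaultdict


def build_crosswalk(pairs):
    """Index (db07_code, db25_code) pairs in both directions."""
    forward = defaultdict(set)   # DB07 code -> set of DB25 codes
    reverse = defaultdict(set)   # DB25 code -> set of DB07 codes
    for db07, db25 in pairs:
        forward[db07].add(db25)
        reverse[db25].add(db07)
    return forward, reverse


def map_db07_to_db25(code, forward, reverse):
    """Map one DB07 code and label the kind of mapping so it can be reported."""
    targets = sorted(forward.get(code, ()))
    if not targets:
        kind = "unmapped"
    elif len(targets) > 1:
        kind = "one-to-many"
    elif len(reverse[targets[0]]) > 1:
        kind = "many-to-one"
    else:
        kind = "one-to-one"
    return {"source": code, "targets": targets, "mapping": kind}


# Made-up codes, used only to exercise each mapping kind.
sample_pairs = [
    ("111111", "211111"),
    ("122222", "222221"),
    ("122222", "222222"),
    ("133333", "233333"),
    ("144444", "233333"),
]
forward, reverse = build_crosswalk(sample_pairs)
result = map_db07_to_db25("122222", forward, reverse)
```

Each record keeps its native code. The mapped targets and the mapping label travel alongside it, so any aggregation can state how much of the result rests on ambiguous mappings.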

Detection rule

  • Records from 2025 onward should be assumed to use DB25 unless source metadata says otherwise.
  • Historical records may still contain DB07 values.
  • Never infer code-system equivalence from identical text labels alone.
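One way to apply the detection rule is sketched below. It assumes a cutover date of 1 January 2025, derived from the "2025 onward" wording, and treats everything earlier as DB07. That default is a simplification and should be checked against source metadata. The BranchCode record and detect_code_system are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed cutover date; confirm against the source metadata before relying on it.
DB25_START = date(2025, 1, 1)
KNOWN_SYSTEMS = {"DB07", "DB25"}


@dataclass(frozen=True)
class BranchCode:
    """A branch code together with its code system and time context."""
    code_system: str
    code_value: str
    virkning_fra: date
    virkning_til: Optional[date] = None


def detect_code_system(virkning_fra: date, metadata_system: Optional[str] = None) -> str:
    """Prefer explicit source metadata; otherwise fall back to the date rule."""
    if metadata_system:
        system = metadata_system.strip().upper()
        if system not in KNOWN_SYSTEMS:
            raise ValueError(f"unknown code system: {metadata_system!r}")
        return system
    return "DB25" if virkning_fra >= DB25_START else "DB07"


start = date(2025, 3, 1)
record = BranchCode(detect_code_system(start), "561110", start)
```

Labels are deliberately absent from the record: equivalence between systems is decided by an explicit mapping table, never by matching text.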

3) DB/NACE representation

Practical issue

Branche values are often represented as six digits without dot separators.

Canonical rule

  • Keep raw code as provided (6-digit string)
  • Derive hierarchy by prefix instead of punctuation:
    • level 1 = first 2 digits
    • level 2 = first 4 digits
    • level 3 = all 6 digits

This allows stable grouping even when display punctuation differs.

Query rule

  • If the user supplies dotted codes such as 56.11.10, strip punctuation before matching raw CVR values.
  • Preserve the raw stored code in output and optionally add derived hierarchical prefixes.
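A short sketch of prefix-based hierarchy derivation, assuming codes are six digits once dots are removed. The function names and the level_1 to level_3 keys are illustrative.

```python
def normalize_branch_code(value: str) -> str:
    """Strip dot separators and whitespace, then require exactly six digits."""
    digits = value.replace(".", "").strip()
    if len(digits) != 6 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected a 6-digit branch code, got {value!r}")
    return digits


def branch_hierarchy(value: str) -> dict:
    """Return the raw 6-digit code plus its prefix-derived hierarchy levels."""
    raw = normalize_branch_code(value)
    return {
        "raw": raw,            # code as stored in CVR (no punctuation)
        "level_1": raw[:2],    # first 2 digits
        "level_2": raw[:4],    # first 4 digits
        "level_3": raw,        # all 6 digits
    }


levels = branch_hierarchy("56.11.10")
```

Because the levels are plain string prefixes, grouping at a coarser level is a slice on the stored value and needs no separate lookup.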

4) Current vs Historical filtering

Current-state rule

Use both:

  • registreringTil IS NULL
  • virkningTil IS NULL

Historical rule

For a target date t, filter by legal validity first:

  • virkningFra <= t
  • virkningTil IS NULL OR t < virkningTil

Then decide whether system-registration time also matters for the use case.

Failure mode to avoid

Using only registreringTil IS NULL returns what is currently stored, not necessarily what was legally valid at a historical date.
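The two filters can be expressed as small predicates. This sketch assumes each record is a dict whose date fields hold datetime.date values or None, and that virkningFra is always populated. The function names and sample records are illustrative.

```python
from datetime import date


def is_current(rec: dict) -> bool:
    """Current state: open on both the registration and the validity axis."""
    return rec["registreringTil"] is None and rec["virkningTil"] is None


def valid_at(rec: dict, t: date) -> bool:
    """Legal validity at date t, using the half-open interval [virkningFra, virkningTil)."""
    return rec["virkningFra"] <= t and (
        rec["virkningTil"] is None or t < rec["virkningTil"]
    )


# Two made-up versions of the same attribute for one entity.
older = {"virkningFra": date(2010, 1, 1), "virkningTil": date(2020, 1, 1), "registreringTil": None}
newer = {"virkningFra": date(2020, 1, 1), "virkningTil": None, "registreringTil": None}

current = [r for r in (older, newer) if is_current(r)]
as_of_2015 = [r for r in (older, newer) if valid_at(r, date(2015, 6, 1))]
```

Both sample records have registreringTil set to None, so filtering on that field alone would return both of them. Only the validity axis separates the version that applied in 2015 from the one that applies now.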

5) Address normalization for geocoding

Before matching CVRAdresse to an external address register:

  1. zero-pad kommunekode
  2. normalize vejnavn case and whitespace
  3. parse husnummerfra and bogstav into canonical house-number text
  4. preserve postnummer as string
  5. keep unmatched addresses in a separate exception set

Use this order of preference:

  1. kommunekode + vejnavn + husnummerfra + bogstav
  2. add postnummer when ambiguity remains
  3. fall back to formatted free text only as a last resort
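The normalization steps and the first two matching preferences could look roughly like this. The sketch assumes CVR address records are dicts keyed by the field names above, and that the external register has been indexed in advance by the same normalized key. The register layout (dicts with "id" and "postnummer"), the lower-casing choice, the function names, and the sample values are all assumptions.

```python
def _text(value) -> str:
    """Turn a possibly missing field into stripped text."""
    return "" if value is None else str(value).strip()


def address_key(rec: dict) -> tuple:
    """Build the preferred match key: kommunekode + vejnavn + house-number text."""
    kommunekode = _text(rec.get("kommunekode")).zfill(4)
    vejnavn = " ".join(_text(rec.get("vejnavn")).split()).lower()
    husnummer = _text(rec.get("husnummerfra")) + _text(rec.get("bogstav")).upper()
    return (kommunekode, vejnavn, husnummer)


def match_address(rec: dict, register_index: dict):
    """Return the single matching register entry, or None if unmatched or still ambiguous."""
    candidates = register_index.get(address_key(rec), [])
    if len(candidates) > 1:
        # Add postnummer (kept as a string) only when ambiguity remains.
        postnummer = _text(rec.get("postnummer"))
        candidates = [c for c in candidates if c["postnummer"] == postnummer]
    return candidates[0] if len(candidates) == 1 else None


# Made-up register entries and CVR records for illustration.
register_index = {
    ("0101", "vester voldgade", "12A"): [{"id": "x1", "postnummer": "1552"}],
    ("0101", "nygade", "3"): [{"id": "x2", "postnummer": "1000"},
                              {"id": "x3", "postnummer": "2000"}],
}
records = [
    {"kommunekode": 101, "vejnavn": "  Vester   VOLDGADE", "husnummerfra": 12,
     "bogstav": "a", "postnummer": "1552"},
    {"kommunekode": "0101", "vejnavn": "Nygade", "husnummerfra": "3",
     "bogstav": None, "postnummer": "2000"},
    {"kommunekode": "0101", "vejnavn": "Ukendt Vej", "husnummerfra": 1,
     "bogstav": None, "postnummer": "9999"},
]
matched = [(r, m) for r in records if (m := match_address(r, register_index)) is not None]
exceptions = [r for r in records if match_address(r, register_index) is None]
```

Unmatched records end up in exceptions instead of being dropped, which keeps the free-text fallback a separate and auditable step.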

6) Recommendation: Hybrid policy (required)

Use both:

  • Authoritative source links for provenance
  • Local frozen codebook snapshots for reproducibility

Why:

  • source links provide legal/official traceability
  • local snapshots prevent future source changes from silently altering historical analyses
  • AI can resolve lookups deterministically from local files instead of guessing from web results

7) Minimum codebook package per project

Each project should maintain local snapshots in sanctuary/lookup assets with metadata:

  • municipality_codes.vYYYYMMDD.csv
  • branche_db07.vYYYYMMDD.csv
  • branche_db25.vYYYYMMDD.csv
  • optional db07_to_db25_crosswalk.vYYYYMMDD.csv

Each snapshot must include:

  • source URL
  • retrieval date
  • code system version
  • checksum/hash
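A possible way to generate the metadata for one snapshot is shown below. SHA-256 is an assumed choice of hash, and the function name, dictionary keys, file content, and example URL are placeholders rather than a prescribed format.

```python
import hashlib
import tempfile
from datetime import date
from pathlib import Path


def snapshot_metadata(path, source_url: str, code_system_version: str, retrieved: date) -> dict:
    """Describe a frozen codebook file with provenance and a content hash."""
    file_path = Path(path)
    digest = hashlib.sha256(file_path.read_bytes()).hexdigest()
    return {
        "file": file_path.name,
        "source_url": source_url,
        "retrieval_date": retrieved.isoformat(),
        "code_system_version": code_system_version,
        "sha256": digest,
    }


# Demonstration with a throwaway file; real snapshots live in the project's lookup assets.
with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "municipality_codes.v20250101.csv"
    snapshot.write_bytes(b"kommunekode,navn\n0101,Copenhagen\n")
    meta = snapshot_metadata(
        snapshot,
        source_url="https://example.org/municipality-codes",  # placeholder URL
        code_system_version="example-version",                # placeholder label
        retrieved=date(2025, 1, 1),
    )
```

Recomputing the hash before each analysis run and comparing it with the stored value shows whether a snapshot was edited after it was frozen.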

8) Common failure cases

Failure case             | Why it happens                              | Preventive rule
-------------------------+---------------------------------------------+----------------------------------------------------------
Empty municipality join  | 101 vs 0101 mismatch                        | zero-pad before joins
Wrong sector aggregation | DB07 and DB25 mixed without mapping         | detect code system per record
Missing geocodes         | raw CVRAdresse text not normalized          | normalize address components before DAR join
False current snapshot   | only one temporal axis filtered             | use both registreringTil and virkningTil for current state
Wrong location semantics | Virksomhed used instead of Produktionsenhed | decide legal entity vs operational site up front

9) AI execution checklist

Before filtering or joining:

  1. Normalize municipality codes to 4-digit strings
  2. Detect branch code system by time and metadata
  3. Resolve DB07/DB25 mapping policy
  4. Use local snapshot lookup tables
  5. Log all normalization and mapping steps in Design_Rationale

Machine-readable policy