Test-Driven Regulation Development

Why Tests Are Non-Negotiable in Payroll

Payroll has a property that most software domains don't: a value that is wrong but consistently applied passes all manual reviews. If your health insurance contribution calculation uses 7.3% instead of the correct 7.65%, every employee's payslip will show a number that looks reasonable. The contribution will always be proportional to gross income. No employee will notice. No manual review will catch it. The error only surfaces months later during a statutory audit — or never, if the auditor uses the same incorrect rate.

This is why payroll testing must be automated, deterministic, and end-to-end. Not unit tests of isolated functions. Not mock-based tests that verify method calls. Full integration tests that start with employee case data, execute a complete payrun through the same engine that runs in production, and assert that specific wage types produce specific numeric results.

The test doesn't verify that "the contribution function was called." It verifies that wage type 6000 (Health Insurance Employer) produces exactly 382.50 EUR for an employee earning 5,000 EUR/month with the 2026 statutory rate. If the rate is wrong, the assertion fails. If the base calculation is wrong, the assertion fails. If a lookup table is missing, the assertion fails. There's no way to produce the correct number by accident.

Integration tests as compliance artifacts: Each test is a documented calculation scenario. The test file contains the inputs (employee data, statutory parameters), the expected outputs (specific wage type values), and a README with step-by-step arithmetic showing how those outputs are derived. An auditor can review the test, verify the arithmetic against the gazette, and confirm the system calculates correctly — without ever running the software.

The testing approach is deliberately not unit testing. A unit test for a wage type function would verify that the function multiplies two numbers correctly. But the real failure modes in payroll are system-level: a lookup table that wasn't imported, a data regulation with the wrong validFrom date, a case field that doesn't resolve because it's in an unreachable regulation layer, a time-segmentation collapse that produces a weighted average instead of per-day arithmetic. These failures only manifest when the full payrun engine is executing, resolving layers, loading lookups, evaluating case values in period context. Unit tests can't find them.

For a deeper look at how compliance testing fits into the broader architecture, see Test-Driven Compliance.

The Test Trinity: .et.json, .pecmd, README.md

Every test in a PayrollEx regulation follows the same three-file structure. No exceptions, no shortcuts. If any of the three files is missing, the test is incomplete.

Tests/
  WT-TC5100-DE-Lohnsteuer-StKl1/
    WT-TC5100-DE-Lohnsteuer-StKl1.et.json
    WT-TC5100-DE-Lohnsteuer-StKl1.pecmd
    README.md
  WT-TC6000-DE-KvAg-Standard/
    WT-TC6000-DE-KvAg-Standard.et.json
    WT-TC6000-DE-KvAg-Standard.pecmd
    README.md
  Test.All.pecmd
  README.md

Each file has a specific purpose in the compliance workflow:

The .et.json file is the exchange test JSON. It contains everything the engine needs to execute the test: employee case data, case field values with start and created dates, a payrun job invocation with an evaluation date, and expected payroll results with wage type numbers and expected values. This is the machine-readable test definition — the engine loads it, runs the payrun, and compares actual results against expected results.

The .pecmd file is the CLI command to run the test standalone. It contains a single PayrunEmployeeTest command that points to the .et.json file. A developer can run this file directly to execute one test in isolation — useful for debugging a specific failure without running the entire suite.

The README.md is the human-readable calculation record. It documents the purpose of the test, the scenario parameters, the expected results, and, critically, the step-by-step arithmetic showing how every expected value is derived from the inputs. Without it, every calculation must be reconstructed from scratch when someone needs to understand why a test expects a specific value.
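The three-file rule lends itself to a mechanical check. The following is a minimal sketch, assuming the folder layout shown above, where each test folder name doubles as the base name of its .et.json and .pecmd files. The function names are hypothetical and not part of any PayrollEx tooling.

```python
from pathlib import Path

# Suffixes that must exist with the folder name as base name.
REQUIRED_SUFFIXES = (".et.json", ".pecmd")


def missing_files(test_dir: Path) -> list[str]:
    """Return the required file names that are absent from one test folder."""
    expected = [f"{test_dir.name}{suffix}" for suffix in REQUIRED_SUFFIXES]
    expected.append("README.md")
    return [name for name in expected if not (test_dir / name).is_file()]


def incomplete_tests(tests_root: Path) -> dict[str, list[str]]:
    """Map each incomplete test folder to the files it is missing."""
    report = {}
    for entry in sorted(tests_root.iterdir()):
        if entry.is_dir():
            missing = missing_files(entry)
            if missing:
                report[entry.name] = missing
    return report
```

Calling `incomplete_tests(Path("Tests"))` returns an empty dictionary when every folder is complete, which makes it easy to use as a pre-commit or CI guard.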

The README is not documentation — it is the calculation record. A test that asserts wage type 6000 = 382.50 EUR is meaningless without the arithmetic: 5000 * 7.65% = 382.50. If the statutory rate changes to 7.85% next year, the README tells you exactly which step to update and what the new expected value should be. Without it, you're reverse-engineering the formula from a bare assertion.
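The arithmetic in such a README can be reproduced in a few lines. This is a sketch only: the function name is hypothetical, and half-up rounding to cents is an assumption rather than a statement about how the engine rounds.

```python
from decimal import Decimal, ROUND_HALF_UP


def employer_contribution(gross: Decimal, rate_percent: Decimal) -> Decimal:
    """gross * rate, rounded to cents (half-up rounding is an assumption)."""
    amount = gross * rate_percent / Decimal("100")
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# The value asserted by the test: 5000 * 7.65%
current = employer_contribution(Decimal("5000"), Decimal("7.65"))
# The same step after a hypothetical rate change to 7.85%
changed = employer_contribution(Decimal("5000"), Decimal("7.85"))

print(current)  # 382.50
print(changed)  # 392.50
```

Using `Decimal` instead of floating point keeps the README arithmetic and the check exact to the cent.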

The standardized README format ensures every test is documented consistently:

# WT-TC6000-DE-KvAg-Standard

## Purpose
Verifies employer health insurance contribution for a standard employee
with statutory health insurance at the 2026 general rate.

## Scenario
| Parameter | Value |
|---|---|
| Monthly salary | 5,000.00 EUR |
| Insurance type | GKV (statutory) |
| KV general rate | 14.6% |
| KV employer share | 7.3% |
| KV additional rate | 0.7% |

## Expected Results
| WT | Name | Value |
|---:|---|---:|
| 6000 | KvAg | 382.50 |

## Calculation
```
SV-Brutto:                  5,000.00
Employer general share:     14.6% / 2 = 7.30%
Employer additional share:  0.7% / 2  = 0.35%
Employer rate:              7.30% + 0.35% = 7.65%
KV employer = 5,000.00 * 7.65% = 382.50
```

## Run
```
WT-TC6000-DE-KvAg-Standard.pecmd
```

Building Test Data

The .et.json structure follows the Payroll Engine exchange format. Here's a minimal but complete test file showing the critical elements:

{
  "createdObjectDate": "2025-12-01T00:00:00.0Z",
  "employees": [
    {
      "identifier": "TEST-EMP-001",
      "firstName": "Test",
      "lastName": "Employee",
      "cases": [
        {
          "caseName": "DE.Employment",
          "caseFieldValues": [
            {
              "caseFieldName": "DE.EmploymentStart",
              "value": "2025-01-01",
              "start": "2025-01-01T00:00:00.0Z",
              "created": "2025-12-15T00:00:00.0Z"
            }
          ]
        },
        {
          "caseName": "DE.Gehalt",
          "caseFieldValues": [
            {
              "caseFieldName": "DE.MonatGehalt",
              "value": "5000",
              "start": "2026-01-01T00:00:00.0Z",
              "created": "2025-12-15T00:00:00.0Z"
            }
          ]
        }
      ],
      "payrunJobInvocations": [
        {
          "name": "TestRun",
          "payrunJobStatus": "Complete",
          "evaluationDate": "2026-01-31T00:00:00.0Z"
        }
      ],
      "payrollResults": [
        {
          "wageTypeNumber": 6000,
          "value": 382.50
        }
      ]
    }
  ]
}

Several rules govern this structure, and violating any one produces failures that are difficult to diagnose:

createdObjectDate at root level. This sets the baseline creation timestamp for all regulation objects in the test. It must be earlier than any evaluationDate in the payrun job invocations. Set it to a date well before the test period — typically the December before the test year.

created on every caseFieldValue. The root-level createdObjectDate does NOT propagate to case field values. Every caseFieldValue must carry its own explicit created timestamp. Missing it produces duplicate entry errors or — worse — silently invisible case values.

created < evaluationDate (strict inequality). A case value with created equal to evaluationDate is not visible to the payrun. The value must be created strictly before the evaluation date. This catches a common mistake: setting both to the same January 1st date and wondering why the salary isn't picked up.

caseName always fully qualified. Case names in test data must include the namespace prefix: "DE.Employment", not "Employment". The engine stores objects with the prefix and searches by exact name during import. An unqualified name produces "Unknown case field" errors.

| Rule | Correct | Wrong | Failure mode |
|---|---|---|---|
| `created` on caseFieldValue | `"created": "2025-12-15T..."` | Omitted (relying on root date) | Duplicate entry or invisible value |
| Strict inequality | `created: Dec 15`, `eval: Jan 31` | `created: Jan 31`, `eval: Jan 31` | Value not visible, zero result |
| Fully qualified caseName | `"DE.Employment"` | `"Employment"` | "Unknown case field" error |
| `createdObjectDate` at root | Before earliest evaluation | After or equal to evaluation | Regulation objects not resolved |
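These rules can be checked before a test file ever reaches the engine. The sketch below assumes the JSON structure shown in the example above. The function names are hypothetical, and treating "contains a dot" as "fully qualified" is a simplifying assumption.

```python
from datetime import datetime


def parse(ts: str) -> datetime:
    """Parse timestamps such as 2026-01-31T00:00:00.0Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def validate_exchange(doc: dict) -> list[str]:
    """Return a list of rule violations found in a loaded .et.json document."""
    problems = []
    root_created = doc.get("createdObjectDate")
    if root_created is None:
        problems.append("createdObjectDate missing at root level")

    for employee in doc.get("employees", []):
        evaluations = [parse(job["evaluationDate"])
                       for job in employee.get("payrunJobInvocations", [])]
        earliest = min(evaluations) if evaluations else None

        if root_created and earliest is not None and parse(root_created) >= earliest:
            problems.append("createdObjectDate is not before the earliest evaluationDate")

        for case in employee.get("cases", []):
            case_name = case.get("caseName", "")
            # Assumption: a namespace prefix always contains a dot ("DE.Employment").
            if "." not in case_name:
                problems.append(f"caseName not fully qualified: {case_name}")
            for value in case.get("caseFieldValues", []):
                field_name = value.get("caseFieldName")
                if "created" not in value:
                    problems.append(f"{field_name}: missing created")
                elif earliest is not None and parse(value["created"]) >= earliest:
                    problems.append(f"{field_name}: created is not strictly before evaluationDate")
    return problems
```

Load a test file with `json.load` and pass the result to `validate_exchange`; an empty list means none of the four rules is violated.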

Testing Provider Overlays

A provider regulation overlays the country regulation — adding company-specific wage types, overriding rates, or extending the case model. Testing an overlay requires the full layer stack to be visible to the payrun engine. This visibility is controlled by Test.Setup.json.

The setup file defines which regulations the payroll can access. A data satellite that's imported via the test pipeline but not listed in the payroll's layers array is invisible — GetLookup returns null, GetCaseValue returns default, and wage types silently produce empty results with no error message.

{
  "payrolls": [
    {
      "name": "TestPayroll",
      "layers": [
        {
          "level": 1,
          "regulationName": "DE.Entgeltabrechnung"
        },
        {
          "level": 2,
          "regulationName": "DE.Data.SV.2026"
        },
        {
          "level": 3,
          "regulationName": "DE.Data.LSt.2026"
        },
        {
          "level": 4,
          "regulationName": "ACME.Provider"
        }
      ]
    }
  ]
}

The layer order matters. Higher levels override lower levels. The country regulation sits at level 1, data satellites at levels 2-3, and the provider overlay at level 4 (the highest). A wage type defined in both the country regulation (L1) and the provider overlay (L4) resolves to the provider's version. A lookup defined only in a data satellite (L2) is visible to all higher layers.
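The resolution rule can be modelled in a few lines. This is an illustrative sketch of "highest level wins" and "unlisted means invisible", not the engine's implementation. The `Regulation` class, the function names, and the placeholder lookup content are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Regulation:
    name: str
    wage_types: set = field(default_factory=set)
    lookups: dict = field(default_factory=dict)


def top_down(layers):
    """Regulations ordered from the highest level to the lowest."""
    return [reg for _, reg in sorted(layers, key=lambda item: item[0], reverse=True)]


def resolve_wage_type(layers, number):
    """Name of the regulation whose definition of the wage type wins, or None."""
    for reg in top_down(layers):
        if number in reg.wage_types:
            return reg.name
    return None


def get_lookup(layers, lookup_name):
    """First matching lookup from the top of the stack, or None if no layer has it."""
    for reg in top_down(layers):
        if lookup_name in reg.lookups:
            return reg.lookups[lookup_name]
    return None


country = Regulation("DE.Entgeltabrechnung", wage_types={6000})
sv_data = Regulation("DE.Data.SV.2026", lookups={"ContributionParameter": {"placeholder": True}})
provider = Regulation("ACME.Provider", wage_types={6000})

full_stack = [(1, country), (2, sv_data), (4, provider)]
missing_satellite = [(1, country), (4, provider)]  # satellite imported but not layered

print(resolve_wage_type(full_stack, 6000))                          # ACME.Provider
print(get_lookup(full_stack, "ContributionParameter") is not None)  # True
print(get_lookup(missing_satellite, "ContributionParameter"))       # None
```

The last line is the case to watch: nothing raises, the lookup is simply absent from the point of view of the payroll.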

The silent null problem: When a Custom Action calls GetLookup<T>("ContributionParameter", PeriodStart.Year) and the data satellite containing that lookup isn't in the layers array, the result is null. If the action doesn't null-check (many don't during initial development), you get a NullReferenceException buried in the payrun log. If it does null-check and returns ActionValue.None, the wage type produces no result — which might not trigger an assertion failure if the test doesn't assert that specific wage type. Layer misconfiguration is the number one cause of "tests pass locally but the payslip is empty" bugs.

The test pipeline must import all regulations AND configure the layer stack:

# Setup.Test.2026.pecmd — imports all regulation files
TenantImport DE.Test.Setup.json
RegulationImport Regulation/DE.Entgeltabrechnung.json
RegulationImport Data/DE.Data.SV.2026.json
RegulationImport Data/DE.Data.LSt.2026.json
RegulationImport Provider/ACME.Provider.json

Both files must stay in sync. Adding a new data satellite means updating Setup.Test.2026.pecmd (so the data gets imported) AND updating Test.Setup.json (so the payroll can access it through the layer stack). Miss either one and the test passes locally (where the regulation might already be loaded from a previous run) but fails in CI (clean environment).
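A small script can flag drift between the two files. The sketch below makes one loud assumption: the regulation name equals the imported file name without its .json extension, as in the examples above. The function names are hypothetical.

```python
import json
from pathlib import PurePosixPath


def imported_regulations(pecmd_text: str) -> set[str]:
    """Regulation names derived from RegulationImport lines in a .pecmd file."""
    names = set()
    for line in pecmd_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "RegulationImport":
            # Assumption: "Data/DE.Data.SV.2026.json" holds regulation "DE.Data.SV.2026".
            names.add(PurePosixPath(parts[1]).name.removesuffix(".json"))
    return names


def layered_regulations(setup_json: str) -> set[str]:
    """Regulation names listed in any payroll's layers array."""
    setup = json.loads(setup_json)
    return {layer["regulationName"]
            for payroll in setup.get("payrolls", [])
            for layer in payroll.get("layers", [])}


def sync_report(pecmd_text: str, setup_json: str) -> dict[str, list[str]]:
    """Regulations that appear in one file but not in the other."""
    imported = imported_regulations(pecmd_text)
    layered = layered_regulations(setup_json)
    return {
        "imported_not_layered": sorted(imported - layered),
        "layered_not_imported": sorted(layered - imported),
    }
```

Both lists being empty means every imported regulation is reachable through the layer stack and every layer has a matching import.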

Common Pitfalls

Every regulation developer hits the same failure patterns. They're not bugs in the engine — they're consequences of the exchange format's strict rules and the engine's silent-failure design philosophy (which prioritizes payrun completion over diagnostic errors).

| Pitfall | Symptom | Root cause | Fix |
|---|---|---|---|
| Missing namespace prefix on caseName | "Unknown case field" error | `"Employment"` instead of `"DE.Employment"` | Always fully qualify case names in test data |
| Missing `created` on caseFieldValue | Duplicate entry error on import | Two values with same implicit timestamp | Add explicit `created` to every caseFieldValue |
| `created` or `createdObjectDate` = `evaluationDate` | Case values not visible, zero result | Strict inequality violated (not visible same day) | Set both to a date before the evaluation date, typically the prior month or year |
| Wrong TestPrecision | Test fails on correct value (rounding) | Default is 2 decimals; value has 4 | Use `/TestPrecision4` flag for higher precision |
| Missing layer in Test.Setup.json | Silent null lookups, empty wage type results | Data satellite imported but not in payroll layers | Add regulation to layers array with correct level |
| `end` date set to successor's `start` | Overlapping periods, double calculation | `end` is inclusive, not exclusive | Set `end` to day before successor's start |
| Test passes locally, fails in CI | Missing regulation objects in clean environment | Local state from previous test runs persists | Verify Setup.Test.pecmd imports everything |

The most insidious pitfall is the last one: a test that passes locally but fails in CI. This happens because the local development database retains state from previous test runs. A regulation that was imported last week is still loaded — so lookups resolve, case fields exist, and wage types produce results. But in CI, the environment starts clean. Only what Setup.Test.pecmd explicitly imports is available. If a new data satellite was added to the regulation but not to the test pipeline, local tests pass (stale data) and CI tests fail (clean slate).

The precision trap: Test assertions compare actual results against expected values with a configurable precision. The default is /TestPrecision2, two decimal places. At that precision 382.504 rounds to 382.50 and passes an assertion of 382.50, and so does 382.496. A wage type whose raw value is off by a fraction of a cent, for example because an intermediate rounding step was skipped, still passes. For wage types where intermediate rounding matters, use /TestPrecision4 or higher to catch sub-cent discrepancies.
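The effect of the precision setting can be illustrated outside the engine. This sketch assumes the comparison rounds both values to the configured number of decimals before checking equality, using half-up rounding. The engine's actual comparison may differ, and the function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def matches(actual: str, expected: str, precision: int) -> bool:
    """Compare two values after rounding both to `precision` decimals (assumed behaviour)."""
    quantum = Decimal(f"1e-{precision}")  # precision 2 -> 0.01, precision 4 -> 0.0001

    def rounded(value: str) -> Decimal:
        return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)

    return rounded(actual) == rounded(expected)


print(matches("382.504", "382.50", 2))  # True: the sub-cent difference is masked
print(matches("382.504", "382.50", 4))  # False: 382.5040 differs from 382.5000
```

The same raw value passes at two decimals and fails at four, which is the reason to raise the precision for wage types with intermediate rounding.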

Running Tests: Standalone and CI

Tests can be run individually or as a complete suite. The two modes serve different purposes in the development workflow.

Single test (standalone): Run one test in isolation to verify a specific calculation or debug a failure. Each test folder contains its own .pecmd file with a single PayrunEmployeeTest command:

# WT-TC6000-DE-KvAg-Standard.pecmd
PayrunEmployeeTest WT-TC6000-DE-KvAg-Standard.et.json /TestPrecision2

This runs the test against the currently loaded regulation state. It's fast, focused, and gives immediate feedback while developing a new wage type or fixing a broken calculation.

Full suite: Test.All.pecmd runs every test in sequence. This is the compliance gate — if any single test fails, the suite fails. The file is a flat list of PayrunEmployeeTest commands, one per line:

# Test.All.pecmd
PayrunEmployeeTest Tests/WT-TC1000-DE-Gehalt-Standard/WT-TC1000-DE-Gehalt-Standard.et.json /TestPrecision2
PayrunEmployeeTest Tests/WT-TC5100-DE-Lohnsteuer-StKl1/WT-TC5100-DE-Lohnsteuer-StKl1.et.json /TestPrecision2
PayrunEmployeeTest Tests/WT-TC6000-DE-KvAg-Standard/WT-TC6000-DE-KvAg-Standard.et.json /TestPrecision2
# ... one line per test ...

CI integration: In continuous integration, tests run as preview jobs — synchronous, no persistence, no retro. The CI pipeline executes the full setup (import all regulations and data satellites), then runs Test.All.pecmd. The same engine, the same execution path, the same payrun logic as production. The only difference is the job type: preview mode means results aren't persisted to the database after the assertion check.

Preview jobs are the correct choice for CI because they're synchronous (the pipeline can wait for completion), they don't leave state behind (each CI run starts clean), and they don't trigger retro-correction logic (which would fail because there are no prior periods in a test environment).

| Context | Command | Job type | Persistence |
|---|---|---|---|
| Local development | Single `.pecmd` | Preview | No, results discarded after assertion |
| Full regression | `Test.All.pecmd` | Preview | No, clean each run |
| CI pipeline | Setup + `Test.All.pecmd` | Preview | No, ephemeral environment |

Adding a new test: When you create a new test folder with its three mandatory files, two additional updates are required:

1. Test.All.pecmd — add a new PayrunEmployeeTest line
2. Tests/README.md — update the test count and add the row

The Tests/README.md at the root of the test folder maintains a table of all tests with their purpose. This serves as the test catalog — a quick reference for which scenarios are covered and which aren't. Keeping the row count accurate matters for compliance audits, where auditors want to know how many calculation scenarios are verified.

The completeness checklist: A test isn't done until all five items are checked off: (1) .et.json created with correct data, (2) .pecmd created with the right flags, (3) README.md with full calculation arithmetic, (4) line added to Test.All.pecmd, (5) row count updated in Tests/README.md. Skip any one and the test is incomplete — it either won't run in CI, won't be discoverable, or won't be auditable.
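Item (4) of the checklist is easy to forget and easy to automate. The following sketch assumes Test.All.pecmd contains one PayrunEmployeeTest line per test with a path of the form Tests/<folder>/<folder>.et.json, as shown above. The function names are hypothetical.

```python
from pathlib import Path, PurePosixPath


def registered_tests(suite_text: str) -> set[str]:
    """Test folder names referenced by PayrunEmployeeTest lines in the suite file."""
    names = set()
    for line in suite_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "PayrunEmployeeTest":
            # "Tests/<folder>/<folder>.et.json" -> "<folder>"
            names.add(PurePosixPath(parts[1]).parent.name)
    return names


def unregistered_tests(tests_root: Path, suite_text: str) -> list[str]:
    """Test folders on disk that the suite file never runs."""
    folders = {entry.name for entry in tests_root.iterdir() if entry.is_dir()}
    return sorted(folders - registered_tests(suite_text))
```

Running `unregistered_tests(Path("Tests"), Path("Tests/Test.All.pecmd").read_text())` in CI turns a forgotten suite entry into a visible failure instead of a silently skipped test.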
