Modelling Generated 2026-05-14

Data dictionary & business glossary

Programmatically extracted from PDTF Schemas v3 (the base transaction + 18 main overlays — extension overlays not yet included) and merged with the two OPDA Glossary spreadsheets. The first deliverable of the semantic-modelling workstream.

Headline numbers (after audit / deduplication)
  • 16 canonical schemas (5 redundant ones excluded — see below)
  • 8,458 property-path entries across canonical schemas
  • 1,557 unique leaf property names — the actual PDTF vocabulary
  • 389 cross-context concepts (appear in 3+ schemas — need ontology reconciliation)
  • 754 context-specific concepts (appear in 1 schema only)
  • 54 business terms in the merged OPDA Glossary
  • 554 concepts in the generated SKOS scheme

Browse all properties

All 8,458 property-path entries from the canonical schemas. Search by name or path, filter by bounded context, source schema, or JSON type. Click any column header to sort.

Tip: try tenure, EPC or address in the search box; or pin a single overlay (e.g. ta6) with the source filter to see exactly what it adds.

Deliverables

All files generated to source/00-deliverables/semantic-models/.

| File | What it is | Audience |
|---|---|---|
| data-dictionary-canonical.json (≈ 1.8 MB) | Machine-readable: 8,458 property-path entries across the 16 canonical schemas. Use this for headline counts. | Engineers, code generators |
| data-dictionary.json (3.0 MB) | Original extraction including derived/superseded schemas (14,287 entries). Kept for transparency; do not use for counts. | Audit only |
| data-dictionary.md (≈ 110 KB) | Human-readable: per-schema tables, deduplicated to unique leaf names, organised by bounded context. | Engineers, BAs, reviewers |
| audit.json (≈ 2 KB) | What the inflation was, why it happened, and the corrected figures. Read this to understand the count discrepancy. | Anyone reviewing numbers |
| glossary-merged.json (20 KB) | Merge of the two OPDA Glossary.xlsx files (PoC + Technical WG), deduplicated to 54 unique terms. | Engineers, modelling |
| business-glossary.md (23 KB) | Three-section human glossary: OPDA Glossary terms A–Z, top-level PDTF concepts from schema annotations, external vocabulary (W3C VC, DID, ToIP). | Stakeholders, working groups, new joiners |
| business-glossary.ttl (164 KB) | SKOS Concept Scheme with 554 concepts. Loads directly into Protégé / GraphDB / TopBraid for the ontology work. | Ontology engineers |

Canonical schema inventory

| Schema | Bounded context | Unique leaves |
|---|---|---|
| pdtf-transaction.json | Base (spans all contexts) | 1,557 |
| baspi5.json | Estate Agency | 318 |
| rds.json | Property Data Services | 196 |
| piq.json | Surveying | 184 |
| ta6.json | Conveyancing | 178 |
| nts2.json | Estate Agency | 160 |
| lpe1.json | Conveyancing | 136 |
| con29R.json | Property Data Services | 125 |
| ntsl2.json | Estate Agency | 124 |
| ta7.json | Conveyancing | 98 |
| ta10.json | Conveyancing | 90 |
| fme1.json | Mortgage Lending | 78 |
| oc1.json | Property Data Services | 68 |
| con29DW.json | Property Data Services | 34 |
| sr24.json | Property Data Services | 7 |
| llc1.json | Property Data Services | 3 |

Excluded as redundant: combined.json (derived merge), skeleton.json (template), baspi4.json (superseded by baspi5), nts.json (superseded by nts2), ntsl.json (superseded by ntsl2).
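
The exclusion above can be sketched as a simple filter over the full extraction. This is a minimal illustration, not the generator's actual code: it assumes each entry is a dict whose source field holds the bare schema filename, and the path key and sample values are made up for the example.

```python
# Schemas excluded as redundant, as listed above.
EXCLUDED = {"combined.json", "skeleton.json", "baspi4.json", "nts.json", "ntsl.json"}


def canonical_entries(entries):
    """Keep only entries whose source schema is not in the excluded set."""
    return [e for e in entries if e["source"] not in EXCLUDED]


# Illustrative entries; "path" is a hypothetical field name and the values are invented.
sample = [
    {"source": "baspi5.json", "path": "/propertyPack/heating"},
    {"source": "baspi4.json", "path": "/propertyPack/heating"},
    {"source": "combined.json", "path": "/propertyPack/heating"},
    {"source": "ta6.json", "path": "/propertyPack/disputes"},
]
kept = canonical_entries(sample)
print([e["source"] for e in kept])
```

Filtering by source rather than re-running the extraction keeps data-dictionary.json intact for audit while the canonical file stays the single basis for counts.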

How it was built

flowchart LR
  classDef src fill:#eef4f8,stroke:#1a4d80,color:#0b2545;
  classDef proc fill:#fef3c7,stroke:#b45309,color:#7c2d12;
  classDef out fill:#dcfce7,stroke:#166534,color:#14532d;
  S1[v3 base schema<br/>pdtf-transaction.json]:::src
  S2[18 v3 overlays<br/>BASPI · TA6/7/10 · NTS<br/>CON29R/DW · PIQ · OC1 · LLC1<br/>LPE1 · FME1 · RDS]:::src
  G1[Glossary.xlsx<br/>Trust Framework PoC]:::src
  G2[Glossary.xlsx<br/>Technical Working Group]:::src
  EXT[W3C VCDM 2.0<br/>W3C DID 1.0<br/>ToIP Foundation]:::src
  PROC[Property walker<br/>+ glossary merger<br/>+ SKOS generator]:::proc
  D1[data-dictionary.json<br/>14,287 entries]:::out
  D2[data-dictionary.md<br/>per-source tables]:::out
  D3[glossary-merged.json<br/>54 terms deduplicated]:::out
  D4[business-glossary.md<br/>tri-source glossary]:::out
  D5[business-glossary.ttl<br/>SKOS · 554 concepts]:::out
  S1 --> PROC
  S2 --> PROC
  G1 --> PROC
  G2 --> PROC
  EXT --> PROC
  PROC --> D1
  PROC --> D2
  PROC --> D3
  PROC --> D4
  PROC --> D5
Build pipeline. Deterministic — re-runnable any time the schemas or glossaries change.
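
To make the "property walker" stage concrete, here is a minimal sketch of how such a walker might traverse a JSON Schema. It is an assumption about the approach, not the generator's code: it follows only properties and items, ignores $ref, oneOf/anyOf/allOf and similar keywords, and the path notation (slashes, with [] for array items) is invented for illustration.

```python
def walk_properties(schema, path=""):
    """Yield (path, json_type) for each property reachable via `properties` and `items`."""
    for name, sub in schema.get("properties", {}).items():
        if not isinstance(sub, dict):
            continue  # skip boolean subschemas such as `true`
        child = f"{path}/{name}"
        yield child, sub.get("type", "unknown")
        yield from walk_properties(sub, child)
    items = schema.get("items")
    if isinstance(items, dict):
        # Array items are walked under the same path with a [] suffix.
        yield from walk_properties(items, f"{path}[]")


# Toy schema, invented for the example.
toy_schema = {
    "type": "object",
    "properties": {
        "address": {
            "type": "object",
            "properties": {"postcode": {"type": "string"}},
        },
        "owners": {
            "type": "array",
            "items": {"type": "object", "properties": {"name": {"type": "string"}}},
        },
    },
}
for entry in walk_properties(toy_schema):
    print(entry)
```

Because the traversal follows dictionary order and has no external state, the same input always yields the same entries, which is the property a re-runnable pipeline needs.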

Source citations

Every entry in data-dictionary.json carries a source field naming the schema file it was extracted from. Every entry in glossary-merged.json carries a sources array naming which of the two Glossary.xlsx files contributed to it.
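
A sketch of how a sources array could be built during the glossary merge follows. The term and definition keys, the source labels, the case-insensitive matching and the "first definition wins" rule are all assumptions made for illustration; the real merger may resolve conflicts differently.

```python
def merge_glossaries(named_glossaries):
    """Merge {source_name: [row, ...]} into one list, deduplicated by term.

    Terms are matched case-insensitively after trimming whitespace. The first
    definition seen is kept (an assumption), and every contributing source is
    recorded in the entry's "sources" list.
    """
    merged = {}
    for source_name, rows in named_glossaries.items():
        for row in rows:
            term = row["term"].strip()
            entry = merged.setdefault(
                term.lower(),
                {"term": term, "definition": row["definition"], "sources": []},
            )
            if source_name not in entry["sources"]:
                entry["sources"].append(source_name)
    return sorted(merged.values(), key=lambda e: e["term"].lower())


# Invented sample rows; the labels "poc" and "twg" are hypothetical.
glossaries = {
    "poc": [{"term": "Issuer", "definition": "Party that issues a claim."}],
    "twg": [
        {"term": "issuer ", "definition": "Entity issuing a credential."},
        {"term": "Holder", "definition": "Party that holds a credential."},
    ],
}
merged_terms = merge_glossaries(glossaries)
print([(e["term"], e["sources"]) for e in merged_terms])
```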

The primary sources, with paths:

Schema coverage

The 21 v3 root-level + main-overlay JSON files the extractor reads, and the bounded context each one serves. (The 16 NTS2 extension overlays under overlays/extensions/ are documented separately on the PDTF overlays page and not yet processed by this generator.)

| Schema | Bounded context | What it adds to the base |
|---|---|---|
| pdtf-transaction.json | Base (all contexts) | The core transaction object: participants, property, claims, lifecycle |
| combined.json | Combined view | Base + all overlays merged for tooling |
| skeleton.json | Bootstrapping | Skeleton structure for a new empty transaction |
| baspi4.json / baspi5.json | Estate Agency | BASPI v4 + v5 (HBSG) |
| nts.json / nts2.json | Estate Agency | NTS Material Info Sales (v1 + v2 successor) |
| ntsl.json / ntsl2.json | Estate Agency | NTS Material Info Lettings (v1 + v2) |
| ta6.json | Conveyancing | Law Society TA6 Property Information Form |
| ta7.json | Conveyancing | Law Society TA7 Leasehold Information Form |
| ta10.json | Conveyancing | Law Society TA10 Fittings & Contents Form |
| lpe1.json | Conveyancing | LPE1 Leasehold Property Enquiry |
| fme1.json | Mortgage Lending | Form for Mortgage Enquiries |
| piq.json | Surveying | Property Information Questionnaire |
| rds.json | Property Data Services | Residential Data Schema |
| con29R.json | Property Data Services | CON29R Residential local-authority search |
| con29DW.json | Property Data Services | CON29 Drainage & Water search |
| oc1.json | Property Data Services | OC1 Office Copy entries (HMLR title) |
| llc1.json | Property Data Services | LLC1 Local Land Charges search |
| sr24.json | Property Data Services | sr24 (small overlay) |

The 313 cross-context concepts

313 property names appear in three or more overlays — these are the cross-context vocabulary that needs explicit reconciliation in the ontology (same word, possibly different meanings in different contexts). Top examples (sorted by spread):

address, name, date, amount, type, description, reference, status, provider, title, property, id, required, code, category

Full list with which overlays contain each: see the "Cross-context concepts" section of source/00-deliverables/semantic-models/business-glossary.md.
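
The spread calculation behind this list can be sketched as follows. This is an illustrative reconstruction under stated assumptions: entries are dicts with a source field and a hypothetical path field, the leaf name is taken to be the last slash-separated path segment, and the threshold of three matches the definition above.

```python
from collections import defaultdict


def leaf_name(path):
    """Return the last segment of a slash-separated property path."""
    return path.rstrip("/").split("/")[-1]


def classify_by_spread(entries, threshold=3):
    """Return (cross_context, single_context) leaf names, each sorted alphabetically."""
    schemas_by_leaf = defaultdict(set)
    for entry in entries:
        schemas_by_leaf[leaf_name(entry["path"])].add(entry["source"])
    cross = sorted(n for n, s in schemas_by_leaf.items() if len(s) >= threshold)
    single = sorted(n for n, s in schemas_by_leaf.items() if len(s) == 1)
    return cross, single


# Invented entries for illustration only.
demo_entries = [
    {"source": "baspi5.json", "path": "/property/address"},
    {"source": "ta6.json", "path": "/seller/address"},
    {"source": "oc1.json", "path": "/title/address"},
    {"source": "ta6.json", "path": "/property/tenure"},
    {"source": "nts2.json", "path": "/property/tenure"},
    {"source": "baspi5.json", "path": "/property/heating"},
]
cross_context, single_context = classify_by_spread(demo_entries)
print(cross_context, single_context)
```

Names that appear in exactly two schemas fall into neither bucket, which is why the cross-context and context-specific counts do not sum to the total number of unique leaf names.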

What to do with these deliverables

  1. Run an editorial pass over business-glossary.md with the Technical Working Group to reconcile the OPDA Glossary terms (which are open-banking/trust-framework-flavoured) with the property-data terms extracted from the schemas. The mix today is asymmetric.
  2. Define the upper ontology — pick the core classes (Transaction, Property, Participant, Claim, Document, Form, Search) and align with W3C VC. Use business-glossary.ttl as the seed.
  3. Per-overlay JSON-LD contexts — for each of the 18 main overlays (and eventually the 16 extension overlays), author a @context that maps the JSON property names to ontology terms. Start with baspi5.json as the worked example.
  4. Generate SHACL shapes from the JSON Schema validation rules. Most rules (required, enum, pattern, type) translate mechanically.
  5. Disambiguate the 313 cross-context names — for each, decide whether they refer to the same concept (single ontology class) or to context-specific concepts (separate classes with mapping).
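
As an illustration of step 4, the sketch below shows how the four mechanically translatable rules might map to SHACL: required to sh:minCount 1, enum to sh:in, pattern to sh:pattern, and type to sh:datatype. It is a hedged sketch, not a proposed design: the ex: namespace and shape name are placeholders, enum values are emitted as plain string literals, number is approximated as xsd:decimal, and JSON Schema regular expressions are passed through unchanged even though SHACL's regex dialect differs in places.

```python
PREFIXES = (
    "@prefix sh: <http://www.w3.org/ns/shacl#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/pdtf#> .\n"  # placeholder namespace
)

XSD_TYPES = {
    "string": "xsd:string",
    "integer": "xsd:integer",
    "number": "xsd:decimal",  # approximation
    "boolean": "xsd:boolean",
}


def turtle_string(value):
    """Quote a value as a Turtle string literal, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def property_shape(name, sub, required):
    """Build one blank-node property shape from a JSON Schema property."""
    parts = [f"sh:path ex:{name}"]
    json_type = sub.get("type")
    if isinstance(json_type, str) and json_type in XSD_TYPES:
        parts.append(f"sh:datatype {XSD_TYPES[json_type]}")
    if required:
        parts.append("sh:minCount 1")
    if "enum" in sub:
        values = " ".join(turtle_string(v) for v in sub["enum"])
        parts.append(f"sh:in ( {values} )")
    if "pattern" in sub:
        parts.append(f"sh:pattern {turtle_string(sub['pattern'])}")
    return "[ " + " ; ".join(parts) + " ]"


def node_shape(shape_name, schema):
    """Build a sh:NodeShape in Turtle from the top-level properties of a schema."""
    required = set(schema.get("required", []))
    lines = [f"ex:{shape_name} a sh:NodeShape"]
    for name, sub in schema.get("properties", {}).items():
        lines.append(f"    sh:property {property_shape(name, sub, name in required)}")
    return " ;\n".join(lines) + " ."


# Invented schema fragment; property names and values are illustrative.
demo_schema = {
    "required": ["tenure"],
    "properties": {
        "tenure": {"type": "string", "enum": ["Freehold", "Leasehold"]},
        "postcode": {"type": "string", "pattern": "^[A-Z]"},
    },
}
shape_ttl = node_shape("PropertyShape", demo_schema)
print(PREFIXES + "\n" + shape_ttl)
```

Rules that depend on conditionals or composition (if/then, oneOf, dependent requirements) do not fit this one-to-one pattern and would need hand-authored shapes.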