What Is a Data Dictionary? Example and Template
A data dictionary is a centralized reference that describes every field, table, and data element in a system: what each column means, its data type, allowed values, constraints, and how it relates to other data. Think of it as the manual for your database. When someone asks what users.status can hold or whether orders.total includes tax, the data dictionary answers without anyone reading the source code.
Most teams build one the hard way: a spreadsheet that drifts out of sync the moment the schema changes. A better approach treats the data dictionary as living database documentation that sits next to your other docs and stays searchable. This post defines the term, shows a real example with a sample schema, gives you a copy-paste template, and walks through creating one step by step.
What a data dictionary contains
A data dictionary documents each data element with a consistent set of attributes. The exact columns vary by team, but a useful dictionary almost always captures these:
- Field name: the column or attribute name, exactly as it appears in the schema (
user_id,created_at). - Data type:
integer,varchar(255),timestamp,boolean, and so on. - Description: a plain-language explanation of what the field holds and why it exists.
- Constraints: primary key, foreign key, not null, unique, default values.
- Allowed values: the valid set for enums or status fields (
active,suspended,deleted). - Relationships: which table and column a foreign key references.
- Source and owner: where the data originates and who maintains the definition.
The field name, data type, description, and constraints are the load-bearing four. Add allowed values and relationships once your schema has enums and foreign keys, which is to say almost immediately.
Data dictionary vs data catalog vs schema
These three terms get mixed up constantly. They describe different things at different altitudes. The table below sorts them out.
| Term | What it describes | Who uses it |
|---|---|---|
| Data dictionary | Detailed metadata for each field: type, description, constraints, allowed values | Developers, analysts, anyone querying the data |
| Data catalog | An inventory of all data assets across an organization, with discovery and lineage | Data teams managing many sources and pipelines |
| Database schema | The raw structure: tables, columns, keys, and relationships as defined in the database | Engineers building and migrating the database |
A schema tells you status is a varchar. A data dictionary tells you status means the account state and can only be active, suspended, or deleted. A data catalog tells you which of your forty data sources contains a status field at all. For a single product or app, a data dictionary is usually what you actually need.
A real data dictionary example
Here is a worked example for a small SaaS with two tables: users and orders. This is the kind of dictionary you would hand to a new engineer or analyst on their first day.
Table: users
| Field | Data type | Description | Constraints | Allowed values |
|---|---|---|---|---|
id | uuid | Unique identifier for the user | Primary key | Auto-generated |
email | varchar(255) | Login email address | Not null, unique | Valid email format |
display_name | varchar(100) | Name shown in the UI | Nullable | Any string |
status | varchar(20) | Account state | Not null, default active | active, suspended, deleted |
created_at | timestamp | When the account was created | Not null, default now() | UTC timestamp |
Table: orders
| Field | Data type | Description | Constraints | Allowed values |
|---|---|---|---|---|
id | uuid | Unique identifier for the order | Primary key | Auto-generated |
user_id | uuid | The user who placed the order | Foreign key → users.id | Existing user id |
total_cents | integer | Order total in cents, tax included | Not null, >= 0 | 0 to 2147483647 |
currency | char(3) | ISO 4217 currency code | Not null, default USD | USD, EUR, GBP |
status | varchar(20) | Fulfillment state | Not null, default pending | pending, paid, shipped, refunded |
created_at | timestamp | When the order was placed | Not null, default now() | UTC timestamp |
Notice what the dictionary adds that the schema alone does not. The schema would show total_cents is an integer. The dictionary tells you it is in cents and includes tax, which is exactly the detail that prevents a billing bug. The status rows spell out every value an enum can take, so nobody guesses.
Copy-paste data dictionary template
Use this Markdown template as a starting point. Duplicate the table for each entity, fill in your real fields, and keep it next to your other docs so it stays current.
## Table: <table_name>
Purpose: <one sentence on what this table stores>
Owner: <team or person responsible>
| Field | Data type | Description | Constraints | Allowed values |
| --- | --- | --- | --- | --- |
| field_1 | type | what it holds | PK / FK / not null / unique | enum or range |
| field_2 | type | what it holds | constraint | allowed values |
| field_3 | type | what it holds | constraint | allowed values |
If you prefer a spreadsheet to start, the same columns work as headers: Field, Data type, Description, Constraints, Allowed values, Relationship, Owner. The format matters less than keeping every field documented and the definitions accurate.
Why a data dictionary matters
A schema describes structure. It says nothing about meaning, intent, or the rules a human needs to use the data correctly. The data dictionary fills that gap, and the payoff shows up in a few concrete ways.
Onboarding speed. A new engineer or analyst can understand the data model without reverse-engineering it from queries or pinging teammates on Slack.
Fewer data bugs. When the allowed values for status are written down, nobody inserts cancelled into a column that only the app reads as refunded.
Better communication. Product, engineering, and analytics all reference the same definitions, so "active user" means one thing across the company.
Easier maintenance. When you migrate a column or deprecate a field, the dictionary is where the change gets recorded and explained, the same discipline behind any document control process.
This is the same reasoning behind good API reference documentation: the structure is only half the story, and the descriptions are what make it usable.
Active vs passive data dictionaries
Data dictionaries come in two types, and the difference is about how they stay in sync with the database.
An active data dictionary is tied to the database management system and updates automatically when the schema changes. Many relational databases maintain one internally, sometimes called the system catalog, exposed through views like information_schema. Because the DBMS owns it, an active data dictionary in a DBMS rarely drifts out of date.
A passive data dictionary is a separate document maintained by hand, a spreadsheet or a docs page. It is not connected to the database, so it only stays accurate if someone updates it after every migration. Passive dictionaries are more flexible and far more common for product documentation, but they are also where stale definitions creep in. The fix is process: update the dictionary in the same pull request that changes the schema.
How to create a data dictionary
You can build a usable data dictionary in an afternoon. Follow these five steps.
1. List your data elements
Pull the current schema and list every table and column. Most databases will dump this for you. In Postgres you can query information_schema.columns to get names, types, and nullability in one shot, which gives you the skeleton of the dictionary for free.
2. Write a description for every field
For each column, write one plain sentence explaining what it holds. Resist the urge to restate the name. created_at is not "the created at date," it is "the UTC timestamp when the record was first inserted." The descriptions are the entire reason the dictionary exists.
3. Record constraints and allowed values
Capture primary keys, foreign keys, not-null, unique, and defaults. For any enum or status field, list every allowed value. This step catches the most bugs, because undocumented allowed values are how invalid data gets written.
4. Map relationships
For each foreign key, note the table and column it references. This turns a flat list of columns into a model someone can actually reason about, the same way a REST API documentation page connects endpoints to the resources they return.
5. Publish and keep it current
Put the dictionary somewhere searchable, not buried in a wiki nobody opens. Then make updating it part of your schema-change workflow so it never drifts. A starting structure like a technical documentation template helps you keep formatting consistent as the dictionary grows.
Best practices and tools
A few habits separate a dictionary people trust from one they ignore:
- Keep it next to the schema change. Update definitions in the same PR that alters the table. This is the single most effective rule.
- One source of truth. Do not maintain three copies in three spreadsheets. Pick one home and link to it everywhere.
- Make it searchable. A dictionary nobody can search is a dictionary nobody uses. Full-text search across fields and tables turns it into a tool.
- Generate what you can. Pull field names, types, and nullability straight from
information_schemaso the boilerplate is never wrong, then hand-write the descriptions.
For tooling, options range from a plain Markdown file in your repo to dedicated data catalog platforms for large organizations. For a product team that wants a living, branded reference, tools like Docsio turn a data dictionary into a searchable published site instead of a stale spreadsheet. You can generate the doc site from your existing schema or docs, keep the dictionary alongside your API and setup guides, and publish it under your own domain so the whole team works from one current reference.
Data dictionary FAQ
What is a data dictionary?
A data dictionary is a centralized reference that describes every field and table in a system. For each data element it records the name, data type, description, constraints, allowed values, and relationships, giving everyone a shared understanding of what the data means and how to use it correctly.
What is an example of a data dictionary?
A common example documents a users table, listing each column such as id, email, and status alongside its data type, a plain description, constraints like primary key or not null, and allowed values. For status, the allowed values might be active, suspended, or deleted.
What is the difference between a data dictionary and a data catalog?
A data dictionary describes the detailed metadata of fields within a system, including types, descriptions, and constraints. A data catalog is broader: an organization-wide inventory of all data assets across many sources, focused on discovery, lineage, and governance rather than field-level detail.
How do you create a data dictionary?
List every table and column from your schema, write a plain description for each field, record its data type, constraints, and allowed values, and map foreign-key relationships. Then publish it somewhere searchable and update it in the same change that alters the schema so it never drifts.
Is a data dictionary the same as a database schema?
No. A schema defines the raw structure of a database: tables, columns, keys, and relationships. A data dictionary adds meaning on top of that structure, explaining what each field holds, its allowed values, and the rules a person needs to use the data correctly.
Bringing it together
A data dictionary is the difference between a database only its author understands and one any new teammate can use on day one. Start with the four load-bearing attributes, field name, data type, description, and constraints, add allowed values and relationships, and keep the whole thing updated alongside your schema. The format is simple. The discipline of keeping it current, and making it searchable rather than stranded in a spreadsheet, is what makes it pay off.
