A data flow diagram (DFD) is a visual map of how data moves through a system. It shows where information enters, what processes change it, where it gets stored, and where it exits. Instead of describing a system in paragraphs, a DFD draws the inputs, outputs, processes, and storage as connected shapes, so a reader can trace one piece of data from start to finish.
DFDs belong to the same family as design docs and architecture decision records: artifacts that explain how a system behaves before or alongside the code. Where those documents capture decisions and trade-offs in prose, a data flow diagram answers a narrower question: what data goes where, and which process touches it on the way?
This guide covers the four DFD symbols, the three levels (context, level 1, level 2), a full worked example, the steps to draw one, how a DFD differs from a flowchart, and the tools teams use to build and publish them.
What is a data flow diagram?
A data flow diagram is a graphical model of the flow of data through a process or information system. It documents four things: the external entities that send or receive data, the processes that transform data, the data stores that hold it, and the data flows that connect them.
The key idea is that a DFD shows data movement, not control logic. There are no decisions, no loops, no "if this then that." A DFD never says when something runs or in what order. It only says: this data comes from here, this process changes it, the result goes there. That single focus is what keeps DFDs readable even for non-technical stakeholders.
DFDs come from structured systems analysis, popularized in the 1970s by Larry Constantine, Ed Yourdon, Tom DeMarco, and others. Decades later they remain a standard way to model systems during requirements gathering, security reviews, and architecture planning. They pair well with a software requirements specification, where the DFD shows the data side of what the spec describes in words.
Data flow diagram symbols
Every DFD uses four core symbols. Two notations are in common use: Yourdon-DeMarco (circles for processes) and Gane-Sarson (rounded rectangles for processes). The meaning is identical; only the shapes differ. Gane-Sarson is a popular choice because rounded rectangles leave more room for labels, and it is the notation used in the table below.
| Symbol | Shape (Gane-Sarson) | What it represents | Example |
|---|---|---|---|
| Process | Rounded rectangle | An action that transforms incoming data into outgoing data | "Validate payment", "Generate invoice" |
| Data flow | Labeled arrow | A single packet of data moving between elements | "Order details", "Login credentials" |
| Data store | Open-ended rectangle (two parallel lines in Yourdon-DeMarco) | Where data rests for later use | "Customer database", "Order table" |
| External entity | Square | A person or system outside the boundary that sends or receives data | "Customer", "Payment gateway" |
A few rules keep these symbols meaningful. Every process must have at least one input and one output, since a process that produces nothing or consumes nothing is a modeling error. Every data flow needs a noun label that names the data, not the action. Arrows point in the direction data travels, and a two-way exchange uses two separate arrows.
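The input/output rule is easy to check mechanically once a diagram is written down as data. The sketch below is a minimal illustration, not a standard tool: the `Flow` class, the `find_io_problems` function, and the "Payment status" label are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    label: str   # noun naming the data, e.g. "Order details"
    source: str  # element the data leaves
    target: str  # element the data arrives at


def find_io_problems(processes, flows):
    """Return (process, problem) pairs for processes lacking an input or an output."""
    problems = []
    for name in processes:
        if not any(f.target == name for f in flows):
            problems.append((name, "no input"))
        if not any(f.source == name for f in flows):
            problems.append((name, "no output"))
    return problems


# "Generate invoice" receives data but never sends any, so it is flagged.
flows = [
    Flow("Order details", "Customer", "Validate payment"),
    Flow("Payment status", "Validate payment", "Generate invoice"),
]
problems = find_io_problems(["Validate payment", "Generate invoice"], flows)
print(problems)
```

A process flagged with "no output" is usually missing a flow to a data store or back to an external entity.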
How the symbols connect
Not every pair of elements can be connected. The allowed connections are: external entity to process, process to external entity, process to data store, data store to process, and process to process. Everything else is invalid. Data never flows directly from one external entity to another, from an entity straight into a data store, or between two data stores. At least one end of every flow must be a process, because only a process can move or change data.
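The five allowed connections share one property: at least one end of the flow is a process. A minimal sketch of that check follows; the `Kind` enum and `is_valid_connection` function are illustrative names, not part of any DFD tool.

```python
from enum import Enum


class Kind(Enum):
    ENTITY = "external entity"
    PROCESS = "process"
    STORE = "data store"


def is_valid_connection(source: Kind, target: Kind) -> bool:
    """A data flow is valid only if its source or its target is a process."""
    return Kind.PROCESS in (source, target)


# Print every invalid pairing of element kinds.
for source in Kind:
    for target in Kind:
        if not is_valid_connection(source, target):
            print(f"invalid: {source.value} -> {target.value}")
```

Running it lists the four invalid pairings: entity to entity, entity to store, store to entity, and store to store.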
Levels of a data flow diagram
DFDs are layered. You start with one box for the whole system and progressively break it into more detail. Each level is numbered, and higher numbers mean more granularity. This decomposition is what lets a DFD scale from a one-glance overview to a working blueprint.
| Level | Name | What it shows | Audience |
|---|---|---|---|
| Level 0 | Context diagram | The entire system as one process plus its external entities | Executives, clients, anyone wanting the big picture |
| Level 1 | Top-level breakdown | The main subprocesses inside the system and the data stores they use | Analysts, product managers, lead engineers |
| Level 2 | Detailed breakdown | One Level 1 process exploded into its own sub-processes | Engineers building or reviewing a specific module |
Level 0: the context diagram
A level 0 DFD, also called a context diagram, is the simplest view. The whole system is a single process in the center. Around it sit the external entities, with data flows showing what enters and leaves. There are no data stores and no internal detail. The point is to define the system boundary: what is inside, what is outside, and what crosses the line.
Level 1: the main processes
A level 1 DFD takes that single process and splits it into its major functions, usually four to seven of them. Data stores appear here for the first time. Each process gets a number (1, 2, 3) so you can reference it. The external entities stay the same as level 0, which keeps the diagram consistent with the context view above it.
Level 2 and beyond
A level 2 DFD zooms into one level 1 process and breaks it down further. If process 3 was "Process payment," its level 2 diagram might show "Verify card," "Charge account," and "Record transaction." Sub-processes are numbered by parent (3.1, 3.2, 3.3). Most projects stop at level 2; only large systems need level 3 or deeper.
The rule that ties levels together is balancing: the data flows entering and leaving a process at one level must match the flows in its detailed diagram below. If "Order details" goes into process 2 at level 1, it must also appear going into the level 2 breakdown of process 2.
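Balancing amounts to comparing two sets of flows. The sketch below assumes each flow crossing a process boundary is recorded as a (label, direction) pair; `balance_report` is a hypothetical helper, and the "Order total" output is an assumed label added only to show a mismatch.

```python
def balance_report(parent_flows, child_flows):
    """Compare (label, direction) pairs on a parent process with its child diagram."""
    parent, child = set(parent_flows), set(child_flows)
    return {
        "missing_in_child": sorted(parent - child),
        "extra_in_child": sorted(child - parent),
    }


# Process 2 as drawn at level 1. "Order total" is an assumed label.
level1_process_2 = [("Order details", "in"), ("Order total", "out")]
# Flows crossing the boundary of the level 2 diagram for process 2.
level2_process_2 = [("Order details", "in")]

report = balance_report(level1_process_2, level2_process_2)
print(report)
```

Here the child diagram forgot the outgoing flow, so it appears under `missing_in_child`. Two empty lists mean the levels balance.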
A worked DFD example: online store checkout
Consider the checkout flow of an online store. Here is how the same system looks across levels.
Level 0 (context): One process labeled "Checkout system" sits in the center. The external entity "Customer" sends a data flow called "Order and payment info" into it and receives "Order confirmation" back. A second external entity, "Payment gateway," receives "Charge request" and returns "Charge result." That is the whole context diagram: two entities, one system, four flows.
Level 1: The single process expands into four numbered processes. Process 1 "Validate cart" receives the order, checks item availability against the "Inventory" data store, and passes a verified order to process 2. Process 2 "Calculate total" computes the amount and forwards it to process 3 "Process payment," which sends "Charge request" to the Payment gateway entity, receives "Charge result" back, and writes to the "Orders" data store. Process 4 "Send confirmation" reads from the Orders store and returns the confirmation to the Customer.
Level 2: Zooming into process 3 "Process payment," you get 3.1 "Tokenize card," 3.2 "Submit charge," and 3.3 "Record result." The data flow "Charge request" that left process 3 at level 1 now leaves 3.2 specifically, and the flows balance.
This example shows the value of layering. An executive reads the level 0 diagram in seconds. An engineer assigned to payments works from the level 2 view. Both look at the same system at the depth they need.
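One way to see that the levels describe the same system is to write the level 1 flows as data and collapse them back into the context diagram. The sketch below does that for the checkout example. The function name `context_flows` is hypothetical, and the labels marked "assumed" are not named in the example above; they are filled in only so every arrow has a noun.

```python
ENTITIES = {"Customer", "Payment gateway"}

# (source, label, target) for each level 1 flow in the checkout example.
LEVEL1_FLOWS = [
    ("Customer", "Order and payment info", "1 Validate cart"),
    ("Inventory", "Item availability", "1 Validate cart"),      # assumed label
    ("1 Validate cart", "Verified order", "2 Calculate total"),
    ("2 Calculate total", "Order total", "3 Process payment"),  # assumed label
    ("3 Process payment", "Charge request", "Payment gateway"),
    ("Payment gateway", "Charge result", "3 Process payment"),
    ("3 Process payment", "Order record", "Orders"),            # assumed label
    ("Orders", "Order record", "4 Send confirmation"),          # assumed label
    ("4 Send confirmation", "Order confirmation", "Customer"),
]


def context_flows(level1_flows, entities, system_name="Checkout system"):
    """Keep only flows that cross the system boundary, with the system as one process."""
    result = []
    for source, label, target in level1_flows:
        if source in entities:
            result.append((source, label, system_name))
        elif target in entities:
            result.append((system_name, label, target))
    return result


context = context_flows(LEVEL1_FLOWS, ENTITIES)
for source, label, target in context:
    print(f"{source} --[{label}]--> {target}")
```

The result contains exactly the four flows of the level 0 diagram, which is what balancing between level 0 and level 1 requires.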
How to make a data flow diagram
Building a DFD follows a repeatable sequence. These steps work whether you are drawing on a whiteboard or in software.
- Define the system boundary. Decide what is inside the system and what is outside. Everything outside becomes an external entity. This step produces your context diagram.
- List the external entities. Identify every person, role, or external system that sends or receives data. Customers, admins, third-party APIs, and other systems all count.
- Identify the major processes. Name the main functions the system performs. Use a verb plus a noun, like "Validate order" or "Generate report." Aim for four to seven at level 1.
- Add data stores. Mark every place data rests: databases, files, queues, caches. Connect them only to processes.
- Draw the data flows. Connect entities, processes, and stores with labeled arrows. Each arrow names one packet of data with a noun. Check direction.
- Check the rules and balance. Confirm no two entities or stores connect directly, every process has input and output, and flows balance between levels.
- Decompose as needed. Break complex processes into their own level 2 diagrams. Stop when the detail is enough to build or review the system.
A DFD rarely lives alone. It belongs in your broader engineering documentation alongside the prose that explains why the system is built this way. A diagram showing where customer data flows, paired with the reasoning behind each store, is far more useful to a new engineer than either piece by itself.
DFD vs flowchart: what is the difference?
DFDs and flowcharts look similar but answer different questions. A DFD shows what data moves through a system. A flowchart shows the order of steps and decisions a process follows. Confusing the two is a common DFD mistake.
| Aspect | Data flow diagram | Flowchart |
|---|---|---|
| Focus | Movement and transformation of data | Sequence of steps and control logic |
| Decisions | No decision points or branches | Diamond decision shapes, yes/no branches |
| Loops | None; no concept of repetition | Loops and iteration are common |
| Timing | Shows no order of execution | Shows step-by-step order |
| Data stores | Core element | Rarely shown |
| Best for | Modeling system data, security reviews | Modeling algorithms, workflows, procedures |
In short, if you need to show how data enters, changes, and exits a system, use a DFD. If you need to show the decision logic of a single procedure, use a flowchart. Many specs include both: a DFD for the system view and a flowchart for a tricky algorithm inside it.
DFD best practices
A clear DFD is the result of discipline, not artistry. These habits keep diagrams readable as systems grow.
- Name processes with verbs, flows with nouns. "Process order" is a process; "Order details" is a flow. Mixing them up makes diagrams ambiguous.
- Keep level 1 to four to seven processes. More than seven and the diagram becomes a wall of boxes. Group related functions and push detail to level 2.
- Number everything. Numbered processes (1, 2, 3 then 2.1, 2.2) let you reference them in the spec and trace them across levels.
- Avoid crossing lines where possible. Rearrange elements so flows do not tangle. A diagram readers cannot trace is worthless.
- Balance every level. Inputs and outputs at a parent process must match its child diagram. Unbalanced flows signal a modeling error.
- Keep it data-only. Resist adding decisions, timing, or hardware to a logical DFD. That belongs in a flowchart or a physical DFD.
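Two of these habits, the four-to-seven guideline and parent-based numbering, can be checked automatically if process numbers are kept in a simple structure. The following is a rough sketch under that assumption; `lint_processes` and its input layout are hypothetical, and the sub-process "2.3" is a deliberate mistake for illustration.

```python
def lint_processes(level1, level2_by_parent):
    """Return warnings about process count and sub-process numbering.

    level1: dict mapping process number to name, e.g. {"3": "Process payment"}
    level2_by_parent: dict mapping a parent number to its sub-process numbers
    """
    warnings = []
    if not 4 <= len(level1) <= 7:
        warnings.append(f"level 1 has {len(level1)} processes; aim for four to seven")
    for parent, children in level2_by_parent.items():
        if parent not in level1:
            warnings.append(f"level 2 diagram refers to unknown process {parent}")
        for child in children:
            if not child.startswith(parent + "."):
                warnings.append(f"sub-process {child} is not numbered under parent {parent}")
    return warnings


level1 = {
    "1": "Validate cart",
    "2": "Calculate total",
    "3": "Process payment",
    "4": "Send confirmation",
}
# "2.3" is a deliberate numbering mistake; it should be "3.3".
warnings = lint_processes(level1, {"3": ["3.1", "3.2", "2.3"]})
print(warnings)
```

Checks like verb and noun naming or crossing lines still need a human reviewer; this only covers the rules that reduce to counting and prefixes.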
Tools for building and publishing DFDs
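You can draw a DFD with anything from a whiteboard to dedicated software. For drawing, diagram editors such as draw.io and Lucidchart provide DFD shape libraries, and general design tools like Figma can be used with templates or custom shapes; check which notation (Gane-Sarson or Yourdon-DeMarco) a given library covers. For version-controlled diagrams, text-based formats like Mermaid let you keep a diagram next to code in a repo, typically by approximating DFD shapes with flowchart syntax.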
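As a rough illustration of the text-based route, the sketch below generates Mermaid flowchart source from a small DFD description. It assumes Mermaid's flowchart syntax is used to approximate DFD shapes (rectangle for an entity, rounded box for a process, cylinder for a store), since these are flowchart shapes rather than true Gane-Sarson symbols. The `to_mermaid` function and the node ids are hypothetical.

```python
# Mermaid flowchart node shapes used to approximate DFD symbols.
SHAPES = {
    "entity": '{id}["{label}"]',    # rectangle
    "process": '{id}("{label}")',   # rounded box
    "store": '{id}[("{label}")]',   # cylinder
}


def to_mermaid(nodes, flows):
    """Render nodes {id: (kind, label)} and flows [(source, label, target)] as Mermaid text."""
    lines = ["flowchart LR"]
    for node_id, (kind, label) in nodes.items():
        lines.append("    " + SHAPES[kind].format(id=node_id, label=label))
    for source, label, target in flows:
        lines.append(f"    {source} -->|{label}| {target}")
    return "\n".join(lines)


nodes = {
    "C": ("entity", "Customer"),
    "P1": ("process", "Validate cart"),
    "D1": ("store", "Inventory"),
}
flows = [
    ("C", "Order and payment info", "P1"),
    ("D1", "Item availability", "P1"),  # assumed label
]
mermaid_text = to_mermaid(nodes, flows)
print(mermaid_text)
```

The printed text can be committed alongside the code and rendered by any viewer that supports Mermaid, so diagram changes show up in code review like any other diff.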
The harder part is not drawing the diagram but publishing it where your team will find it. A DFD usually lives inside system or architecture documentation, and a diagram buried in a slide deck or a stale wiki page rarely gets read. Tools like Docsio turn architecture notes and diagrams into a branded, searchable docs site, so a DFD lands next to the spec and decisions it supports rather than in an attachment nobody opens. Docsio's AI documentation generation can draft the surrounding architecture pages from your existing notes, leaving you to drop the diagram into context.
However you publish it, treat the DFD as part of a living document set. Pair it with a technical documentation template so the diagram, the data definitions, and the design reasoning stay in one searchable place and get updated together.
Conclusion
A data flow diagram is one of the clearest ways to show how information moves through a system. Master the four symbols, the three levels, and the simple rule that every data flow must start or end at a process, and you can model almost any system on one page. Keep the diagram data-focused, balance the levels, and publish it where your team actually reads documentation. Done well, a DFD turns a tangle of services and databases into a map anyone can follow.
Frequently asked questions
What is a data flow diagram?
A data flow diagram is a visual model showing how data moves through a system. It maps four things: external entities that send or receive data, processes that transform it, data stores that hold it, and the labeled flows connecting them. It shows data movement, not the order of steps.
What are the 4 symbols of a data flow diagram?
The four DFD symbols are the process (a rounded rectangle or circle that transforms data), the data flow (a labeled arrow showing data movement), the data store (parallel lines or an open rectangle for stored data), and the external entity (a square for people or systems outside the boundary).
What is the difference between a DFD and a flowchart?
A DFD shows how data moves and changes within a system, with no decisions or order of execution. A flowchart shows the step-by-step sequence and decision logic of a single process, including branches and loops. Use a DFD for system data, a flowchart for procedures.
What are the levels of a DFD?
DFDs have three common levels. Level 0, the context diagram, shows the whole system as one process with its external entities. Level 1 breaks that into four to seven main subprocesses and adds data stores. Level 2 explodes one level 1 process into detailed sub-processes.
When should you use a data flow diagram?
Use a DFD during requirements gathering, system design, and security reviews, when you need to show where data enters, gets processed, gets stored, and exits. It works best for communicating a system's data side to both technical and non-technical readers without exposing implementation detail.
