Incremental Refresh in Dataflows Gen2 (2026 Edition)
Incremental Refresh (IR) in Power BI Dataflows Gen2 is a powerful, often misunderstood capability—especially now that Microsoft Fabric positions OneLake as the unified storage layer for ingestion, transformation, and analytics.
This post explains how Dataflow Incremental Refresh works, how it differs from dataset IR, when to use it, and how to design it safely for enterprise‑grade workloads.
🔷 1. What Dataflow Incremental Refresh Is (and Isn’t)
Dataflow IR performs incremental ingestion at the ETL layer. It controls what data is extracted from source systems and loaded into OneLake, not how semantic models partition their imported data.
Key characteristics:
✔ Dataflow IR manages data ingestion, not semantic model partitions
Dataflows apply Store/Refresh windows during data extraction and materialization. Unlike dataset IR, they do not create semantic model partitions.
✔ Query folding is mandatory
If folding breaks, the window filter can no longer be pushed down to the source, so IR is ignored and the entire dataflow refreshes fully. Microsoft’s guidance emphasizes folding for incremental logic.
✔ Dataflow IR shapes what lands in OneLake
Semantic models built on top may apply their own Incremental Refresh, giving you a two‑tier IR architecture (best practice).
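Folding matters because it decides where the date filter runs. The following is a minimal sketch that uses an in-memory SQLite table as a stand-in source (an assumption; a real dataflow folds Power Query steps into the source's own query language). When the window filter folds, it travels to the source as a WHERE clause and only window rows are transferred. When it does not fold, every row is pulled and filtered afterwards. The `sales` table and `ModifiedDate` column are hypothetical.

```python
import sqlite3
from datetime import date, timedelta


def build_source(days: int, today: date) -> sqlite3.Connection:
    """Create a hypothetical source table with one row per day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, ModifiedDate TEXT, amount REAL)")
    rows = [(i, (today - timedelta(days=i)).isoformat(), 10.0 * i) for i in range(days)]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn


def folded_extract(conn: sqlite3.Connection, start: date, end: date) -> list:
    # Folded: the source evaluates the window filter, so only matching rows come back.
    return conn.execute(
        "SELECT id, ModifiedDate, amount FROM sales WHERE ModifiedDate >= ? AND ModifiedDate < ?",
        (start.isoformat(), end.isoformat()),
    ).fetchall()


def unfolded_extract(conn: sqlite3.Connection, start: date, end: date) -> tuple[list, int]:
    # Not folded: every row is transferred first, then filtered on the client side.
    all_rows = conn.execute("SELECT id, ModifiedDate, amount FROM sales").fetchall()
    kept = [r for r in all_rows if start.isoformat() <= r[1] < end.isoformat()]
    return kept, len(all_rows)


today = date(2026, 1, 31)
conn = build_source(days=365, today=today)
window_start, window_end = today - timedelta(days=3), today + timedelta(days=1)
folded = folded_extract(conn, window_start, window_end)
unfolded, transferred = unfolded_extract(conn, window_start, window_end)
print(len(folded), len(unfolded), transferred)  # → 4 4 365
```

Both paths end with the same four rows, but the unfolded path moves the whole year across the wire on every refresh, which is exactly the load IR is meant to avoid.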
🔷 2. Dataflow IR Architecture
The following diagram shows how Dataflow IR fits into Fabric’s ingestion pipeline:
📊 Diagram 1 – Dataflow Incremental Refresh Architecture
Flow description:
- Source systems provide raw transactional or telemetry data.
- Dataflow Gen2 applies Incremental Refresh (Store/Refresh windows).
- Data lands in OneLake as Delta/Parquet tables after IR processing.
- A Semantic Model can optionally add a second layer of Incremental Refresh.
🔷 3. How Dataflow Incremental Refresh Works
A. Store Period
Defines how much history to retain in the Dataflow output.
B. Refresh Period
Defines the window of recent data to re‑ingest (e.g., last 3 days).
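Assuming day granularity and a window that ends with the refresh date (both assumptions; the product exposes its own period and unit settings), the two periods reduce to a pair of date boundaries. `compute_windows` is a hypothetical helper for illustration.

```python
from datetime import date, timedelta
from typing import NamedTuple


class Windows(NamedTuple):
    store_start: date    # oldest day kept in the output (inclusive)
    refresh_start: date  # oldest day re-ingested on this run (inclusive)
    end: date            # exclusive upper bound: the day after the refresh date


def compute_windows(refresh_date: date, store_days: int, refresh_days: int) -> Windows:
    """Derive day-granularity Store and Refresh boundaries for one refresh run."""
    if refresh_days > store_days:
        raise ValueError("Refresh period cannot exceed the Store period")
    end = refresh_date + timedelta(days=1)
    return Windows(
        store_start=end - timedelta(days=store_days),
        refresh_start=end - timedelta(days=refresh_days),
        end=end,
    )


w = compute_windows(date(2026, 3, 15), store_days=730, refresh_days=3)
print(w.store_start, w.refresh_start, w.end)  # → 2024-03-16 2026-03-13 2026-03-16
```

Using a half-open interval (inclusive start, exclusive end) avoids double-counting the boundary day when consecutive windows meet.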
C. Rolling window behavior
Dataflow IR:
- Pulls only the Refresh period from the source
- Overwrites that window in the output
- Drops data older than the Store period
This differs from semantic models, which create and preserve partitions.
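The three rolling-window steps can be simulated on a small in-memory table keyed by day. This is a hypothetical model of the behavior described above, not Fabric's implementation: the output is a dict of day to rows, and `fetch_from_source` stands in for the folded source query.

```python
from datetime import date, timedelta
from typing import Callable

Output = dict[date, list[str]]


def incremental_refresh(
    output: Output,
    refresh_date: date,
    store_days: int,
    refresh_days: int,
    fetch_from_source: Callable[[date, date], Output],
) -> Output:
    """Apply one rolling-window refresh to a hypothetical day-keyed output table."""
    end = refresh_date + timedelta(days=1)
    refresh_start = end - timedelta(days=refresh_days)
    store_start = end - timedelta(days=store_days)

    # 1. Pull only the Refresh period from the source.
    fresh = fetch_from_source(refresh_start, end)

    # 2. Overwrite the Refresh window: its days are replaced, not appended to.
    kept = {d: rows for d, rows in output.items() if not (refresh_start <= d < end)}
    kept.update(fresh)

    # 3. Drop anything older than the Store period.
    return {d: rows for d, rows in kept.items() if d >= store_start}


def fake_source(start: date, end: date) -> Output:
    # Hypothetical source: every day in [start, end) returns one updated ("v2") row.
    days = [start + timedelta(days=i) for i in range((end - start).days)]
    return {d: [f"v2-{d.isoformat()}"] for d in days}


today = date(2026, 1, 10)
existing = {today - timedelta(days=i): ["v1"] for i in range(1, 8)}  # Jan 3 to Jan 9
result = incremental_refresh(existing, today, store_days=5, refresh_days=2,
                             fetch_from_source=fake_source)
for day in sorted(result):
    print(day, result[day])
```

After the run, Jan 9 and Jan 10 hold fresh `v2` rows, Jan 6 to Jan 8 keep their `v1` rows untouched, and Jan 3 to Jan 5 have aged out of the five-day Store period.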
📊 Diagram 2 — Store & Refresh Windows in a Dataflow
Blue = historical data retained in OneLake
Green = incremental window re‑processed each refresh
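The same picture can be reproduced as text: `B` marks retained history (blue) and `G` marks the window re-processed on each refresh (green). Day granularity and the `render_timeline` helper are illustrative assumptions.

```python
from datetime import date, timedelta


def render_timeline(refresh_date: date, store_days: int, refresh_days: int) -> str:
    """Hypothetical helper: one character per day of the Store period, oldest day first."""
    if not 0 < refresh_days <= store_days:
        raise ValueError("need 0 < refresh_days <= store_days")
    store_start = refresh_date - timedelta(days=store_days - 1)
    # Older days are kept as-is (B, blue); the most recent days are re-ingested (G, green).
    bar = "B" * (store_days - refresh_days) + "G" * refresh_days
    return f"{store_start.isoformat()} |{bar}| {refresh_date.isoformat()}"


print(render_timeline(date(2026, 1, 31), store_days=14, refresh_days=3))
```

For a 14-day Store period and a 3-day Refresh period ending on 2026-01-31, this prints 11 `B` cells followed by 3 `G` cells between 2026-01-18 and 2026-01-31.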
🔷 4. Dataflow IR vs Dataset IR — What’s the Difference?
| Feature | Dataflow IR | Dataset IR |
|---|---|---|
| Layer | Ingestion (ETL) | Semantic model (BI) |
| Output | Refreshed OneLake tables | Partitioned semantic model |
| Partitioning | None | Yes (physical partitions) |
| Folding requirement | Mandatory | Strongly recommended |
| Performance impact | Reduces load on source systems | Reduces model processing time |
| Best used for | Heavy ETL / Delta loads | Report‑optimized partition models |
| Works with Direct Lake? | Yes | Often not needed |
Why both layers matter
Even when Dataflow IR has already filtered history, an import semantic model still has to process that data into VertiPaq partitions. Semantic model IR limits that processing to the most recent partitions.
This two‑layer design is recommended for Fabric, consistent with Fabric’s ingestion/storage model described in the overview.
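A sketch of the second layer helps show why it still pays off. Below, rows from a dataflow's output are grouped into monthly partitions, in the style of semantic model IR, and only partitions overlapping the refresh window are reprocessed. Monthly granularity and the helper names are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date

Row = tuple[date, float]


def month_key(d: date) -> str:
    return f"{d.year:04d}-{d.month:02d}"


def partition_by_month(rows: list[Row]) -> dict[str, list[Row]]:
    """Group dataflow output rows into hypothetical monthly model partitions."""
    parts: dict[str, list[Row]] = defaultdict(list)
    for d, amount in rows:
        parts[month_key(d)].append((d, amount))
    return dict(parts)


def partitions_to_process(parts: dict[str, list[Row]], refresh_start: date) -> list[str]:
    # Only partitions from the refresh start month onward are reprocessed;
    # older partitions keep their already-processed data.
    cutoff = month_key(refresh_start)
    return sorted(k for k in parts if k >= cutoff)


rows = [(date(2025, m, 15), 100.0) for m in range(1, 13)] + [(date(2026, 1, 5), 50.0)]
parts = partition_by_month(rows)
print(len(parts), partitions_to_process(parts, refresh_start=date(2025, 12, 20)))
# → 13 ['2025-12', '2026-01']
```

Thirteen months of history reach the model, but a refresh touches only two partitions; that saving is independent of how much history the dataflow already trimmed.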
🔷 5. When Should You Use Dataflow IR?
Use Dataflow Incremental Refresh when:
- Your source system cannot handle full extract queries
- You want to pre‑stage cleaned, historical tables in OneLake
- You need ETL‑level optimization before semantic model processing
- You want smaller semantic models (less memory pressure)
🔷 6. When NOT to Use It
Avoid Dataflow IR when:
- Your queries don’t fold (full refresh every time)
- You only have small datasets (no benefit)
- You rely heavily on Direct Lake—semantic model IR becomes less relevant, and Dataflow IR may add redundancy
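The criteria in the two lists above can be condensed into a small decision helper. This is a hypothetical simplification: real decisions also weigh data volumes, capacity, and cost, and the order of the checks is our own assumption.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    queries_fold: bool
    source_struggles_with_full_extracts: bool
    dataset_is_small: bool
    primarily_direct_lake: bool


def recommend_dataflow_ir(w: Workload) -> tuple[bool, str]:
    """Return (use_dataflow_ir, reason) following the rules of thumb above."""
    # Blocking conditions first: without folding, IR cannot be incremental at all.
    if not w.queries_fold:
        return False, "queries don't fold, so every refresh is a full refresh"
    if w.dataset_is_small:
        return False, "small datasets gain little from IR"
    # A strained source is the strongest reason to use IR, even under Direct Lake.
    if w.source_struggles_with_full_extracts:
        return True, "IR keeps extraction load on the source low"
    if w.primarily_direct_lake:
        return False, "with Direct Lake, Dataflow IR may be redundant"
    return True, "pre-stages compact history in OneLake for the semantic model"


print(recommend_dataflow_ir(Workload(True, True, False, False)))
```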
🔷 7. Best‑Practice Patterns for Fabric (2026)
⭐ Pattern 1 — Dataflow IR + Semantic Model IR (recommended)
- Keeps source extraction light
- Keeps semantic model refresh fast
- Adds resilience: if one layer’s refresh temporarily fails, the other still serves its last successful load
⭐ Pattern 2 — Dataflow IR feeding Direct Lake
When Direct Lake is used as primary storage:
- Dataflow IR keeps your staging area compact
- Direct Lake picks up changes automatically when the semantic model setting “Keep your Direct Lake data up to date” is enabled
⭐ Pattern 3 — Dataflow IR + Warehouse Merge
Use Dataflow IR to ingest incremental data into a Lakehouse or Warehouse, where MERGE (upsert) logic applies it to the target tables.
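The merge step of this pattern can be sketched with SQLite's `INSERT ... ON CONFLICT DO UPDATE` upsert as a portable stand-in for a Warehouse MERGE. The `fact_sales` and `stg_sales` tables are hypothetical: `stg_sales` plays the role of the incremental batch the dataflow lands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (id INTEGER PRIMARY KEY, amount REAL, ModifiedDate TEXT);
    CREATE TABLE stg_sales (id INTEGER PRIMARY KEY, amount REAL, ModifiedDate TEXT);
    INSERT INTO fact_sales VALUES (1, 10.0, '2026-01-01'), (2, 20.0, '2026-01-01');
    -- Incremental batch landed by the dataflow: one changed row, one new row.
    INSERT INTO stg_sales VALUES (2, 25.0, '2026-01-03'), (3, 30.0, '2026-01-03');
""")

# Upsert: insert new keys; update existing keys only when the incoming row is newer.
# SQLite needs the WHERE clause on the SELECT (even "WHERE true") to parse ON CONFLICT.
conn.execute("""
    INSERT INTO fact_sales (id, amount, ModifiedDate)
    SELECT id, amount, ModifiedDate FROM stg_sales WHERE true
    ON CONFLICT(id) DO UPDATE SET
        amount = excluded.amount,
        ModifiedDate = excluded.ModifiedDate
    WHERE excluded.ModifiedDate > fact_sales.ModifiedDate
""")
conn.commit()

merged = conn.execute("SELECT * FROM fact_sales ORDER BY id").fetchall()
print(merged)
# → [(1, 10.0, '2026-01-01'), (2, 25.0, '2026-01-03'), (3, 30.0, '2026-01-03')]
```

The `ModifiedDate` guard makes the merge safe to rerun: replaying the same batch leaves the target unchanged.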
🔷 8. Common Failure Scenarios & Fixes
| Failure | Root Cause | Fix |
|---|---|---|
| Full refresh unexpectedly triggers | Query folding broken | Rewrite Power Query steps so they fold |
| Missing historical data | Store window too short | Extend store period |
| Source locking | IR refresh window too large | Reduce Refresh period |
| Schema drift | Source changed | Add schema validation steps |
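A schema validation step can be as simple as comparing the columns a dataflow expects with what the source currently returns, then failing the run before bad data lands. This is a minimal sketch, assuming the observed schema has already been read into a column-to-type map (how you obtain it depends on the source); the expected schema below is hypothetical.

```python
EXPECTED_SCHEMA = {"id": "int", "ModifiedDate": "date", "amount": "float"}  # hypothetical contract


def schema_drift(expected: dict[str, str], observed: dict[str, str]) -> dict[str, list[str]]:
    """Compare column-name to type maps and report what changed."""
    return {
        "missing": sorted(set(expected) - set(observed)),
        "unexpected": sorted(set(observed) - set(expected)),
        "type_changed": sorted(
            c for c in expected.keys() & observed.keys() if expected[c] != observed[c]
        ),
    }


observed = {"id": "int", "ModifiedDate": "text", "amount": "float", "region": "text"}
drift = schema_drift(EXPECTED_SCHEMA, observed)
if any(drift.values()):
    print("Schema drift detected:", drift)
```

A type change on the filter column (here `ModifiedDate` arriving as text) deserves special attention, because it can silently break folding as well as the data.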
🔷 9. Operational Checklist
- Confirm Power Query folding (check the step folding indicators in the query editor).
- Ensure Date/ModifiedDate columns exist and are reliable.
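- Set the Store period to the history the business actually needs to retain.
- Set the Refresh period to cover how far back source rows can still change, not just how often the source updates.
- Monitor Dataflow runs in the Fabric Monitoring hub.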
- Consider pairing with dataset IR for large‑scale models.
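Several checklist items can be checked mechanically before a dataflow goes to production. The `IrSettings` class below is a hypothetical description of one query's IR configuration, not a Fabric API; in practice you would fill it in by hand or from your own deployment metadata.

```python
from dataclasses import dataclass


@dataclass
class IrSettings:
    """Hypothetical description of one dataflow query's IR configuration."""
    date_column: str
    store_days: int
    refresh_days: int
    query_folds: bool


def check_settings(s: IrSettings, source_columns: set[str]) -> list[str]:
    """Return a list of checklist violations; an empty list means all checks passed."""
    problems = []
    if not s.query_folds:
        problems.append("query does not fold; IR will not be incremental")
    if s.date_column not in source_columns:
        problems.append(f"filter column {s.date_column!r} not found in source")
    if s.refresh_days <= 0 or s.store_days <= 0:
        problems.append("periods must be positive")
    elif s.refresh_days > s.store_days:
        problems.append("Refresh period is longer than the Store period")
    return problems


ok = check_settings(IrSettings("ModifiedDate", 730, 3, True), {"id", "ModifiedDate"})
bad = check_settings(IrSettings("OrderDate", 30, 60, False), {"id", "ModifiedDate"})
print(ok)
print(bad)
```

Running a check like this in a deployment pipeline catches misconfigured windows before they turn into silent full refreshes or lost history.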