Introduction — Why Real-Time + Large Models Matter

In asset-heavy operations — water utilities, infrastructure networks, transport assets, energy grids, environmental monitoring — data arrives continuously. Meters tick. Sensors report states. Work orders get created. Asset failures occur unexpectedly.

Traditional BI pipelines refresh once a day, but operations cannot wait 24 hours for fresh data.
This is why modern data platforms increasingly rely on:

Incremental refresh
Change Data Capture (CDC)
Event streams
Lakehouse pipelines and streaming ingestion
Direct Lake + semantic models

Microsoft Fabric merges all these capabilities into one unified architecture, where real-time data engineering and enterprise semantic models work together seamlessly.

In this blog, we explore how these components interact, what technical choices matter, and how platform engineers actually implement them in production.

1. The Challenge — Keeping Large Semantic Models Fresh Without Breaking Them

A large semantic model may contain:

Billions of transactional or sensor rows
Years of asset history
High-velocity telemetry
Multiple domains (assets, operations, licensing, financials)
Hundreds of users
Dozens of report pages
Real-time dashboards and alerts

A full refresh of such a model is usually:

❌ Too slow
❌ Too expensive
❌ Too disruptive
❌ Too fragile (a single schema change in the source can fail the whole refresh)

Incremental + real-time ingestion solves this.

2. Incremental Refresh — The Backbone for Large Data Models

Incremental refresh is the first step toward scaling operational analytics. Instead of reloading the whole table, it refreshes only:

Recent data (e.g., the last 3 days, week, or month)
Partitions that fall inside a rolling refresh window
Optionally, only those partitions where a “detect data changes” column shows new activity

How it works technically

When you configure incremental refresh in Power BI / Fabric:

You define two Date/Time parameters in Power Query and use them to filter the table:
- RangeStart
- RangeEnd
Data is partitioned (usually per day/month depending on volume).
The engine automatically:
- Rebuilds only partitions inside the “refresh window”
- Loads new rows
- Picks up changed rows by reloading the affected partitions (every partition in the window, or only those flagged by “detect data changes”)
- Retains historical partitions unchanged
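
The partitioning behaviour described above can be pictured with a small sketch. This is not how the engine is implemented; it only models which daily partitions a policy would reload and which it would leave alone. The function name `plan_partitions` and the parameters `archive_days` and `refresh_days` are hypothetical.

```python
from datetime import date, timedelta


def plan_partitions(today: date, archive_days: int, refresh_days: int) -> dict:
    """Classify daily partitions as 'refresh' (reloaded) or 'keep' (untouched).

    Hypothetical helper: models a policy that stores `archive_days` of history
    and reloads only the most recent `refresh_days` daily partitions.
    """
    plan = {}
    for offset in range(archive_days):
        day = today - timedelta(days=offset)
        # Each daily partition covers the half-open range [RangeStart, RangeEnd).
        range_start = day
        range_end = day + timedelta(days=1)
        action = "refresh" if offset < refresh_days else "keep"
        plan[day.isoformat()] = (range_start, range_end, action)
    return plan


if __name__ == "__main__":
    plan = plan_partitions(date(2025, 1, 10), archive_days=7, refresh_days=3)
    for name, (start, end, action) in sorted(plan.items()):
        print(name, start, end, action)
```

With a 7-day archive and a 3-day refresh window, only the three newest partitions are marked for reload; the other four keep their existing data.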

Why it matters for engineers

✔ Massive reduction in refresh time
✔ Better memory usage (important in Fabric capacities)
✔ More reliable
✔ Allows near real-time models
✔ Enables hybrid approaches (e.g. Direct Lake)

Practical tips for engineering teams

Ensure fact tables have a date or datetime column suitable for partitioning
Avoid partitioning on high-granularity timestamps (e.g., seconds) — use date or hour
Use “detect data changes” on a modified timestamp column if possible
Monitor refresh using XMLA DMVs
Split very large fact tables into multiple logical partitions/domains if refresh windows become too big
Always test refresh policies in a dev model before pushing to prod

3. Introducing Real-Time: CDC, Event Streams & Streaming Pipelines

Incremental refresh handles historical data efficiently;
real-time ingestion keeps the most recent data current between refreshes.

This is where Microsoft Fabric becomes very powerful for operational analytics.

3.1 Change Data Capture (CDC)

CDC allows you to capture:

INSERTs
UPDATEs
DELETEs

directly from source systems.

In Fabric, you can ingest CDC data through:

Lakehouse pipelines connecting to SQL databases
Azure SQL CDC features
Eventstreams capturing log-based changes
External systems writing Delta Lake–compatible CDC logs

CDC → Lakehouse → Semantic Model Flow

CDC captures new/changed rows from source
Writes changes to the Lakehouse bronze layer
A pipeline merges changes into silver/gold Delta tables
Incremental refresh (Import) or Direct Lake picks up those changes
The semantic model reflects the changes shortly after the merge completes
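
The merge step in this flow can be sketched in plain Python. In a real Lakehouse this would normally be a Delta Lake MERGE run from a notebook or pipeline; the version below only shows the logic on in-memory data. The record fields `key` and `row` are assumptions made for the example, and `_operation` and `_lsn` follow the metadata names suggested in this post.

```python
def apply_cdc(target: dict, changes: list) -> dict:
    """Apply CDC change records to a keyed table and return the new state.

    `target` maps primary key -> row dict. Each change is a dict with the
    assumed fields `_operation` ('insert' | 'update' | 'delete'),
    `_lsn` (increasing log sequence number), `key`, and `row`.
    """
    result = dict(target)  # leave the caller's table untouched
    # Replay in log order so the last change per key wins.
    for change in sorted(changes, key=lambda c: c["_lsn"]):
        if change["_operation"] == "delete":
            result.pop(change["key"], None)
        else:
            # Inserts and updates are both treated as upserts.
            result[change["key"]] = change["row"]
    return result


if __name__ == "__main__":
    silver = {1: {"status": "open"}, 2: {"status": "open"}}
    batch = [
        {"_lsn": 3, "_operation": "delete", "key": 2, "row": None},
        {"_lsn": 1, "_operation": "update", "key": 1, "row": {"status": "closed"}},
        {"_lsn": 2, "_operation": "insert", "key": 3, "row": {"status": "open"}},
    ]
    print(apply_cdc(silver, batch))
```

Treating inserts and updates as the same upsert keeps the merge idempotent, so replaying a batch after a failed run does not create duplicates.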

Best practices

If the source system supports CDC — always use it rather than reloading data
Ensure Delta Lake merge logic handles duplicates, late-arriving data, and deletes
Use surrogate keys in dimensions to handle type 2 SCD scenarios
Include appropriate metadata: _commit_timestamp, _operation, _lsn

3.2 Event Streams (Real-Time Fabric)

Event Streams in Fabric provide a low-latency stream ingestion service for:

Telemetry (IoT sensors, SCADA)
Asset health metrics
Environmental monitoring
Vehicle pings
Maintenance event triggers
Alerts or notifications

Technically, Event Streams support multiple targets:

KQL database
Lakehouse Delta tables
Power BI streaming datasets
Event hubs / service buses
Notebooks or transformations

Use Case Example — Water Utility Telemetry

Incoming sensor data (e.g., flow, pressure, turbidity) is captured every few seconds:

Device → Event Hub → Event Stream
Event Stream → Delta Lake table
Delta Lake → Direct Lake semantic model
Dashboard updates with second-level latency

Best practices

Partition streaming Delta tables by date or hour
Enforce schema with contracts — streaming sources often drift
Use KQL DB for ultra-fast queries, then materialize to lakehouse
Avoid writing too many small files (use stream compaction in pipelines)
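
As a small illustration of the date/hour partitioning tip, the sketch below derives a partition path from an event timestamp. The column names `event_date` and `event_hour` and the function `partition_path` are hypothetical, and the example assumes every event carries an explicit UTC offset.

```python
from datetime import datetime, timezone


def partition_path(event_time_iso: str) -> str:
    """Build a date/hour partition path (in UTC) for one event timestamp.

    Expects an ISO 8601 string with an explicit offset, e.g. '+10:00'.
    """
    ts = datetime.fromisoformat(event_time_iso)
    if ts.tzinfo is None:
        # Naive timestamps are ambiguous across sites, so reject them.
        raise ValueError("event time must include a UTC offset")
    ts = ts.astimezone(timezone.utc)
    return f"event_date={ts:%Y-%m-%d}/event_hour={ts:%H}"


if __name__ == "__main__":
    print(partition_path("2025-03-01T09:30:05+10:00"))
```

Normalising to UTC before partitioning means sensors in different time zones land in the same hourly partition for the same instant, which keeps partition counts predictable.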

3.3 Pipelines as the Orchestrator Layer

Pipelines in Fabric cover what Azure Data Factory (ADF) pipelines did, and more:

Orchestrate batch + streaming + CDC refreshes
Execute merge operations into gold Delta tables
Trigger semantic model refreshes
Apply validation rules
Load dimensions using SCD logic
Send alerts if job failures occur

4. When Real-Time + Incremental Meets Semantic Models

This is the most important architectural interaction.

A semantic model can now draw on:

1. Historical fact tables (imported or Direct Lake)

Managed via incremental refresh
Efficient and stable

2. Near-real-time Delta tables (Direct Lake)

Updated via CDC, event streams, or pipelines
No need for a full dataset refresh; the model picks up new Delta table versions shortly after they are committed
Cached intelligently by the engine

3. Real-time dashboards (direct stream)

For telemetry, alerts, dashboards that require <5-sec latency

Together, this forms a hybrid semantic model:

Layer	Purpose
Import	Stable, aggregated history
Direct Lake	Near real-time operational data
Streaming visuals	Instant telemetry/alerts

Why this matters technically

Import tables do not need constant refresh
Direct Lake avoids full ingestion into VertiPaq
File changes in Delta Lake are reflected in the semantic model soon after they are committed
Refresh windows stay extremely small
Capacity use is dramatically reduced
DAX measures can reference both historical + real-time simultaneously

5. Things That Fail — And How Engineers Fix Them

When working with real-time + incremental + semantic models, here are the failure points:

5.1 Schema drift

Streaming systems often add fields unexpectedly.
➡ Fix using schema validation pipelines
➡ Convert unsupported types in silver layer
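
A schema validation step can be as simple as checking each event against a declared contract before it reaches the silver layer. The sketch below is one possible shape, not a Fabric feature: the `CONTRACT` fields and the `validate_event` function are hypothetical examples for a pressure sensor feed.

```python
# Hypothetical contract for a pressure sensor feed: field name -> expected type.
CONTRACT = {"device_id": str, "event_time": str, "pressure_kpa": float}


def validate_event(event: dict, contract: dict = CONTRACT):
    """Return (clean_event, issues) for one incoming event.

    Contract fields are kept, integers are widened to float where the contract
    asks for float, and missing, mistyped, or unexpected fields are reported.
    """
    issues = []
    clean = {}
    for field, expected in contract.items():
        if field not in event:
            issues.append(f"missing:{field}")
            continue
        value = event[field]
        # Widen int to float, but never treat booleans as numbers.
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected):
            issues.append(f"type:{field}")
            continue
        clean[field] = value
    for field in event:
        if field not in contract:
            # Drifted field: report it instead of silently writing it downstream.
            issues.append(f"unexpected:{field}")
    return clean, issues


if __name__ == "__main__":
    drifted = {"device_id": "P-17", "event_time": "2025-03-01T09:30:05+00:00",
               "pressure_kpa": 412, "firmware": "2.1"}
    print(validate_event(drifted))
```

Reporting unexpected fields rather than rejecting the event lets the pipeline keep loading known columns while the team decides whether to extend the contract.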

5.2 Small file problem

Streaming ingestion → thousands of tiny Parquet files
➡ Use Fabric’s file compaction pipelines
➡ Optimise delta tables (OPTIMIZE in notebooks)
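
In practice the compaction itself is done by the Delta engine (for example, OPTIMIZE in a notebook). The sketch below only illustrates the idea behind it: group small files into batches close to a target size so that each batch is rewritten as one larger file. The function `plan_compaction` and its thresholds are hypothetical.

```python
def plan_compaction(file_sizes_mb: dict, target_mb: int = 128, small_mb: int = 32) -> list:
    """Group small files into batches of at most `target_mb` to rewrite together.

    `file_sizes_mb` maps file name -> size in MB. Files at or above `small_mb`
    are left alone, and a batch holding a single file is skipped because
    rewriting it would gain nothing.
    """
    small = sorted((size, name) for name, size in file_sizes_mb.items() if size < small_mb)
    batches = []
    current, current_size = [], 0
    for size, name in small:
        if current and current_size + size > target_mb:
            if len(current) > 1:
                batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if len(current) > 1:
        batches.append(current)
    return batches


if __name__ == "__main__":
    files = {"a": 10, "b": 20, "c": 200, "d": 30, "e": 5}
    print(plan_compaction(files, target_mb=50, small_mb=32))
```

Running a plan like this on a schedule, rather than after every micro-batch, keeps the rewrite cost low while still bounding the number of files a query has to open.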

5.3 Late-arriving CDC updates

Sometimes updates arrive out of order
➡ Use merge logic ordered by _commit_timestamp so the latest change wins
➡ Deduplicate using row hash keys
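
Both fixes can be combined in one pass over a change batch. The sketch below is a minimal version under a few assumptions: each change has the hypothetical fields `key` and `row`, and `_commit_timestamp` is an ISO 8601 string in a single consistent format so that string comparison matches time order.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    """Stable hash of a row's content, independent of key order."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def latest_per_key(changes: list):
    """Return (latest_change_by_key, duplicates_dropped).

    Exact duplicates (same key, commit timestamp, and row hash) are dropped,
    and a late-arriving older change never overwrites a newer one.
    """
    seen = set()
    latest = {}
    duplicates = 0
    for change in changes:  # arrival order, which may differ from commit order
        fingerprint = (change["key"], change["_commit_timestamp"], row_hash(change["row"]))
        if fingerprint in seen:
            duplicates += 1
            continue
        seen.add(fingerprint)
        current = latest.get(change["key"])
        if current is None or change["_commit_timestamp"] > current["_commit_timestamp"]:
            latest[change["key"]] = change
    return latest, duplicates


if __name__ == "__main__":
    batch = [
        {"key": "A1", "_commit_timestamp": "2025-03-01T10:00:05", "row": {"state": "closed"}},
        {"key": "A1", "_commit_timestamp": "2025-03-01T10:00:01", "row": {"state": "open"}},
        {"key": "A1", "_commit_timestamp": "2025-03-01T10:00:05", "row": {"state": "closed"}},
    ]
    print(latest_per_key(batch))
```

Here the second record arrives late and is ignored, and the third is an exact replay of the first, so the asset ends up in the "closed" state with one duplicate counted.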

5.4 Semantic model refresh failures

Likely caused by:

Partition too large
Memory pressure
Cardinality spikes

➡ Reduce incremental window
➡ Create smaller partitions
➡ Use Date-Hour partitioning
➡ Move hot data to Direct Lake instead of Import

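5.5 Direct Lake fallback to DirectQuery

If Delta tables are mis-optimised or exceed the capacity's Direct Lake guardrails, queries fall back to DirectQuery or slow down noticeably. Common causes:

Too many small files or row groups
Unsupported data types
Missing V-Order / Z-Order optimisation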

➡ Refactor pipeline to enforce performant Delta Lake tables
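
One way to enforce this in a pipeline is to compare table statistics against the limits that apply to your capacity before publishing a table to the gold layer. The sketch below assumes you have already collected the statistics; `TableStats`, `fallback_risks`, and the numbers in `EXAMPLE_LIMITS` are hypothetical placeholders, not official limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    parquet_files: int
    row_groups: int
    rows: int


def fallback_risks(stats: TableStats, limits: TableStats) -> list:
    """List the dimensions on which `stats` exceeds `limits`."""
    risks = []
    for field in ("parquet_files", "row_groups", "rows"):
        if getattr(stats, field) > getattr(limits, field):
            risks.append(field)
    return risks


# Placeholder limits for illustration only; look up the real values for your capacity SKU.
EXAMPLE_LIMITS = TableStats(parquet_files=1_000, row_groups=1_000, rows=300_000_000)

if __name__ == "__main__":
    telemetry = TableStats(parquet_files=4_200, row_groups=900, rows=10_000_000)
    print(fallback_risks(telemetry, EXAMPLE_LIMITS))
```

A non-empty result is a signal to compact the table before the semantic model reads it, rather than discovering the problem through slow reports.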

6. Practical Guidance — For Fabric Engineers

For Fact Tables

Partition by date
Store in Delta format
Compact files daily/hourly
Keep history in Import mode
Use Direct Lake for hot data window

For Dimension Tables

Use SCD2 logic for asset lifecycle
Use surrogate keys
Version dimensions monthly
Apply OLS/RLS at dimension level
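
The SCD2 and surrogate key guidance can be sketched as follows. This is a simplified in-memory version, assuming a dimension row shape of `asset_key`, `asset_id`, `attributes`, `valid_from`, and `valid_to`; those column names and the `apply_scd2` function are hypothetical.

```python
from datetime import date


def apply_scd2(dimension: list, incoming: dict, effective: date, next_key: int) -> list:
    """Return a new dimension with `incoming` applied as a type 2 change.

    The current row for the asset (valid_to is None) is closed on `effective`
    and a new row with surrogate key `next_key` is appended. If nothing
    changed, the dimension is returned as-is.
    """
    rows = [dict(r) for r in dimension]  # copy so the input list is not mutated
    current = next(
        (r for r in rows if r["asset_id"] == incoming["asset_id"] and r["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["attributes"] == incoming["attributes"]:
            return rows  # no attribute change, so no new version
        current["valid_to"] = effective
    rows.append({
        "asset_key": next_key,
        "asset_id": incoming["asset_id"],
        "attributes": incoming["attributes"],
        "valid_from": effective,
        "valid_to": None,
    })
    return rows


if __name__ == "__main__":
    dim = [{"asset_key": 1, "asset_id": "PUMP-7", "attributes": {"status": "active"},
            "valid_from": date(2024, 1, 1), "valid_to": None}]
    change = {"asset_id": "PUMP-7", "attributes": {"status": "decommissioned"}}
    for row in apply_scd2(dim, change, date(2025, 3, 1), next_key=2):
        print(row)
```

Fact rows keep pointing at the surrogate key that was current when they were recorded, so historical reports still show the pump as active for the period before it was decommissioned.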

For Pipelines

Build simple reusable components (CDC merge, compaction, SCD load)
Trigger semantic model refresh only when gold tables update
Use alerts and failure pipelines
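
The "refresh only when gold tables update" rule can be reduced to a version comparison. The sketch below assumes the pipeline records the Delta table version it last loaded for each gold table; the function names and table names are hypothetical, and the actual refresh call is left out.

```python
def tables_needing_refresh(gold_versions: dict, refreshed_versions: dict) -> list:
    """Return gold tables whose current Delta version is newer than the one last loaded.

    Both arguments map table name -> integer Delta table version. A table that
    has never been loaded is always included.
    """
    return sorted(
        table
        for table, version in gold_versions.items()
        if version > refreshed_versions.get(table, -1)
    )


def should_trigger_refresh(gold_versions: dict, refreshed_versions: dict) -> bool:
    """True when at least one gold table changed since the last refresh."""
    return bool(tables_needing_refresh(gold_versions, refreshed_versions))


if __name__ == "__main__":
    current = {"fact_work_orders": 42, "dim_asset": 7, "fact_telemetry": 310}
    last_loaded = {"fact_work_orders": 42, "dim_asset": 6}
    print(tables_needing_refresh(current, last_loaded))
```

Skipping the refresh when nothing changed avoids spending capacity on no-op refreshes, and the list of changed tables can be used to refresh only the affected tables.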

For Semantic Models

Use hybrid Import + Direct Lake
Use incremental refresh for historical partitions
Turn on “Large semantic model” format
Prefer star schemas over snowflake schemas
Use Tabular Editor for governance and CI/CD

7. Conclusion — The New Standard for Operational Analytics

Microsoft Fabric + Power BI semantic models deliver something no past Microsoft stack could:

▶ Batch + Streaming + CDC + Real-Time + Lakehouse + BI in one integrated ecosystem.

For asset-heavy operations, this architecture enables:

Near-real-time dashboards
Operational alerts
24/7 monitoring
Up-to-date KPIs
Scalable history analytics
Robust, governed semantic models
Lower cost and higher reliability

This is the future of enterprise analytics — not nightly refreshes, but continuously updated semantic models feeding the entire organisation.

From Batch to Real-Time: How Incremental Refresh, CDC & Event Streams Power Enterprise Semantic Models in Microsoft Fabric
