
Why Your Microsoft Fabric Warehouse is Missing Data (And How to Fix It)

  • Writer: Christian Steinert
  • 7 min read

The hidden metadata refresh delay that cost us hours of troubleshooting—and how one API call solved everything


The 8 AM Mystery: A Data Detective Story


It was 8 AM on Tuesday, and I had just run a stored procedure that builds the historical financial snapshot ledger for one of our healthcare clients. I crafted a basic SELECT statement and queried the table to check data for 10/7/2025.


Two of our key metrics returned 0.


Not low. Not slightly off. Zero.


My stomach dropped a bit. These metrics are critical for daily executive reporting, and leadership relies on them to make operational decisions. I immediately went into troubleshooting mode.


Could it be the source system? I queried the source directly—data returned perfectly.


Could it be our end tables? I checked our Microsoft Fabric Warehouse table—nothing. Empty results.


Was it a timing issue? Maybe the data hadn’t fully landed yet? I sat there for two minutes, puzzled and scratching my head, watching the clock tick and hoping rows would magically appear.


Then an idea dawned on me. Let’s check the raw lakehouse table.


I queried our bronze layer in the Fabric Lakehouse...and sure enough, complete data. Every single row, sitting there perfectly intact.


Oh boy. Now we had a real mystery.


Finding the Fault Line

I systematically moved through our data architecture, querying each layer. When I hit our middle staging table and ran some basic count analysis, I found it: data was missing. Boom—I’d identified where the fall-off began.


For context, Steinert Analytics runs Microsoft Fabric on a medallion architecture, which I’ve mentioned in previous issues:


  • Bronze Layer (Lakehouse): Raw source tables land here with minimal transformation

  • Silver Layer (Warehouse): Light transformation and cleanup—our validated staging area

  • Gold Layer (Warehouse): Curated tables for reporting that end users interact with


Something was breaking between our bronze layer in the Lakehouse and our silver/gold layers in the Warehouse. And it wasn’t just a few rows. For our larger tables—those with over a million records—we were seeing a fall-off of hundreds to thousands of rows.


For a healthcare analytics consultancy where accuracy isn’t just important but potentially impacts patient care decisions and financial reporting, this was unacceptable.


Community to the Rescue

Similar to the failed Copy Job pipeline story I shared a few weeks ago, I turned to the r/MicrosoftFabric subreddit community. And once again, they came through.



What I wasn’t aware of—and what Microsoft doesn’t make immediately obvious in their documentation—is that there’s a metadata refresh delay between the Fabric Lakehouse and Fabric Warehouse when loading data.


This delay was causing hundreds to thousands of rows to fall off in our Warehouse silver and gold tables. Not because the data wasn’t there, but because the Warehouse couldn’t “see” it yet.


Obviously a HUGE ISSUE. It meant our gold tables would be delayed in updates, giving leadership untimely or incomplete information on a daily basis. In healthcare, where daily census numbers, financial metrics, and operational KPIs drive real-time decisions, this kind of data lag isn’t just inconvenient—it’s potentially dangerous.


The fix? Implement a forced metadata refresh before loading Lakehouse bronze tables into the Warehouse silver and gold layers.


Let me walk you through the technical details of what’s happening and how we solved it.


The Technical Deep Dive: Understanding Metadata Refresh Delays in Microsoft Fabric

When working with Microsoft Fabric’s lakehouse-to-warehouse architecture, there’s a critical but often overlooked aspect of how data synchronization works: the SQL Analytics Endpoint metadata refresh mechanism.


What’s Actually Happening Behind the Scenes

In Microsoft Fabric, when you load data into a lakehouse, you’re writing files to Delta tables in OneLake. However, the SQL Analytics Endpoint—which provides the warehouse-like querying capability—doesn’t immediately “see” these new files.


According to Microsoft’s documentation, the SQL Analytics Endpoint maintains its own metadata layer that represents the structure and content of your lakehouse tables. Think of it like an index in a book: the content exists, but if the index isn’t updated, you can’t find what you’re looking for.


This metadata layer needs to be explicitly refreshed to recognize new data files that have been added to your Delta tables. Without this refresh, your warehouse queries are essentially looking at a stale snapshot of your data—even though the actual data exists in the lakehouse.


For tables with millions of rows where hundreds or thousands of new records are being added daily, this metadata lag creates a significant data integrity issue. Your bronze layer has complete data, but your silver and gold layers are querying against outdated metadata that doesn’t include the latest file additions.


Why Microsoft Designed It This Way

The metadata refresh mechanism exists for performance optimization. According to Microsoft’s documentation on SQL Analytics Endpoint performance, continuously scanning for new files in large datasets would create significant overhead on the system.


Instead, Fabric uses a pull-based refresh model where metadata updates are triggered either automatically on a schedule or manually via API call. This design trade-off prioritizes query performance over real-time data visibility.


For many analytics workloads—like monthly reporting or historical analysis—this makes perfect sense. But for operational reporting with daily data pipelines where timeliness matters, it creates the exact problem we encountered.


The Solution: Forced Metadata Refresh via Web Activity

The fix involves explicitly calling the Fabric REST API to trigger a metadata refresh before your stored procedures run. This ensures that the SQL Analytics Endpoint has the most current view of your lakehouse data before any transformation or loading occurs.



Here’s how we implemented it in our Data Factory pipelines (shoutout to kmritch on Reddit):


Step 1: Add a Web Activity 

Insert a Web Activity into your pipeline before any warehouse operations. This becomes your “metadata refresh checkpoint.”


Step 2: Configure Connection Settings

Give the Web Activity a connection that points at the Fabric REST API (base URL https://api.fabric.microsoft.com), authenticated with an account or service principal that has access to your workspace.


Step 3: Configure Web Activity Parameters

  • Relative URL: v1/workspaces/{WORKSPACEID}/sqlEndpoints/{SQLANALYTICSENDPOINTID}/refreshMetadata

  • Method: POST

  • Body: {"timeout":{"timeUnit":"Seconds","value":"60"}}

  • Headers:

    • Name: Content-Type

    • Value: application/json


The timeout parameter is important—it tells Fabric how long to wait for the metadata refresh to complete. For our larger tables, 60 seconds proved sufficient, but you may need to adjust based on your data volumes. In testing, we found that tables with 5+ million rows occasionally needed 90 seconds during heavy load periods.
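
If you prefer to trigger the refresh outside of Data Factory (say, from a Fabric notebook or a scheduled script), here’s a rough sketch of the equivalent REST call in Python. The workspace and SQL analytics endpoint IDs are placeholders, and authenticating with azure-identity is just one option; treat it as an illustration of the same POST the Web Activity makes rather than production code.

import requests
from azure.identity import DefaultAzureCredential

# Placeholders: substitute your own workspace and SQL analytics endpoint GUIDs.
WORKSPACE_ID = "<workspace-guid>"
SQL_ENDPOINT_ID = "<sql-analytics-endpoint-guid>"

# Acquire a token for the Fabric REST API (one of several possible auth options).
token = DefaultAzureCredential().get_token(
    "https://api.fabric.microsoft.com/.default"
).token

url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/sqlEndpoints/{SQL_ENDPOINT_ID}/refreshMetadata"
)

# Same body the Web Activity sends: ask Fabric to wait up to 60 seconds.
response = requests.post(
    url,
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json={"timeout": {"timeUnit": "Seconds", "value": "60"}},
)
response.raise_for_status()
print(f"Metadata refresh triggered, status {response.status_code}")

Depending on data volume, the call may return quickly or run up against the timeout you specify, so check the response behavior in your own environment before relying on it.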


Implementation Considerations

After implementing this solution across all our pipelines, we immediately saw 100% data completeness in our silver and gold tables. However, there are a few considerations worth noting:


Pipeline Duration: Adding the metadata refresh step adds 30-60 seconds to each pipeline run. For our daily batch processes, this is completely acceptable. For real-time or high-frequency pipelines, you’ll need to weigh the latency cost against data completeness requirements.


API Rate Limits: Microsoft Fabric uses throttling to maintain optimal performance and reliability, limiting the number of API calls within a time window per user. When you exceed these limits, Fabric returns an HTTP status code 429 (Too many requests) with a Retry-After header indicating how long to wait before making a new request. If you’re running many concurrent pipelines, you may need to implement retry logic or stagger your pipeline executions to avoid hitting these limits. While Microsoft doesn’t publish specific rate limits for the metadata refresh endpoint, monitoring for 429 responses and implementing appropriate retry logic is a best practice.
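
As a sketch of what that retry logic could look like in a script, assuming a refresh_metadata() helper that wraps the POST shown earlier and returns the HTTP response:

import time

def refresh_with_retry(refresh_metadata, max_attempts=5):
    """Retry the metadata refresh when Fabric throttles with HTTP 429."""
    for attempt in range(1, max_attempts + 1):
        response = refresh_metadata()
        if response.status_code != 429:
            response.raise_for_status()
            return response
        # Honor the Retry-After header; fall back to 60 seconds if it is absent.
        wait_seconds = int(response.headers.get("Retry-After", "60"))
        print(f"Attempt {attempt} throttled; waiting {wait_seconds}s before retrying")
        time.sleep(wait_seconds)
    raise RuntimeError("Metadata refresh still throttled after all retry attempts")

If you stay inside Data Factory, the Web Activity’s built-in retry settings can handle the simpler cases; the point is simply to respect the Retry-After value instead of hammering the endpoint immediately.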


Monitoring: I recommend adding logging around the metadata refresh call to track when it completes and capture any failures. This is an area I’m still developing in our infrastructure—we haven’t fully implemented centralized monitoring for these API calls yet. However, given the criticality of data completeness for our healthcare clients, building out proper observability around these metadata refresh operations is next on my roadmap. At minimum, consider adding error handling in your Web Activity that logs failures so you can investigate when pipelines don’t behave as expected.


Cost Implications: While the API call itself is free, the additional pipeline runtime does add minor compute costs. For us, the cost increase was negligible compared to the value of data accuracy.


The Bigger Picture: Lessons for Data Architecture

This issue highlights a fundamental principle in modern data architectures: understanding the consistency model of your platform.


Microsoft Fabric’s lakehouse and warehouse components aren’t operating under strong consistency—there’s an eventual consistency model with the SQL Analytics Endpoint metadata. If you assume immediate consistency (like you might with a traditional database), you’ll run into exactly the problem we experienced.


Key Takeaways for Healthcare Data Consultants

1. Platform Knowledge is Non-Negotiable

You can’t just assume cloud platforms work like on-premises systems. Each has its own quirks, consistency models, and optimization trade-offs. Reading documentation thoroughly—even the boring API reference docs—pays dividends.


2. Test at Scale

This issue only manifested with our larger tables. In development with smaller datasets, everything looked fine. Always test your pipelines with production-scale data volumes before go-live.


3. Community is Your Secret Weapon

The r/MicrosoftFabric community diagnosed this issue in hours. Building relationships with other practitioners in your technology stack accelerates problem-solving exponentially.


4. Implement Defensive Data Engineering

The metadata refresh API call should now be a standard component of any Fabric pipeline that moves data from lakehouse to warehouse. It’s not optional if you need data completeness guarantees—it’s a best practice.


5. Build Monitoring Into Your Roadmap

We don’t yet have automated row count reconciliation checks between our bronze, silver, and gold layers—but this issue made it clear we need them. Manual spot-checking caught this problem, but that’s not scalable or reliable. Building automated data quality checks that alert us when counts don’t match within expected tolerances is a priority for our next phase of infrastructure development. Don’t wait for users to find data issues—invest in monitoring before problems compound.
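
For what it’s worth, here’s the shape of the check I have in mind, sketched in Python. The connection strings, table name, and tolerance are all hypothetical placeholders; you’d point them at your lakehouse SQL analytics endpoint and warehouse, and swap the print for whatever alerting you already use.

import pyodbc

# Hypothetical placeholders: fill in your own SQL endpoint servers, databases,
# authentication, and the table you want to reconcile.
BRONZE_CONN = "Driver={ODBC Driver 18 for SQL Server};Server=<lakehouse-sql-endpoint>;Database=<lakehouse>;..."
SILVER_CONN = "Driver={ODBC Driver 18 for SQL Server};Server=<warehouse-sql-endpoint>;Database=<warehouse>;..."
TABLE = "dbo.financial_snapshot"  # hypothetical table name
TOLERANCE = 0                     # allowed row count difference before alerting

def row_count(conn_str: str, table: str) -> int:
    with pyodbc.connect(conn_str) as conn:
        return conn.cursor().execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

bronze_rows = row_count(BRONZE_CONN, TABLE)
silver_rows = row_count(SILVER_CONN, TABLE)

if abs(bronze_rows - silver_rows) > TOLERANCE:
    # Replace with your alerting of choice (email, Teams webhook, logging table).
    print(f"ALERT: {TABLE} bronze={bronze_rows:,} vs silver={silver_rows:,}")
else:
    print(f"{TABLE} reconciled at {bronze_rows:,} rows in both layers")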


Where We Are Now

The good news? Once you know about this quirk, the fix is straightforward and reliable. Our pipelines have been running flawlessly since implementation, and we’re confident that our leadership team is getting timely, complete data every single day. You can copy and reuse that same Web Activity in all your Fabric Data Factory pipelines, too.


More importantly, we’ve documented this solution internally and created a pipeline template that includes the metadata refresh step by default. New projects don’t have to rediscover this issue—we’ve baked the solution into our standard practices.


That’s the hallmark of mature data engineering: learning from issues, documenting solutions, and systematically eliminating entire classes of problems from your future work.


Your Turn

Are you running Microsoft Fabric in production? Have you encountered this metadata refresh issue? I’d love to hear about other Fabric quirks you’ve discovered and how you’ve solved them.


And if you’re experiencing unexplained data completeness issues in your Fabric warehouse, now you know where to look first.


See you next week.


Quick Links:


Christian Steinert is the founder of Steinert Analytics, helping healthcare & roofing organizations turn data into actionable insights. Subscribe to Rooftop Insights for weekly perspectives on analytics and business intelligence in these industries.


Feel free to book a call with us here or reach out to Christian on LinkedIn. Thank you!


Also - check out our free Healthcare Analytics Playbook email course here.

 
 
 
