
Our Tips & Observations from Actually Doing Healthcare Data Modeling - Part 2

  • Writer: Christian Steinert
  • Jul 15
  • 4 min read

The path to showing value quickly in healthcare data projects


Don’t let the chaos of data pipelining hold you back in an end-to-end data modernization.


Healthcare companies depend on showing value quickly, and burning through compute credits and endless hours just to move data is a surefire way to derail tangible project value.


We ran up against this exact challenge with one of our healthcare clients. As we battled the tricky Microsoft ecosystem of bugs and inflexible data pipelines, we had to keep the modeling side of the project moving.


How?


Like part 1, this post focuses less on the granular details of data modeling while physically coding and building. Think of this series as your guide to the project management that enables the best possible data modeling in a healthcare data project.


Our intent is to give you insights from field experience battling in the trenches of healthcare data warfare.


1. Make Movement Like an Analyst

My data mentor has always advocated hiring data analysts first when building out a team. Why? Analysts are quick and scrappy. They don’t get caught up in the complexity of data pipelines and fancy tooling. Their advantage is an unwavering focus on understanding a company’s data well enough to derive insights that positively impact decisions. Those decisions lead to a more efficient and profitable business.


Here’s why this is critical in a data project. Sometimes, pipelines get blocked. It’s easy to fall down a rabbit hole of troubleshooting code, and that takes time away from building real value on top of a healthcare company’s data.


When a client project hit a Microsoft data pipeline roadblock, we forced movement on the data modeling side of the project. While the engineers troubleshot Microsoft Fabric pipeline bugs, the analysts on the team assembled the staging, dimension, and fact tables.


Dive into the source data and start writing code in an IDE, using the SQL dialect of your target data warehouse. Just because you can’t execute a query yet doesn’t mean you can’t understand the data and model it effectively.
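
For example, a staging query can be drafted entirely before the pipeline works. Here’s a minimal sketch in T-SQL, assuming a hypothetical claims feed; the table and column names are illustrative, not from a real client schema:

```sql
-- Hypothetical staging view for a claims feed (illustrative names only).
-- Drafted offline; executed later once the pipeline lands the raw table.
CREATE VIEW stg_claims AS
SELECT
    CAST(claim_id AS VARCHAR(50))        AS claim_key,
    CAST(patient_id AS VARCHAR(50))      AS patient_key,
    CAST(service_date AS DATE)           AS service_date,
    CAST(billed_amount AS DECIMAL(18,2)) AS billed_amount,
    UPPER(TRIM(claim_status))            AS claim_status  -- normalize casing early
FROM raw.claims_export;
```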


2. Use a Shared Document For All Your SQL Transformations

Across projects, we’ve found it good practice to consolidate all transformation queries for the silver (staging) and gold layers in one shared document.


By leveraging Google Docs, we can collaboratively blast through the 20-plus transformation queries that typically make up a schema in the data warehouse.


This workflow is especially useful when you’re stuck on upstream dependencies but need to keep moving on data modeling.

It ensures no one accidentally duplicates work. Plus, it allows us to review each other’s modeled code before ever committing or publishing tables in Fabric.


Furthermore, without a CI/CD integration in place, Google Docs provides the real-time version control an efficient data project needs. When each person creates their own .txt or .sql files and sends them in a Slack thread, tracking the most recent version becomes impossible.


We typically build in the Fabric IDE or VS Code and then move the query into the shared Google Doc.
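
One convention that keeps the shared doc usable as it grows: give every query a short comment header stating its model, owner, status, and dependencies. A hypothetical example of the layout (the names are just one way to do it):

```sql
------------------------------------------------------------
-- Model:   dim_provider (gold layer)
-- Owner:   Christian     Status: in review
-- Depends: stg_providers
------------------------------------------------------------
SELECT
    provider_key,
    provider_name,
    specialty
FROM stg_providers;
```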


3. Build Your Data Dictionary as You Model, Not After

Ah, yes. The joys of data documentation: tedious, yet one of the most critical aspects of data management. It gives end users the confidence to make decisions from the data.


It’s better to start on this sooner rather than later. I hate having to trace back through what we’ve built and stand up a data dictionary after the fact. It’s far easier to build and maintain this document while we design our data models.


The data dictionary answers questions like:

- What does this field mean?

- Where does this field come from? (source system)

For example, a director of finance can reference the dictionary to ensure they’re looking at the correct net revenue field for a quarterly report.


Typically, a data dictionary contains the core columns below:


  1. Field name

  2. Definition

  3. Source system

  4. API name

  5. Source system logic

  6. Report logic

  7. Load frequency
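
If you’d rather keep the dictionary in the warehouse than in a spreadsheet, a simple table mirroring those columns works. A sketch, with a hypothetical net revenue entry tying back to the finance example above:

```sql
-- Illustrative data dictionary table; columns mirror the list above.
CREATE TABLE data_dictionary (
    field_name          VARCHAR(100),
    definition          VARCHAR(500),
    source_system       VARCHAR(100),
    api_name            VARCHAR(100),
    source_system_logic VARCHAR(500),
    report_logic        VARCHAR(500),
    load_frequency      VARCHAR(50)
);

-- Hypothetical entry for the net revenue field referenced earlier.
INSERT INTO data_dictionary VALUES (
    'net_revenue',
    'Total revenue after contractual adjustments and write-offs',
    'EHR billing module',
    'netRevenue',
    'charges - adjustments - write_offs',
    'Summed by month for the quarterly finance report',
    'Daily'
);
```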


Focus on Showing Value

Stay ahead, even if you’re behind. Don’t let the backend development work stall progress. The trick is to show business value quickly; working on the backend while simultaneously designing the business-facing data logic keeps your ship afloat.


Remember, no one cares about the infrastructure. They care about the accuracy and timeliness of the data that solves their problems. If you sit back on data modeling while debugging data pipelines, all that effort stays hidden from the execs.


Work on BOTH simultaneously and keep that needle moving.


Place an emphasis on robust documentation so end users can see the quality and attention to detail behind the data, too.


Once the backend pipelines are in place, you can truly validate the data by executing the data models against the live system to ensure accuracy.
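
A couple of quick reconciliation queries go a long way at this stage. A sketch, reusing the hypothetical table names from the staging example (fact_claims and dim_patient are placeholders, too):

```sql
-- 1. Row counts should reconcile between the raw landing table
--    and the staging view built on top of it.
SELECT
    (SELECT COUNT(*) FROM raw.claims_export) AS source_rows,
    (SELECT COUNT(*) FROM stg_claims)        AS staged_rows;

-- 2. Every fact row should resolve to a dimension row;
--    a nonzero count here means broken keys.
SELECT COUNT(*) AS orphaned_fact_rows
FROM fact_claims AS f
LEFT JOIN dim_patient AS d
    ON f.patient_key = d.patient_key
WHERE d.patient_key IS NULL;
```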


Christian Steinert is the founder of Steinert Analytics, helping healthcare & roofing organizations turn data into actionable insights. Subscribe to Rooftop Insights for weekly perspectives on analytics and business intelligence in these industries.


Feel free to book a call with us here or reach out to Christian on LinkedIn. Thank you!