Hey David—great progress so far. You’ve already done the hardest part: getting the key systems connected and validated.
Very short version of the answer: see the book “Deciphering Data Architectures” by James Serra. Start by watching the recording of his session “Using Generative AI on Structured Data (to query and modify data)”; he introduces the book at the end. Reza Rad also covers these topics in a series of articles and videos on radacad.com, e.g. https://radacad.com/power-bi-shared-datasets-what-is-it-how-does-it-work-and-why-should-you-care/
Very long version (generic, and it might not fully apply to your scenario, since this depends on work culture, data maturity, and industry):
Below is a pragmatic view of your proposed design, what to watch out for, and where a bit of lightweight federated structure will save you pain later.
TL;DR (executive summary)
- Your approach is directionally right. A medallion (Bronze/Silver/Gold) lakehouse on Microsoft Fabric is the recommended pattern to establish a trustworthy single source of truth. [1]
- Tweak the layout for scale/governance. Keep the medallion pattern, but don’t put everything in a single lakehouse and one workspace. Microsoft’s guidance recommends separate lakehouses per zone and (often) separate workspaces for better control, lifecycle management, and blast‑radius isolation—especially beyond a POC. [1]
- Use OneLake Shortcuts rather than copying data across teams/domains to avoid duplication and keep permissions consistent. [2]
- Publish Gold via Direct Lake semantic models (Power BI) to empower business users—with table optimization (V‑Order, compaction) to sustain performance. [3][4][5]
- Governance: Organize by Domains (data‑mesh style), apply sensitivity labels, and use Microsoft Purview for cataloging, lineage, and policy—so you get trusted, consistent data without creating new silos. [6][7][8][9]
1) Does a centralized lakehouse model align with Fabric best practices?
Conceptually yes (medallion & OneLake), but adjust the physical layout.
- Medallion is the recommended pattern in Fabric to incrementally improve data quality (Bronze → Silver → Gold) and build a single source of truth. [1]
- OneLake’s goal is “one logical lake” with one copy of data; use Shortcuts to reference data across workspaces/capacities/clouds without copying (a shortcut‑creation sketch follows the workspace list below). This is key to breaking silos while avoiding edge copies. [2]
- Physical design guidance: Microsoft recommends separate lakehouses per zone and states that while you can place all lakehouses in a single workspace, using separate workspaces gives you more control and governance at the zone level. This is the main tweak I’d make to your “one lakehouse & one workspace” plan. [1]
- Organizational structure: Use Domains to reflect business areas (ERP/Finance, Logistics, Operations, etc.) and delegate certain governance settings down from tenant to domain admins (federated governance). [6][7]
- For analytics consumption: Build Direct Lake semantic models over Gold tables for high‑performance self‑service BI without scheduled data imports. It’s explicitly called out as ideal for medallion Gold. [3]
What this looks like in practice for your POC
- Workspaces:
  - Core-Data (Bronze) – ingestion/landing (often via mirroring/ingest pipelines).
  - Core-Data (Silver) – standardized/conformed.
  - Domain X (Gold) – curated star schemas + semantic models for each domain (Finance, Logistics, Ops).
This gives you clean environment boundaries but still one logical lake via OneLake and shortcuts. [1][2]
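To make the “wire it with Shortcuts” step concrete, here is a minimal sketch (Python + requests) of creating a OneLake shortcut programmatically. The endpoint and body shape follow the Fabric REST “Create Shortcut” API as I understand it; all GUIDs and the token are placeholders, and shortcuts can just as easily be created from the lakehouse UI. [2]

```python
# Hedged sketch: create a OneLake shortcut in a Gold lakehouse that points
# at a Silver table. All IDs and the bearer token are placeholders.
import requests

FABRIC_API = "https://api.fabric.microsoft.com/v1"
token = "<bearer-token>"  # e.g., acquired via azure-identity

gold_workspace_id = "<gold-workspace-guid>"
gold_lakehouse_id = "<gold-lakehouse-guid>"

payload = {
    "path": "Tables",            # where the shortcut appears in the Gold lakehouse
    "name": "SilverCustomers",   # shortcut name
    "target": {
        "oneLake": {             # a reference, not a copy of the data
            "workspaceId": "<silver-workspace-guid>",
            "itemId": "<silver-lakehouse-guid>",
            "path": "Tables/Customers",
        }
    },
}

resp = requests.post(
    f"{FABRIC_API}/workspaces/{gold_workspace_id}/items/{gold_lakehouse_id}/shortcuts",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # metadata of the newly created shortcut
```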
2) Risks & limitations to be aware of
- Single workspace/lakehouse = governance & blast‑radius risk
  - Workspace roles are coarse; putting everyone in one workspace increases permission sprawl and makes least‑privilege hard (e.g., a Contributor can modify all items in that workspace). Item‑level permissions exist, but they’re easier to reason about when items are scoped per zone/domain. [10][11][12]
- Lifecycle management & CI/CD realities
  - Deployment pipelines & Git: lakehouse containers deploy, but tables/files don’t automatically copy; guidance is to rebuild content in the target via pipelines/notebooks after deploy (current platform behavior). Plan for this in Dev → Test → Prod. [13][14]
- Performance of Direct Lake depends on Delta health
  - To keep Gold fast and interactive: V‑Order, file compaction, and appropriate file sizes (avoid too many small files). These materially affect transcoding and query latency. [4]
  - Maintain Delta tables with Optimize + Vacuum; be mindful that shorter vacuum retention reduces time travel and must be explicitly configured (see the maintenance sketch at the end of this section). [5]
- Security model layering
  - Row‑level security (RLS) & object‑level security (OLS) are applied at the semantic model; for Direct Lake, it’s recommended to use a fixed identity (shareable cloud connection) for predictable security behavior. [15][16]
  - Viewers can read via the SQL analytics endpoint only if the SQL access policy is granted; another reason to keep Gold distinct and governed. [12]
- Governance & compliance coverage
  - Sensitivity labels apply to Fabric items and integrate with Purview; DLP is currently focused on semantic models—so treat the semantic model as the governed, certified entry point for business users. [8][9]
- Capacity management
  - With many teams on one capacity, watch for throttling and contention. Install the Fabric Capacity Metrics app to monitor utilization and schedule heavy jobs off‑peak. [17]
- Real‑time vs batch
  - If fleet tracking requires streaming, Eventstreams can ingest/transform/route events no‑code; consider isolating real‑time workloads from batch processing for reliability. [18]
- Source system sync
  - For operational sources (ERP, logistics DBs), Mirroring into OneLake provides low‑latency replication (or metadata mirroring + shortcuts) without complex CDC pipelines—helpful during a POC to reduce engineering overhead. [19]
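The maintenance sketch mentioned above: a minimal PySpark routine for a Fabric notebook. Table names are placeholders, `spark` is the session Fabric notebooks provide, and the OPTIMIZE … VORDER / VACUUM commands follow the Fabric Delta maintenance docs. [5]

```python
# Minimal maintenance sketch (Fabric notebook, PySpark).
# `spark` is the built-in SparkSession; table names below are placeholders.

tables = ["gold_lh.dim_customer", "gold_lh.fact_shipments"]  # hypothetical Gold tables

for t in tables:
    # Compact small files and rewrite them V-Ordered for faster Direct Lake reads.
    spark.sql(f"OPTIMIZE {t} VORDER")
    # Drop unreferenced files older than 7 days (168 hours); shorter retention
    # saves storage but shrinks the time-travel window.
    spark.sql(f"VACUUM {t} RETAIN 168 HOURS")
```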
3) When are separate lakehouses/workspaces recommended?
- Per medallion zone: Bronze, Silver, and Gold in distinct lakehouses (and often workspaces) to isolate permissions, jobs, data quality checks, and SLAs. [1]
- By domain (data‑mesh): Finance, Logistics, Operations, etc. as Domains with their own workspaces; a central team provides shared standards and platform. Delegated governance at the domain level. [6][7]
- Environment separation: Dev / Test / Prod workspaces with deployment pipelines; remember to script/automate recreating lakehouse content in target stages. [13][14]
- Capacity isolation or regulated data: sensitive/regulated datasets (PII/PHI) placed in dedicated workspaces/capacities with stricter policies; share curated Gold via Shortcuts or semantic models to consumers. [10][2]
- Streaming vs batch: keep real‑time event processing separate from batch data engineering to decouple SLAs and failure domains. [18]
4) Suggested reference architecture for your 6‑month POC
A. Topology
- Same layout as section 1: Core‑Data Bronze and Silver workspaces run by the core team, plus one Gold workspace per domain, connected via OneLake Shortcuts. [1][2]
B. Data flows
- Bronze: land via Mirroring (for ERP/operational DBs) or pipelines; append‑only, immutable. [19]
- Silver: standardize to Delta with business keys and data quality rules; OPTIMIZE + V‑Order on larger tables (a Bronze → Silver sketch follows this list). [5]
- Gold: dimensional models for reporting; Direct Lake semantic models with RLS/OLS as needed; prefer a fixed identity connection for Direct Lake models. [3][15][16]
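A hedged Bronze → Silver sketch in PySpark: the source table, business key, and columns are invented for illustration, and `spark` is the notebook session.

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Read raw Bronze data (hypothetical mirrored ERP table).
bronze = spark.read.table("bronze_lh.erp_orders")

# Keep only the latest version per business key (simple dedup rule).
latest = Window.partitionBy("order_id").orderBy(F.col("modified_at").desc())

silver = (
    bronze
    .withColumn("order_date", F.to_date("order_date"))  # standardize types/units
    .withColumn("rn", F.row_number().over(latest))
    .filter("rn = 1")
    .drop("rn")
)

# Persist as a conformed Delta table in the Silver lakehouse.
silver.write.format("delta").mode("overwrite").saveAsTable("silver_lh.orders")
```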
C. CI/CD & environments
- Dev/Test/Prod deployment pipelines; after deploy, run orchestrated notebooks/pipelines to create target tables/shortcuts (lakehouse content doesn’t auto‑move). [13][14]
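Since tables don’t travel with the lakehouse item, here is a hedged sketch of an idempotent post‑deploy rebuild notebook you could orchestrate in the target stage; the table names and queries are placeholders. [13][14]

```python
# Post-deploy rebuild sketch (run in the target stage after a pipeline deploy).
# Deployment pipelines move the lakehouse item but not its tables, so Gold
# content is recreated from the already-wired Silver shortcuts.
# All names and queries below are placeholders.

gold_tables = {
    "dim_customer": "SELECT DISTINCT customer_id, customer_name FROM silver_customers",
    "fact_shipments": "SELECT * FROM silver_shipments",
}

for name, query in gold_tables.items():
    # CREATE OR REPLACE keeps repeated deployments idempotent.
    spark.sql(f"CREATE OR REPLACE TABLE {name} AS {query}")
```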
D. Monitoring & operations
- Capacity Metrics app + scheduling to avoid peak contention; implement regular table maintenance (Optimize, V‑Order, Vacuum). [17][5]
5) Guardrails & patterns that work well
- Data contracts between zones:
  - Bronze = immutable raw (or mirrored).
  - Silver = conformed tables with clear SCD rules and SLAs.
  - Gold = only what the business consumes (keep it slim). (General design recommendation)
- Performance hygiene (Gold): compact files (~128 MB–1 GB), minimize small files, apply V‑Order, and avoid frequent small updates that bloat the Delta log (see the write‑path config sketch after this list). [4][5]
- Security layering: workspace + item permissions + RLS/OLS at the semantic model; sensitivity labels on items. [10][16][9]
- Self‑service enablement: publish Certified Direct Lake semantic models per domain; use Domains & the OneLake Catalog for discoverability. [3][6]
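To bake the hygiene in at write time (rather than only fixing it afterwards with OPTIMIZE), a hedged sketch of session‑level settings in a Fabric notebook. The config names follow my reading of Fabric’s V‑Order/optimized‑write documentation, so verify them against your runtime version. [4][5]

```python
# Write-path hygiene sketch (Fabric notebook). Config names assumed from
# Fabric's Delta/V-Order docs; verify on your runtime before relying on them.

spark.conf.set("spark.sql.parquet.vorder.enabled", "true")               # V-Order parquet writes
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")    # bin-pack small writes
spark.conf.set("spark.microsoft.delta.optimizeWrite.binSize", "1073741824")  # ~1 GB target files
```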
6) Example 6‑month POC plan (simple, outcome‑oriented)
Month 1–2 – Foundation & ingestion
- Stand up Core-Data-Bronze, Core-Data-Silver, and one domain Gold workspace (e.g., Logistics).
- Ingest ERP/logistics/production via Mirroring/pipelines to Bronze; define data contracts & naming conventions. [19]
Month 3 – Silver standardization & quality
- Build conformed Silver tables (keys, SCD, dedup, standard units).
- Add DQ checks; implement table maintenance jobs (Optimize/V‑Order/Vacuum). [5]
Month 4 – Gold & BI enablement
- Model first Direct Lake semantic model for Logistics; apply RLS/OLS; label items; publish a Certified dataset. [3][15][16][9]
Month 5 – CI/CD & scale‑out
- Set up deployment pipelines (Dev/Test/Prod), automate post‑deploy build of lakehouse content; onboard a second domain (e.g., Operations). [13][14]
Month 6 – Optimization & rollout
- Performance tuning (file sizes, partitions), capacity monitoring, schedule maintenance, and formalize a domain‑by‑domain rollout plan. [5][17]
7) “Lessons learned” patterns from multi‑department deployments
- Central platform, federated domains works best: the core team runs Bronze/Silver and standards; domains own Gold and semantic models. This keeps autonomy where it matters while preserving a single source of truth. (Architecture pattern aligned with Domains guidance) [6][7]
- Avoid “everything in one workspace.” It looks simpler early on but complicates permissions, CI/CD, and incident blast radius as more teams join. Use workspaces per zone/domain and glue it together with Shortcuts. [1][2]
- Treat Direct Lake performance as an engineering discipline. Consistent V‑Order & compaction routines plus thoughtful update patterns make the difference between import‑like performance and “mysterious slowness.” [4][5]
- Plan for deployments now. Understand that lakehouse content doesn’t auto‑copy in pipelines; orchestrate builds in target environments after deployment. [13][14]
- Make the Gold layer the governed doorway for the business: certified datasets, sensitivity labels, and usage via the app—keep exploration close to the model, not the raw files. [9]
Quick checklist you can use this week
- Split the POC into Core-Data-Bronze, Core-Data-Silver, and one Domain‑Gold workspace; wire with Shortcuts. [1][2]
- Stand up a Direct Lake model on one key Gold subject area; enable RLS with a fixed identity connection. [3][15]
- Schedule Optimize/V‑Order/Vacuum for the top 5 largest tables (a discovery‑loop sketch follows this checklist). [5]
- Enable Purview integration and sensitivity labels; certify the first domain dataset. [8][9]
- Install the Capacity Metrics app and baseline consumption before more teams onboard. [17]
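For the “top 5 largest tables” item, a hedged discovery loop in PySpark: Delta’s DESCRIBE DETAIL exposes sizeInBytes, so you can rank tables by size and maintain the biggest ones. Assumes the notebook’s default lakehouse and the built‑in `spark` session.

```python
# Rank Delta tables by size and maintain the top 5 (hedged sketch;
# assumes the notebook's default lakehouse and built-in `spark` session).

sized = []
for t in spark.catalog.listTables():
    detail = spark.sql(f"DESCRIBE DETAIL {t.name}").collect()[0]
    sized.append((t.name, detail["sizeInBytes"] or 0))

for name, size in sorted(sized, key=lambda x: x[1], reverse=True)[:5]:
    print(f"Maintaining {name} ({size / 1e9:.1f} GB)")
    spark.sql(f"OPTIMIZE {name} VORDER")
    spark.sql(f"VACUUM {name} RETAIN 168 HOURS")
```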
A couple of clarifiers to tailor this for you
- Rough daily volumes by source (ERP/logistics/production), and what fraction needs near‑real‑time vs hourly/daily?
- How many domains do you plan to onboard during the POC versus post‑POC?
- Any regulated data (PII/financial) that warrants workspace/capacity isolation from day one?
- Current Fabric capacity size (F‑SKU) and typical Power BI concurrency?
If you share these with Data Mentor, it can probably turn the above into a concrete blueprint (naming conventions, folder/table layout, RLS design, deployment steps, and a minimal Fabric governance playbook) tailored to your org.
Hope this helps!
References
[1] Implement medallion lakehouse architecture in Fabric – Microsoft Learn
[2] Unify data sources with OneLake shortcuts – Microsoft Learn
[3] Direct Lake overview – Microsoft Learn
[4] Understand Direct Lake query performance – Microsoft Learn
[5] Delta table maintenance in Microsoft Fabric – Microsoft Learn
[6] Domains – Microsoft Learn
[7] Best practices for planning and creating domains in Microsoft Fabric – Microsoft Learn
[8] Use Microsoft Purview to govern Microsoft Fabric – Microsoft Learn
[9] Apply sensitivity labels to Fabric items – Microsoft Learn
[10] Permission model – Microsoft Learn
[11] Roles in workspaces in Microsoft Fabric – Microsoft Learn
[12] Workspace roles and permissions in lakehouse – Microsoft Learn
[13] Lakehouse deployment pipelines and git integration (Preview) – Microsoft Learn
[14] Solved: Deployment pipeline and lakehouse content - Microsoft Fabric …
[15] Manage Direct Lake semantic models – Microsoft Learn
[16] Object-Level Security (OLS) with Power BI – Microsoft Learn
[17] Install the Microsoft Fabric capacity metrics app – Microsoft Learn
[18] Microsoft Fabric event streams overview – Microsoft Learn
[19] Mirroring in Microsoft Fabric – Microsoft Learn