Hey David—great progress so far. You’ve already done the hardest part: getting the key systems connected and validated.
Very short version of the answer: see the book “Deciphering Data Architectures” by James Serra. Start by watching the recording of his session “Using Generative AI on Structured Data (to query and modify data)”; he introduces the book at the end. Reza Rad also covers these topics in a series of articles and videos on radacad.com, e.g. https://radacad.com/power-bi-shared-datasets-what-is-it-how-does-it-work-and-why-should-you-care/
Very long version (generic, and it might not fully apply to your scenario, since this depends on work culture, data maturity, and industry):
Below is a pragmatic view of your proposed design, what to watch out for, and where a bit of lightweight federated structure will save you pain later.
TL;DR (executive summary)
- Your approach is directionally right. A medallion (Bronze/Silver/Gold) lakehouse on Microsoft Fabric is the recommended pattern to establish a trustworthy single source of truth. [1]
- Tweak the layout for scale/governance. Keep the medallion pattern, but don’t put everything in a single lakehouse and one workspace. Microsoft’s guidance recommends separate lakehouses per zone and (often) separate workspaces for better control, lifecycle management, and blast‑radius isolation—especially beyond a POC. [1]
- Use OneLake Shortcuts rather than copying data across teams/domains to avoid duplication and keep permissions consistent. [2]
- Publish Gold via Direct Lake semantic models (Power BI) to empower business users—with table optimization (V‑Order, compaction) to sustain performance. [3][4][5]
- Governance: Organize by Domains (data‑mesh style), apply sensitivity labels, and use Microsoft Purview for cataloging, lineage, and policy—so you get trusted, consistent data without creating new silos. [6][7][8][9]
1) Does a centralized lakehouse model align with Fabric best practices?
Conceptually yes (medallion & OneLake), but adjust the physical layout.
- Medallion is the recommended pattern in Fabric to incrementally improve data quality (Bronze → Silver → Gold) and build a single source of truth. [1]
- OneLake’s goal is “one logical lake” with one copy of data; use Shortcuts to reference data across workspaces/capacities/clouds without copying (a shortcut‑creation sketch follows the workspace list below). This is key to breaking silos while avoiding edge copies. [2]
- Physical design guidance: Microsoft recommends separate lakehouses per zone and states that while you can place all lakehouses in a single workspace, using separate workspaces gives you more control and governance at the zone level. This is the main tweak I’d make to your “one lakehouse & one workspace” plan. [1]
- Organizational structure: Use Domains to reflect business areas (ERP/Finance, Logistics, Operations, etc.) and delegate certain governance settings down from tenant to domain admins (federated governance). [6][7]
- For analytics consumption: Build Direct Lake semantic models over Gold tables for high‑performance self‑service BI without scheduled data imports. It’s explicitly called out as ideal for medallion Gold. [3]
What this looks like in practice for your POC
- Workspaces:
  - Core-Data (Bronze) – ingestion/landing (often via mirroring/ingest pipelines).
  - Core-Data (Silver) – standardized/conformed.
  - Domain X (Gold) – curated star schemas + semantic models for each domain (Finance, Logistics, Ops).
This gives you clean environment boundaries but still one logical lake via OneLake and shortcuts. [1][2]
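To make the “wire it with Shortcuts” step concrete, here is a minimal sketch (Python + requests) of creating a OneLake shortcut programmatically. The endpoint and body shape follow the Fabric REST “Create Shortcut” API as I understand it; all GUIDs and the token are placeholders, and shortcuts can just as easily be created from the lakehouse UI. [2]

```python
# Hedged sketch: create a OneLake shortcut in a Gold lakehouse that points
# at a Silver table. All IDs and the bearer token are placeholders.
import requests

FABRIC_API = "https://api.fabric.microsoft.com/v1"
token = "<bearer-token>"  # e.g., acquired via azure-identity

gold_workspace_id = "<gold-workspace-guid>"
gold_lakehouse_id = "<gold-lakehouse-guid>"

payload = {
    "path": "Tables",            # where the shortcut appears in the Gold lakehouse
    "name": "SilverCustomers",   # shortcut name
    "target": {
        "oneLake": {             # a reference, not a copy of the data
            "workspaceId": "<silver-workspace-guid>",
            "itemId": "<silver-lakehouse-guid>",
            "path": "Tables/Customers",
        }
    },
}

resp = requests.post(
    f"{FABRIC_API}/workspaces/{gold_workspace_id}/items/{gold_lakehouse_id}/shortcuts",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # metadata of the newly created shortcut
```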
2) Risks & limitations to be aware of
- Single workspace/lakehouse = governance & blast‑radius risk
  - Workspace roles are coarse; putting everyone in one workspace increases permission sprawl and makes least‑privilege hard (e.g., a Contributor can modify all items in that workspace). Item‑level permissions exist, but they’re easier to reason about when items are scoped per zone/domain. [10][11][12]
- Lifecycle management & CI/CD realities
  - Deployment pipelines & Git: lakehouse containers deploy, but tables/files don’t automatically copy; guidance is to rebuild content in the target via pipelines/notebooks after deploy (current platform behavior). Plan for this in Dev → Test → Prod. [13][14]
- Performance of Direct Lake depends on Delta health
  - To keep Gold fast and interactive: V‑Order, file compaction, and appropriate file sizes (avoid too many small files). These materially affect transcoding and query latency. [4]
  - Maintain Delta tables with Optimize + Vacuum; be mindful that shorter vacuum retention reduces time travel and must be explicitly configured (see the maintenance sketch at the end of this section). [5]
- Security model layering
  - Row‑level security (RLS) & object‑level security (OLS) are applied at the semantic model; for Direct Lake, it’s recommended to use a fixed identity (shareable cloud connection) for predictable security behavior. [15][16]
  - Viewers can read via the SQL analytics endpoint only if the SQL access policy is granted; another reason to keep Gold distinct and governed. [12]
- Governance & compliance coverage
  - Sensitivity labels apply to Fabric items and integrate with Purview; DLP is currently focused on semantic models—so treat the semantic model as the governed, certified entry point for business users. [8][9]
- Capacity management
  - With many teams on one capacity, watch for throttling and contention. Install the Fabric Capacity Metrics app to monitor utilization and schedule heavy jobs off‑peak. [17]
- Real‑time vs batch
  - If fleet tracking requires streaming, Eventstreams can ingest/transform/route events no‑code; consider isolating real‑time workloads from batch processing for reliability. [18]
- Source system sync
  - For operational sources (ERP, logistics DBs), Mirroring into OneLake provides low‑latency replication (or metadata mirroring + shortcuts) without complex CDC pipelines—helpful during a POC to reduce engineering overhead. [19]
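The maintenance sketch mentioned above: a minimal PySpark routine for a Fabric notebook. Table names are placeholders, `spark` is the session Fabric notebooks provide, and the OPTIMIZE … VORDER / VACUUM commands follow the Fabric Delta maintenance docs. [5]

```python
# Minimal maintenance sketch (Fabric notebook, PySpark).
# `spark` is the built-in SparkSession; table names below are placeholders.

tables = ["gold_lh.dim_customer", "gold_lh.fact_shipments"]  # hypothetical Gold tables

for t in tables:
    # Compact small files and rewrite them V-Ordered for faster Direct Lake reads.
    spark.sql(f"OPTIMIZE {t} VORDER")
    # Drop unreferenced files older than 7 days (168 hours); shorter retention
    # saves storage but shrinks the time-travel window.
    spark.sql(f"VACUUM {t} RETAIN 168 HOURS")
```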
3) When are separate lakehouses/workspaces recommended?
- Per medallion zone: Bronze, Silver, and Gold in distinct lakehouses (and often workspaces) to isolate permissions, jobs, data quality checks, and SLAs. [1]
- By domain (data‑mesh): Finance, Logistics, Operations, etc. as Domains with their own workspaces; a central team provides shared standards and platform. Delegated governance at the domain level. [6][7]
- Environment separation: Dev / Test / Prod workspaces with deployment pipelines; remember to script/automate recreating lakehouse content in target stages. [13][14]
- Capacity isolation or regulated data: sensitive/regulated datasets (PII/PHI) placed in dedicated workspaces/capacities with stricter policies; share curated Gold via Shortcuts or semantic models to consumers. [10][2]
- Streaming vs batch: keep real‑time event processing separate from batch data engineering to decouple SLAs and failure domains. [18]
4) Suggested reference architecture for your 6‑month POC
A. Topology
- Same layout as section 1: Core‑Data Bronze and Silver workspaces run by the core team, plus one Gold workspace per domain, connected via OneLake Shortcuts. [1][2]
B. Data flows
- Bronze: land via Mirroring (for ERP/operational DBs) or pipelines; append‑only, immutable. [19]
- Silver: standardize to Delta with business keys and data quality rules; OPTIMIZE + V‑Order on larger tables (a Bronze → Silver sketch follows this list). [5]
- Gold: dimensional models for reporting; Direct Lake semantic models with RLS/OLS as needed; prefer a fixed identity connection for Direct Lake models. [3][15][16]
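A hedged Bronze → Silver sketch in PySpark: the source table, business key, and columns are invented for illustration, and `spark` is the notebook session.

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Read raw Bronze data (hypothetical mirrored ERP table).
bronze = spark.read.table("bronze_lh.erp_orders")

# Keep only the latest version per business key (simple dedup rule).
latest = Window.partitionBy("order_id").orderBy(F.col("modified_at").desc())

silver = (
    bronze
    .withColumn("order_date", F.to_date("order_date"))  # standardize types/units
    .withColumn("rn", F.row_number().over(latest))
    .filter("rn = 1")
    .drop("rn")
)

# Persist as a conformed Delta table in the Silver lakehouse.
silver.write.format("delta").mode("overwrite").saveAsTable("silver_lh.orders")
```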
C. CI/CD & environments
- Dev/Test/Prod deployment pipelines; after deploy, run orchestrated notebooks/pipelines to create target tables/shortcuts (lakehouse content doesn’t auto‑move). [13][14]
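Since tables don’t travel with the lakehouse item, here is a hedged sketch of an idempotent post‑deploy rebuild notebook you could orchestrate in the target stage; the table names and queries are placeholders. [13][14]

```python
# Post-deploy rebuild sketch (run in the target stage after a pipeline deploy).
# Deployment pipelines move the lakehouse item but not its tables, so Gold
# content is recreated from the already-wired Silver shortcuts.
# All names and queries below are placeholders.

gold_tables = {
    "dim_customer": "SELECT DISTINCT customer_id, customer_name FROM silver_customers",
    "fact_shipments": "SELECT * FROM silver_shipments",
}

for name, query in gold_tables.items():
    # CREATE OR REPLACE keeps repeated deployments idempotent.
    spark.sql(f"CREATE OR REPLACE TABLE {name} AS {query}")
```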
D. Monitoring & operations
- Capacity Metrics app + scheduling to avoid peak contention; implement regular table maintenance (Optimize, V‑Order, Vacuum). [17][5]
5) Guardrails & patterns that work well
- Data contracts between zones:
  - Bronze = immutable raw (or mirrored).
  - Silver = conformed tables with clear SCD rules and SLAs.
  - Gold = only what the business consumes (keep it slim). (General design recommendation)
- Performance hygiene (Gold): compact files (~128 MB–1 GB), minimize small files, apply V‑Order, and avoid frequent small updates that bloat the Delta log (see the write‑path config sketch after this list). [4][5]
- Security layering: workspace + item permissions + RLS/OLS at the semantic model; sensitivity labels on items. [10][16][9]
- Self‑service enablement: publish Certified Direct Lake semantic models per domain; use Domains & the OneLake Catalog for discoverability. [3][6]
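To bake the hygiene in at write time (rather than only fixing it afterwards with OPTIMIZE), a hedged sketch of session‑level settings in a Fabric notebook. The config names follow my reading of Fabric’s V‑Order/optimized‑write documentation, so verify them against your runtime version. [4][5]

```python
# Write-path hygiene sketch (Fabric notebook). Config names assumed from
# Fabric's Delta/V-Order docs; verify on your runtime before relying on them.

spark.conf.set("spark.sql.parquet.vorder.enabled", "true")               # V-Order parquet writes
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")    # bin-pack small writes
spark.conf.set("spark.microsoft.delta.optimizeWrite.binSize", "1073741824")  # ~1 GB target files
```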
6) Example 6‑month POC plan (simple, outcome‑oriented)
Month 1–2 – Foundation & ingestion
- Stand up Core-Data-Bronze, Core-Data-Silver, and one domain Gold workspace (e.g., Logistics).
- Ingest ERP/logistics/production via Mirroring/pipelines to Bronze; define data contracts & naming conventions. [19]
Month 3 – Silver standardization & quality
- Build conformed Silver tables (keys, SCD, dedup, standard units).
- Add DQ checks; implement table maintenance jobs (Optimize/V‑Order/Vacuum). [5]
Month 4 – Gold & BI enablement
- Model first Direct Lake semantic model for Logistics; apply RLS/OLS; label items; publish a Certified dataset. [3][15][16][9]
Month 5 – CI/CD & scale‑out
- Set up deployment pipelines (Dev/Test/Prod), automate post‑deploy build of lakehouse content; onboard a second domain (e.g., Operations). [13][14]
Month 6 – Optimization & rollout
- Performance tuning (file sizes, partitions), capacity monitoring, schedule maintenance, and formalize a domain‑by‑domain rollout plan. [5][17]
7) “Lessons learned” patterns from multi‑department deployments
- Central platform, federated domains works best: the core team runs Bronze/Silver and standards; domains own Gold and semantic models. This keeps autonomy where it matters while preserving a single source of truth. (Architecture pattern aligned with Domains guidance) [6][7]
- Avoid “everything in one workspace.” It looks simpler early on but complicates permissions, CI/CD, and incident blast radius as more teams join. Use workspaces per zone/domain and glue it together with Shortcuts. [1][2]
- Treat Direct Lake performance as an engineering discipline. Consistent V‑Order & compaction routines plus thoughtful update patterns make the difference between import‑like performance and “mysterious slowness.” [4][5]
- Plan for deployments now. Understand that lakehouse content doesn’t auto‑copy in pipelines; orchestrate builds in target environments after deployment. [13][14]
- Make the Gold layer the governed doorway for the business: certified datasets, sensitivity labels, and usage via the app—keep exploration close to the model, not the raw files. [9]
Quick checklist you can use this week
- Split the POC into Core-Data-Bronze, Core-Data-Silver, and one Domain‑Gold workspace; wire with Shortcuts. [1][2]
- Stand up a Direct Lake model on one key Gold subject area; enable RLS with a fixed identity connection. [3][15]
- Schedule Optimize/V‑Order/Vacuum for the top 5 largest tables (a discovery‑loop sketch follows this checklist). [5]
- Enable Purview integration and sensitivity labels; certify the first domain dataset. [8][9]
- Install the Capacity Metrics app and baseline consumption before more teams onboard. [17]
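For the “top 5 largest tables” item, a hedged discovery loop in PySpark: Delta’s DESCRIBE DETAIL exposes sizeInBytes, so you can rank tables by size and maintain the biggest ones. Assumes the notebook’s default lakehouse and the built‑in `spark` session.

```python
# Rank Delta tables by size and maintain the top 5 (hedged sketch;
# assumes the notebook's default lakehouse and built-in `spark` session).

sized = []
for t in spark.catalog.listTables():
    detail = spark.sql(f"DESCRIBE DETAIL {t.name}").collect()[0]
    sized.append((t.name, detail["sizeInBytes"] or 0))

for name, size in sorted(sized, key=lambda x: x[1], reverse=True)[:5]:
    print(f"Maintaining {name} ({size / 1e9:.1f} GB)")
    spark.sql(f"OPTIMIZE {name} VORDER")
    spark.sql(f"VACUUM {name} RETAIN 168 HOURS")
```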
A couple of clarifiers to tailor this for you
- Rough daily volumes by source (ERP/logistics/production), and what fraction needs near‑real‑time vs hourly/daily?
- How many domains do you plan to onboard during the POC versus post‑POC?
- Any regulated data (PII/financial) that warrants workspace/capacity isolation from day one?
- Current Fabric capacity size (F‑SKU) and typical Power BI concurrency?
If you share these with Data Mentor, it can probably turn the above into a concrete blueprint (naming conventions, folder/table layout, RLS design, deployment steps, and a minimal Fabric governance playbook) tailored to your org.
Hope this helps!
References
[1] Implement medallion lakehouse architecture in Fabric – Microsoft Learn
[2] Unify data sources with OneLake shortcuts – Microsoft Learn
[3] Direct Lake overview – Microsoft Learn
[4] Understand Direct Lake query performance – Microsoft Learn
[5] Delta table maintenance in Microsoft Fabric – Microsoft Learn
[6] Domains – Microsoft Learn
[7] Best practices for planning and creating domains in Microsoft Fabric – Microsoft Learn
[8] Use Microsoft Purview to govern Microsoft Fabric – Microsoft Learn
[9] Apply sensitivity labels to Fabric items – Microsoft Learn
[10] Permission model – Microsoft Learn
[11] Roles in workspaces in Microsoft Fabric – Microsoft Learn
[12] Workspace roles and permissions in lakehouse – Microsoft Learn
[13] Lakehouse deployment pipelines and git integration (Preview) – Microsoft Learn
[14] Solved: Deployment pipeline and lakehouse content - Microsoft Fabric …
[15] Manage Direct Lake semantic models – Microsoft Learn
[16] Object-Level Security (OLS) with Power BI – Microsoft Learn
[17] Install the Microsoft Fabric capacity metrics app – Microsoft Learn
[18] Microsoft Fabric event streams overview – Microsoft Learn
[19] Mirroring in Microsoft Fabric – Microsoft Learn