Most BI projects fail not because the company chose the wrong dashboard tool. They fail because the underlying architecture was never designed properly. Data sits in silos that do not talk to each other. Metrics mean different things in different systems. Reports take hours to run. Someone finally gets a dashboard built, and within six months it is out of date and nobody trusts the numbers.
Good business intelligence starts with good architecture. This post breaks down what that architecture looks like, the decisions you have to make at each layer, and where most companies go wrong.
What BI Architecture Actually Means
Business intelligence architecture is the blueprint for how data moves through your organization — from where it originates, to where it is stored, to how it is transformed into something useful, to how it reaches the people who need it.
It is not a single tool or platform. It is a stack of interconnected components, each with a specific role. The quality of your BI output is limited by the weakest layer in that stack. A beautiful dashboard built on unreliable data is worse than no dashboard at all, because it creates false confidence in bad numbers.
The Layers of a BI Stack
Data Sources
Everything starts here. Your data sources are every system in your organization that generates information worth analyzing: your CRM, ERP, e-commerce platform, marketing tools, financial software, support ticketing system, and any external feeds you rely on.
The first architectural question is: what data do you actually need? The instinct is to connect everything. Resist it. Start by mapping which business questions you need to answer, then work backward to identify which data sources contain the raw material for those answers.
Connecting sources you do not have a plan for creates noise, maintenance overhead, and complexity that slows you down before you ever get to insights.
Integration and Ingestion
Raw data from multiple sources needs to be collected and brought into a central location. This is typically done through an ETL or ELT process.
ETL (Extract, Transform, Load) extracts data from sources, transforms it into a consistent format, and then loads it into storage. Transformation happens before the data lands in your warehouse.
ELT (Extract, Load, Transform) loads raw data first, then transforms it inside the data warehouse using the warehouse's own compute power. With modern cloud warehouses, ELT has become the more common pattern because it preserves raw data for reprocessing and leverages cheap compute rather than running transformations on separate infrastructure.
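As a concrete illustration of the ELT flow, here is a minimal Python sketch using sqlite3 as a stand-in for a cloud warehouse. The table shapes and the bad row are invented for illustration; the point is the ordering: raw data lands first, and the transformation runs inside the warehouse as plain SQL.

```python
import sqlite3

# Stand-in for a cloud warehouse; in practice this would be
# BigQuery, Snowflake, or Redshift.
conn = sqlite3.connect(":memory:")

# Extract + Load: land the raw rows untouched, messy formats and all.
raw_orders = [
    ("ord-1", "2024-01-05", "49.90"),
    ("ord-2", "2024-01-06", "19.50"),
    ("ord-3", "2024-01-06", None),  # a bad row arrives as-is
]
conn.execute("CREATE TABLE raw_orders (order_id TEXT, order_date TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_orders)

# Transform: inside the warehouse, using its own compute (plain SQL).
conn.execute("""
    CREATE TABLE orders AS
    SELECT order_id,
           DATE(order_date)     AS order_date,
           CAST(amount AS REAL) AS amount
    FROM raw_orders
    WHERE amount IS NOT NULL
""")

# The raw table stays around for reprocessing if the rules change.
clean = conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
print(clean)
```

Because the raw table survives the transformation, a change to the business rules only requires rerunning the SQL, not re-extracting from the source.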
The practical tools here include dbt for transformations, Fivetran or Airbyte for managed connectors, and cloud-native services like AWS Glue or Azure Data Factory for custom pipelines.
One thing we consistently see with clients: the integration layer is where projects get underestimated. Connecting a source is often quick. Cleaning what comes out of it takes five times longer. Source systems are built to run operations, not to serve BI. They contain nulls, inconsistencies, duplicates, and formats that made sense at the time but are meaningless in aggregate.
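To make that concrete, here is a hedged sketch of the kind of cleanup a single source often needs. The rows mimic a hypothetical CRM export; the field names and formats are invented, but the problems (duplicates, nulls, inconsistent dates and casing) are the ones that show up in practice.

```python
from datetime import datetime

# Rows as they might arrive from a hypothetical CRM export.
rows = [
    {"email": "Ana@Example.com ", "signup": "2024-01-05", "plan": "Pro"},
    {"email": "ana@example.com",  "signup": "2024-01-05", "plan": "pro"},   # duplicate
    {"email": None,               "signup": "2024-01-06", "plan": "free"},  # unusable
    {"email": "bo@example.com",   "signup": "05/01/2024", "plan": "FREE"},  # odd date
]

def parse_date(s):
    # Source systems rarely agree on a single date format.
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(s, fmt).date().isoformat()
        except ValueError:
            pass
    return None

seen, clean = set(), []
for r in rows:
    email = (r["email"] or "").strip().lower()
    if not email or email in seen:
        continue  # drop nulls and duplicates rather than guess
    seen.add(email)
    clean.append({"email": email,
                  "signup": parse_date(r["signup"]),
                  "plan": r["plan"].lower()})

print(len(clean))  # 2 usable rows out of 4
```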
Storage
Where your data lands depends on what you are doing with it.
Data warehouses (BigQuery, Snowflake, Redshift) store structured, processed data optimized for analytical queries. They are fast, reliable, and purpose-built for the kind of aggregations and joins that BI requires. For most companies running standard BI workloads, a cloud data warehouse is the right core storage layer.
Data lakes (S3, Azure Data Lake, Google Cloud Storage) store raw data in its original format — structured, semi-structured, and unstructured. They are cheap and flexible, which makes them good for storing everything before you know what you need. The tradeoff is that querying raw data in a lake requires more work than querying a structured warehouse.
Data lakehouses try to combine both: lake-scale storage with warehouse-quality query performance. Platforms like Databricks and open table formats like Apache Iceberg are building toward this model. It is a reasonable direction for organizations with complex data requirements, but adds operational complexity most growing companies do not need yet.
For most companies with fewer than 500 employees: start with a data warehouse. Pick one, load your most important sources, and build from there. The marginal benefit of a data lake does not outweigh the complexity until you have data types and volumes that a warehouse cannot handle.
Data Modeling
This is the layer that most companies skip, and it is the most important one to get right.
Data modeling is the process of organizing raw data into structures that reflect your business — dimensions, facts, metrics, hierarchies. A well-designed data model means everyone in your organization is working from the same definition of "revenue," "customer," and "active user." Without it, the finance team's revenue number and the sales team's revenue number will never match, and every meeting becomes an argument about whose spreadsheet is right.
The most common approach is the star schema: a central fact table (orders, sessions, transactions) surrounded by dimension tables (customers, products, dates, locations). It is not glamorous, but it is durable and it works.
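The shape of a star schema is easiest to see in SQL. The sketch below uses Python's sqlite3 as a stand-in warehouse; the fact and dimension tables and the data are invented, but the pattern (one join per dimension, aggregation on the fact) is the standard one.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Dimension tables: descriptive attributes.
    CREATE TABLE dim_customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE dim_products  (product_id  INTEGER PRIMARY KEY, category TEXT);
    -- Fact table: one row per order, foreign keys into each dimension.
    CREATE TABLE fact_orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES dim_customers(customer_id),
        product_id  INTEGER REFERENCES dim_products(product_id),
        amount      REAL
    );
    INSERT INTO dim_customers VALUES (1, 'EU'), (2, 'US');
    INSERT INTO dim_products  VALUES (10, 'hardware'), (11, 'software');
    INSERT INTO fact_orders   VALUES (100, 1, 10, 50.0),
                                     (101, 2, 11, 80.0),
                                     (102, 1, 11, 20.0);
""")

# A typical BI question answered with one join per dimension.
rows = db.execute("""
    SELECT c.region, SUM(f.amount) AS revenue
    FROM fact_orders f
    JOIN dim_customers c ON c.customer_id = f.customer_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
print(rows)  # [('EU', 70.0), ('US', 80.0)]
```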
Tools like dbt formalize this layer by letting you write data models as version-controlled SQL with built-in testing and documentation. If you are building a serious BI architecture, dbt or something like it belongs in the stack.
Semantic and Business Logic Layer
Above the data model sits the semantic layer: the business rules, calculated metrics, and definitions that translate raw data into business meaning.
This is where you define that "active customer" means a customer who made a purchase in the last 90 days, not anyone who ever created an account. Where you specify that "gross margin" is revenue minus cost of goods, excluding shipping. Where you encode the rules that make metrics consistent across every report, every tool, and every user.
When the semantic layer is missing or inconsistent, the same underlying data produces different numbers in different tools. People stop trusting the data and go back to spreadsheets.
Modern BI tools like Looker (which was built around this concept) and platforms like Cube.js or AtScale offer a dedicated semantic layer that sits between the warehouse and visualization tools. For simpler stacks, you can encode this logic in your dbt models. Either way, the logic needs to live somewhere central and authoritative.
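Whatever tool hosts it, the principle is the same: each metric is defined once, in one authoritative place, and everything else calls that definition. A minimal sketch in Python, with invented customer records and the "active customer" and "gross margin" rules from above:

```python
from datetime import date, timedelta

# The agreed business rule, defined once and imported everywhere.
ACTIVE_WINDOW_DAYS = 90

def is_active(customer, today):
    """Active customer = made a purchase in the last 90 days."""
    last = customer.get("last_purchase")
    return last is not None and (today - last) <= timedelta(days=ACTIVE_WINDOW_DAYS)

def gross_margin(revenue, cogs):
    """Gross margin = revenue minus cost of goods, excluding shipping."""
    return revenue - cogs

customers = [
    {"id": 1, "last_purchase": date(2024, 3, 1)},
    {"id": 2, "last_purchase": date(2023, 1, 1)},
    {"id": 3, "last_purchase": None},  # created an account, never bought
]
today = date(2024, 3, 15)
active = [c["id"] for c in customers if is_active(c, today)]
print(active)  # [1]
```

Every report that needs "active customers" calls the same function, so the number cannot drift between tools.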
Presentation and Visualization
This is the layer most people think of when they think of BI: dashboards, charts, and reports. Tableau, Power BI, Looker, Metabase, and Redash all live here.
The choice of visualization tool matters less than people think. What matters is whether the tool fits how your team actually works:
- Self-service or analyst-driven? Tools like Tableau and Power BI are powerful but require training. Metabase and Google Looker Studio are faster to stand up and more accessible to non-technical users.
- Embedded or standalone? If you want analytics inside a product your customers use, you need an embeddable option like Metabase Embedded, Redash, or a purpose-built solution.
- SQL-native or GUI-first? Technical teams often prefer tools that let them write SQL directly. Business users need drag-and-drop or guided exploration.
The visualization layer is also where you enforce access controls: who can see which data, what rows they can query, and what they can export. This is not optional if you have sensitive financial or customer data.
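Row-level access control can be sketched as a policy applied before results leave the query layer. The roles and fields below are invented for illustration; a real deployment would use the row-level security built into the warehouse or BI tool rather than application code.

```python
# Hypothetical role-to-filter policies; each role sees only its rows.
ROW_POLICIES = {
    "sales_eu": lambda row: row["region"] == "EU",
    "sales_us": lambda row: row["region"] == "US",
    "finance":  lambda row: True,  # finance sees every row
}

def query_orders(orders, role):
    policy = ROW_POLICIES.get(role)
    if policy is None:
        # Deny by default: no policy means no data.
        raise PermissionError(f"no data access policy for role {role!r}")
    return [row for row in orders if policy(row)]

orders = [
    {"order_id": 1, "region": "EU", "amount": 50.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
]
print(len(query_orders(orders, "sales_eu")))  # 1
print(len(query_orders(orders, "finance")))   # 2
```

The deny-by-default branch matters: an unknown role should fail loudly, not silently see everything.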
Common Architectural Mistakes
Building the presentation layer first. Companies buy Tableau, point it at raw production databases, and wonder why reports take 10 minutes to run and the numbers never match. The visualization layer is the last piece, not the first.
Skipping data modeling. Without a defined data model, every report becomes custom work. There is no reuse, no consistency, and no single source of truth.
Connecting too many sources too soon. Twenty data sources with unreliable pipelines is worse than five data sources with clean, tested data. Quality over quantity.
Treating BI as a project rather than an ongoing capability. Architecture decisions made in year one will need revisiting as your data grows, your business changes, and your tools improve. Budget for maintenance and iteration, not just initial build.
Ignoring data quality monitoring. Your warehouse has bad data. All warehouses have bad data. The question is whether you find it before or after it corrupts a report that drives a major decision. Add automated tests to your pipeline from the beginning.
Build vs. Buy vs. Managed
For most growing companies, the question is not which architecture to design from scratch — it is how much of the stack to build yourself versus buy as a service.
The integration layer is almost always worth buying. Fivetran and Airbyte maintain hundreds of connectors so you do not have to. A custom-built connector to Salesforce or Shopify is a maintenance burden that never goes away.
The data warehouse should be a managed service. BigQuery, Snowflake, and Redshift are all delivered that way, and there is no reason to run warehouse infrastructure yourself.
Data modeling and the semantic layer are where custom work earns its keep. These are specific to your business rules, your metrics, and your definitions. No off-the-shelf tool can model your business for you.
The visualization layer is mixed. Standard dashboards are easily handled by existing tools. Embedded analytics or highly customized reporting interfaces often require custom development on top of or alongside BI platforms.
Where to Start
If you are starting from scratch, the fastest path to useful BI looks like this:
- Define three to five questions your leadership team cannot currently answer quickly. These are your initial use cases.
- Identify which data sources contain the information needed to answer those questions.
- Set up a cloud data warehouse and load those sources. BigQuery has a generous free tier. Snowflake and Redshift are straightforward to start on.
- Build a minimal data model with dbt or raw SQL that produces clean, tested tables for your use cases.
- Connect a visualization tool and build the dashboards for your initial questions.
- Iterate. Once the first use cases are working and trusted, expand from there.
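The steps above can be compressed into one illustrative sketch, with sqlite3 standing in for the warehouse and an invented "revenue per day" question as the initial use case:

```python
import sqlite3

wh = sqlite3.connect(":memory:")  # stand-in for the cloud warehouse

# Steps 1-3: the question is "revenue per day"; load the raw source.
wh.execute("CREATE TABLE raw_orders (id TEXT, day TEXT, amount TEXT)")
wh.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)",
               [("a", "2024-01-05", "10.0"),
                ("b", "2024-01-05", "5.0"),
                ("c", "2024-01-06", None)])  # bad row from the source

# Step 4: a minimal, cleaned model table.
wh.execute("""CREATE TABLE orders AS
              SELECT id, day, CAST(amount AS REAL) AS amount
              FROM raw_orders WHERE amount IS NOT NULL""")

# A basic quality test before anything reaches a dashboard.
assert wh.execute(
    "SELECT COUNT(*) FROM orders WHERE amount IS NULL").fetchone()[0] == 0

# Step 5: the query a dashboard tile would run.
daily = wh.execute(
    "SELECT day, SUM(amount) FROM orders GROUP BY day").fetchall()
print(daily)  # [('2024-01-05', 15.0)]
```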
The goal for the first three months is not a comprehensive BI platform. It is two or three dashboards that people actually use and trust. That is the foundation everything else builds on.
The Architecture Is Not the Point
The architecture exists to serve decisions. A well-designed BI stack is invisible to the people who use it — they just see fast, accurate, consistent data when they need it. The complexity lives in the engineering and setup, not in the day-to-day experience.
Building that foundation properly takes more upfront effort than connecting a dashboard to a database and calling it done. But the payoff is a BI environment that scales, that people trust, and that actually changes how your organization makes decisions.