Why the BI team is the bottleneck
In most risk-bearing organizations, the same shape of conversation happens every week. A director on the quality team has a simple question: "Which of our IPA clinics are below 80% on diabetes PDC, and how many members do they cover?" In their mind it is one question. In practice it requires pulling pharmacy fill data from one source, member-to-clinic attribution from another, contract structure from a third, and an aggregation that nobody on the director's team can write themselves.
So it becomes a ticket. The ticket joins a queue. Three weeks later, an answer comes back. By then, the practice meeting it was for has happened, the question has shifted, and the answer is filed in a Slack thread that no one looks at again.
This is not the BI team's fault. The data lives in 8 to 15 disconnected systems, with conflicting member identifiers, conflicting time windows, and conflicting definitions of basic metrics. Every question that crosses two of those systems is a small data-engineering project. There is no version of "more dashboards" that solves it. Every dashboard still depends on the BI team for the underlying joins.
What a natural-language analytics layer actually does
Stripped of marketing language, it does this. You ask a question in English. The system translates the question into a query against a canonical data model. The query runs. The answer comes back with the SQL it ran, the data sources it touched, the time ranges it used, and the row counts at each step.
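To make that concrete, here is a minimal sketch in Python of an answer that carries its own provenance. Everything named here is hypothetical and invented for illustration: the `member_adherence` and `attribution` tables, the `Answer` shape, and the `clinics_below` function. It assumes adherence has already been computed per member as a 0/1 flag, treats a clinic's rate as the share of its attributed members who are adherent, and leaves out time ranges to stay short. It is not a description of how any particular product works.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical canonical tables, invented for this sketch.
SCHEMA = """
CREATE TABLE member_adherence (member_id TEXT PRIMARY KEY, adherent INTEGER);
CREATE TABLE attribution (member_id TEXT PRIMARY KEY, clinic_id TEXT);
"""

# The query the (not shown) language layer is assumed to have chosen.
QUERY = """
SELECT a.clinic_id,
       COUNT(*) AS members,
       AVG(m.adherent) AS adherence_rate
FROM member_adherence AS m
JOIN attribution AS a ON a.member_id = m.member_id
GROUP BY a.clinic_id
HAVING AVG(m.adherent) < :threshold
ORDER BY a.clinic_id
"""


@dataclass
class Answer:
    sql: str           # the exact query text that ran
    sources: list      # tables the query touched
    step_counts: dict  # row counts at each step
    rows: list         # the result itself


def _count(conn, sql):
    """Run a COUNT(*) query and return the single number."""
    return conn.execute(sql).fetchone()[0]


def clinics_below(conn, threshold=0.80):
    """Return clinics under the threshold, bundled with provenance."""
    rows = conn.execute(QUERY, {"threshold": threshold}).fetchall()
    step_counts = {
        "members_measured": _count(conn, "SELECT COUNT(*) FROM member_adherence"),
        "members_attributed": _count(
            conn,
            "SELECT COUNT(*) FROM member_adherence AS m "
            "JOIN attribution AS a ON a.member_id = m.member_id",
        ),
        "clinics_in_answer": len(rows),
    }
    return Answer(QUERY.strip(), ["member_adherence", "attribution"], step_counts, rows)


# Toy data: five measured members, four of them attributed to a clinic.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO member_adherence VALUES (?, ?)",
    [("m1", 1), ("m2", 0), ("m3", 1), ("m4", 1), ("m5", 1)],
)
conn.executemany(
    "INSERT INTO attribution VALUES (?, ?)",
    [("m1", "clinic_a"), ("m2", "clinic_a"), ("m3", "clinic_b"), ("m4", "clinic_b")],
)

answer = clinics_below(conn)
```

Here `answer.rows` holds the one clinic under the threshold, `answer.sql` is the exact text that ran, and `answer.step_counts` shows that one of the five measured members dropped out at attribution. Choosing the query and its parameters from the English question is the language layer's job and is deliberately not shown.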
If the answer surprises you, you click into it. You can see the underlying members, the join logic, and the definition of every metric in the question. If you disagree with the definition (and you often will, the first few times), you can edit the metric definition once and have every future question use the corrected one.
This is not magic. It is a thin layer on top of a properly built canonical record. The data model is the product. The language layer is the part that makes it accessible.
Why this works only on a canonical record
Most healthcare data warehouses are not warehouses. They are landing zones with conflicting copies of the same entities. A member exists three times under different IDs. A claim line appears in two states. Provider attribution depends on which contract you read.
Pointing a language model at that mess produces confident answers that are wrong, because the model will pick one of the conflicting joins and not tell you it picked. The model is not the problem. The data is. Until you have one canonical record per member, per provider, per contract, with documented resolution rules, no amount of language tooling is going to be trustworthy.
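One way to picture "documented resolution rules" is a crosswalk that refuses to guess. The Python sketch below is an illustration under stated assumptions, not a description of any real system: the `links` data, the `AmbiguousIdentity` error, and the `build_crosswalk` and `resolve` functions are all hypothetical. The idea it shows is that a source ID resolving to zero or several canonical members should fail loudly instead of letting a join silently pick one.

```python
from collections import defaultdict


class AmbiguousIdentity(Exception):
    """Raised when a source ID does not resolve to exactly one member."""


def build_crosswalk(links):
    """Map (source_system, source_id) to the set of canonical member IDs."""
    crosswalk = defaultdict(set)
    for source_system, source_id, canonical_id in links:
        crosswalk[(source_system, source_id)].add(canonical_id)
    return crosswalk


def resolve(crosswalk, source_system, source_id):
    """Return the single canonical ID, or raise instead of guessing."""
    candidates = crosswalk.get((source_system, source_id), set())
    if len(candidates) != 1:
        raise AmbiguousIdentity(
            f"{source_system}:{source_id} maps to {sorted(candidates)}"
        )
    return next(iter(candidates))


# Toy links, made up for this example.
links = [
    ("claims", "C-100", "member-1"),
    ("pharmacy", "RX-7", "member-1"),
    ("ehr", "E-55", "member-1"),
    ("ehr", "E-56", "member-1"),       # second chart for the same person: fine
    ("pharmacy", "RX-9", "member-2"),
    ("pharmacy", "RX-9", "member-3"),  # conflict: one source ID, two members
]
crosswalk = build_crosswalk(links)
```

With this data, `resolve(crosswalk, "ehr", "E-56")` returns `member-1`, while `resolve(crosswalk, "pharmacy", "RX-9")` raises. That is the behavior you want in place before any language model sees the data.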
This is why the analyst layer is the last thing we built at Pelica, not the first. The harder work was the canonical record across claims, EHR, pharmacy, lab, and ADT, with conflict resolution for every join.
From question to answer, with provenance
The most common failure mode of natural-language analytics is the confident wrong answer. The fix is to make provenance non-optional: it comes back with every answer, not only on demand.
For every answer, the system shows:
- The SQL it ran. Not a description, the actual query. Engineers and analysts who want to verify can read it directly. The SQL is also editable, in case the user wants to tweak the query directly instead of rephrasing the question.
- The data sources and time ranges. Which tables, which date filters, which contract scope. Healthcare metrics often diverge by whether you used encounter date or paid date; that has to be explicit.
- The metric definitions in use. What counts as "adherent." What counts as a "high-risk" member. What counts as a "controlled" diabetic. Every term in the question maps to a stored definition that the user can click into.
- Row counts at each step. Members at the start of the filter, after attribution, after contract scoping, in the final answer. If 12,000 members became 41 between step 2 and step 3, something is interesting about step 3.
- The member list itself. Click through to the underlying members behind the count. If you cannot reproduce the answer by looking at the members, the answer is not actionable.
This is the part that distinguishes a real analyst layer from a chat interface bolted onto a dashboard. An answer without provenance is not an answer in healthcare. It is a guess that happens to have a number attached.
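The row-count idea is simple enough to sketch. The Python below is illustrative only: `run_funnel`, `suspicious_steps`, the 90% drop threshold, and the toy member records are assumptions made up for this example. It applies filters in order, records the count after each one, and flags any step where most of the rows vanish.

```python
def run_funnel(members, steps):
    """Apply named filters in order, recording the row count after each."""
    counts = [("start", len(members))]
    for name, keep in steps:
        members = [m for m in members if keep(m)]
        counts.append((name, len(members)))
    return members, counts


def suspicious_steps(counts, max_drop=0.90):
    """Return the names of steps that removed more than max_drop of the rows."""
    flagged = []
    for (_, before), (name, after) in zip(counts, counts[1:]):
        if before and (before - after) / before > max_drop:
            flagged.append(name)
    return flagged


# Toy data: 1,000 members, 800 attributed to a clinic, very few on contract "X".
members = [
    {"id": i, "clinic": "c1" if i % 5 else None, "contract": "X" if i < 4 else "Y"}
    for i in range(1000)
]
steps = [
    ("attributed", lambda m: m["clinic"] is not None),
    ("contract_scope", lambda m: m["contract"] == "X"),
]

final, counts = run_funnel(members, steps)
flagged = suspicious_steps(counts)
```

In this toy run the attribution step keeps 800 of 1,000 members and passes quietly, while contract scoping takes 800 down to 3 and gets flagged. A sharp drop is not necessarily an error, but it is the step a reviewer should open first.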
"In healthcare AI, the expensive layer is the one that stitches the patient story together before the model ever says a word. The conversation layer is the easy part."
What stays the BI team's job
The point is not to replace the BI team. It is to move them off the repetitive work that should never have been a ticket.
The natural-language layer absorbs the questions that are well served by an existing data model and a metric that is already defined. Pull this measure for this contract. Show this member's claims by quarter. Count members at risk under V28. These are the kinds of questions a director should be able to ask and act on within the same hour, not put in a queue.
What stays the analyst's job is anything that requires:
- New data integration. Pulling in a payer file the system has not seen before. Reconciling a new SFTP drop. Engineering, not querying.
- New metric definition. The first time someone asks "how do we count adherence under the 2026 single-weight rule," that requires deciding on the definition. Once it is defined, every future question uses it.
- Non-trivial reconciliation. When two systems disagree on the count, an analyst has to figure out which is right. That is judgment work.
- Custom reporting for payers or regulators. A RADV response cannot be a chat. It is a formal report that requires curation.
- Strategic analysis. Forecasting, scenario planning, segmentation work that exists to support a business decision, not to answer a single question.
Total ticket volume usually drops dramatically. The remaining work is higher value, harder, and more interesting. The BI team gets to do the analyst job they trained for, not the data-pulling job that consumed every Tuesday morning.
How to introduce it without breaking trust
If you are a CIO or BI lead considering this, the failure mode to avoid is shipping a chat box and letting operators ask anything. The first wrong answer that gets cited in a meeting will set the project back six months.
What works better:
- Start with a defined scope of questions. Pick 25 to 50 questions the team currently asks weekly. Make sure the system answers all of them correctly, with provenance, before you open it more broadly.
- Ship to power users first. Two or three directors who will catch wrong answers fast and tell you. The BI team should be the first reviewers, not the gatekeepers.
- Make the audit trail visible by default. SQL, sources, definitions, members. Not behind a "show details" link. If users have to ask for the details, they will not look, and they will stop trusting the layer.
- Treat metric definitions as version-controlled. Every metric in the canonical model is owned by someone. Changes go through review. Otherwise, the same question returns different answers on different days, and the trust collapses.
- Track the queue. Whether ticket volume drops in the BI queue is the cleanest indicator of whether the layer is being used. Whether the questions left in the queue are higher-value is the indicator of whether it is working.
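For the version-control point, a registry along these lines is one possible shape. This is a hedged sketch in Python, not a prescribed design: `MetricRegistry`, its methods, the owner and reviewer names, and the example definitions of "adherent" are all hypothetical. It keeps every published version, requires a reviewer other than the owner, and lets an answer record exactly which version it used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricVersion:
    name: str
    version: int
    owner: str
    reviewed_by: str
    definition: str


class MetricRegistry:
    def __init__(self):
        self._versions = {}  # metric name -> list of MetricVersion, oldest first

    def publish(self, name, owner, reviewed_by, definition):
        """Add a new version; reject changes nobody else has reviewed."""
        if not reviewed_by or reviewed_by == owner:
            raise ValueError("a change needs a reviewer other than the owner")
        history = self._versions.setdefault(name, [])
        entry = MetricVersion(name, len(history) + 1, owner, reviewed_by, definition)
        history.append(entry)
        return entry

    def current(self, name):
        """The version that new questions should use."""
        return self._versions[name][-1]

    def at(self, name, version):
        """A past version, for reproducing an older answer."""
        return self._versions[name][version - 1]


# Illustrative definitions only; a real definition would be richer than a string.
registry = MetricRegistry()
registry.publish("adherent", "quality_lead", "bi_analyst", "pdc >= 0.80")
registry.publish("adherent", "quality_lead", "bi_analyst", "pdc >= 0.80 AND fills >= 2")

latest = registry.current("adherent")
original = registry.at("adherent", 1)
```

Storing the version number alongside each answer means that when the same question returns a different number next month, you can tell whether the data changed or the definition did.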