An AI executive narrative is only useful if the numbers in it match the numbers in the underlying data. Most LLM-driven finance dashboards skip this check. Fin4Sight doesn't.
The promise of AI narratives in finance dashboards is straightforward: instead of reading 12 charts, you read a paragraph that explains what changed. The trouble is what happens when the model is confident and wrong.
LLMs predict the next token based on the context they're given. If you ask one to summarize “AP aging exceeded last month's by 23%”, the model will write a fluent paragraph around that figure even if the actual number is 11%. The fluency is the problem — the narrative reads as though the number is grounded, but the number was just the most plausible-looking next token.
In a CFO dashboard, that kind of fluency is dangerous. A 23% claim where the truth is 11% gets quoted in the next board meeting. Auditors flag it later. Trust in the dashboard drops.
Fin4Sight handles this with a guardrail that runs after the LLM generates a narrative and before the narrative ships to the dashboard. Three steps:

1. Extract every quantitative claim from the generated narrative (percentages, amounts, deltas).
2. Recompute each figure from the source aggregates the LLM was given.
3. Compare the claimed value to the recomputed one; if any claim falls outside the rounding threshold, reject the whole narrative.
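A minimal sketch of those three steps, in Python. The function names, the percentage-only extraction, and the shape of the inputs are simplifications for illustration, not Fin4Sight's actual pipeline or API:

```python
import re

REL_TOLERANCE = 0.05  # the 5% rounding threshold discussed below

def extract_percentages(narrative: str) -> list[float]:
    """Step 1 (simplified): pull every percentage claim out of the generated text."""
    return [float(m) for m in re.findall(r"(\d+(?:\.\d+)?)\s*%", narrative)]

def validate_narrative(narrative: str, recomputed: list[float]) -> bool:
    """Steps 2 and 3: compare each claim against figures recomputed from the
    source data; reject the whole narrative if any claim fails."""
    for claim in extract_percentages(narrative):
        matches_a_source = any(
            src != 0 and abs(claim - src) / abs(src) <= REL_TOLERANCE
            for src in recomputed
        )
        if not matches_a_source:
            return False  # ship no narrative rather than a wrong one
    return True

# The LLM claims 23%, but recomputing from the data gives 11%: rejected.
assert validate_narrative("AP aging exceeded last month's by 23%.", [11.0]) is False
# A claim of "approximately 25%" against a recomputed 24.7% passes.
assert validate_narrative("AP aging rose by approximately 25%.", [24.7]) is True
```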
The user never sees a hallucinated figure. They either see a narrative whose numbers all reconcile to the underlying data, or they see no narrative.
The same guardrail pattern runs across every module that generates LLM narratives: variance commentary, anomaly explanations, and reconciliation summaries.
Every quantitative claim in any of these narratives goes through the validation pass. One pattern, applied consistently.
5% is the rounding threshold the platform uses today. It's tight enough to catch fabrication and loose enough to allow normal LLM rounding behaviour (“approximately 25%” instead of “24.7%”). Tighter thresholds reject too many narratives that would have been useful; looser thresholds let through too many that wouldn't have been.
The threshold is configurable per tenant if you want stricter validation for an audit-sensitive workflow. The default is 5%.
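A per-tenant override could look something like the following. The config shape and the stricter 1% value are hypothetical; only the 5% default comes from the text above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TenantValidationConfig:
    # 5% relative tolerance is the platform default; audit-sensitive tenants
    # can configure a tighter value.
    rel_tolerance: float = 0.05

def passes(claimed: float, recomputed: float, cfg: TenantValidationConfig) -> bool:
    """Accept a claimed figure if it is within the tenant's relative tolerance."""
    if recomputed == 0:
        return claimed == 0
    return abs(claimed - recomputed) / abs(recomputed) <= cfg.rel_tolerance

default_cfg = TenantValidationConfig()
audit_cfg = TenantValidationConfig(rel_tolerance=0.01)  # hypothetical stricter tenant

# "approximately 25%" against a recomputed 24.7% is fine at the default...
assert passes(25.0, 24.7, default_cfg)
# ...but fails a 1% audit-sensitive threshold.
assert not passes(25.0, 24.7, audit_cfg)
```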
For the CFO: the narrative numbers always tie to the dashboard charts. No more “wait, that figure isn't right” in a board read-through.
For the auditor: every figure in an LLM-generated report is provably tied to a source aggregate. Open the report, open the source data, the numbers reconcile.
For the AP team: variance commentary, anomaly explanations, and reconciliation summaries all come with numbers you can trust without re-checking them.
Building this guardrail is hard. Extracting numbers reliably from LLM output isn't a one-liner: you need to handle currencies, scales (thousands vs. millions), percentages, and deltas. Recomputing the source aggregate means having a clean view of the real data the LLM was given. Rejecting narratives means fewer narratives ship, which makes the dashboard look quieter.
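To make the extraction point concrete, here is a rough sketch of what normalising claims across currencies, scales, and percentages involves. The regex and scale table are illustrative only and nowhere near production-grade:

```python
import re

# Map scale suffixes to multipliers so "$1.2m" and "1,080 thousand" become comparable.
SCALE = {"k": 1_000, "thousand": 1_000, "m": 1_000_000, "million": 1_000_000,
         "bn": 1_000_000_000, "billion": 1_000_000_000}

NUMBER_RE = re.compile(
    r"[$€£]?\s*(?P<num>\d[\d,]*(?:\.\d+)?)\s*(?P<unit>%|k|m|bn|thousand|million|billion)?",
    re.IGNORECASE,
)

def normalise(match: re.Match) -> tuple[float, str]:
    """Return (value, kind) where kind is 'percent' or 'amount'."""
    value = float(match.group("num").replace(",", ""))
    unit = (match.group("unit") or "").lower()
    if unit == "%":
        return value, "percent"
    return value * SCALE.get(unit, 1), "amount"

claims = [normalise(m) for m in NUMBER_RE.finditer("AP rose 11% to $1.2m vs. 1,080 thousand")]
# -> [(11.0, 'percent'), (1200000.0, 'amount'), (1080000.0, 'amount')]
```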
It's also less impressive in a demo. A dashboard that ships every narrative looks more capable than one that rejects 10% of them. Until the rejected narrative is the one that would have lied.
Ask the vendor what happens when the LLM gets a number wrong. The good answers describe a validation pass and a reject behaviour. The bad answers describe how unlikely it is for the model to get a number wrong, which means the vendor hasn't built a guardrail and is hoping you won't notice.
Hallucinated KPIs are a discipline problem, not a model problem. Models will hallucinate; the question is what your tool does about it.