Don’t fire your most junior analyst yet: ChatGPT and its AI competitors suck at pulling data out of investor relations material, according to a study by Patronus AI, an outfit founded by two former Meta employees.

The trial: Patronus fed gobs of Securities and Exchange Commission filings into LLMs from OpenAI, Meta, and Anthropic and then asked each of them 150 basic questions. The bots brought back the wrong answer 68-81% of the time — they either failed to understand the question or hallucinated fake answers.

Even with “long context windows,” the bots still screwed up 21-24% of the time and failed to respond in about 4% of all cases.

SOUND SMART- Think of a context window as the bot’s memory: the amount of stuff it can keep straight in its “head” while answering a question. A “long” window can span multiple large documents, videos, or conversations. The longer the window, the richer the bot’s potential understanding of context, and the better it will be able to reason and summarize data. The catch: Long windows are expensive, they’re hard to train bots on, and they still have high error rates. More data means more chances to hallucinate unless you manage the LLM really tightly.

Part of the problem: General AI models are trained to be jacks of all trades rather than Grand Wizards of High Finance, says Anand Kannappan, Patronus’ CEO.

What’s Patronus’ pitch? The company says it helps businesses “use generative AI with confidence” by helping teams “detect LLM mistakes at scale.” Earlier this month it rolled out FinanceBench, which it bills as “the industry’s first benchmark for testing how LLMs perform on financial questions.”

And not all financial documents are created equal: A bot that learns, in late 2023, to understand publicly available SEC data will still face challenges if asked questions about transaction memos, pitch decks, and equity research, Kannappan says.

GO DEEPER- Check out Patronus AI’s website, read its press release on the study, or read more at Fortune or CNBC.