Data deluge is a legal time bomb

Archaic search technology has turned e-disclosure into a corporate nightmare.

During a hearing last December, the defence team for three former Nortel Networks executives argued that its clients weren’t getting a crack at a fair trial. But a biased jury isn’t what’s standing in the way of justice in this closely watched fraud case. Rather, the culprit is shoddy technology.

The problem is the nearly 23 million pages of electronically stored disclosure handed over by the Crown. According to the defence, the sheer amount of material — the paper equivalent of some 8,000 to 10,000 banker’s boxes — is so “staggering” that it cannot be effectively searched for information that may help its clients. In a February ruling, Ontario Superior Court Justice Cary Boswell agreed, referring to the material as a “document dump” and “unsearchable morass.” The Crown had until April 1 to “re-disclose” any relevant material.

Long gone are the days when a crackerjack legal team would spend weeks sifting through the material for a single shred of exculpatory evidence. These days, companies rely on search technology to slice and dice electronic documents like a Ginsu knife. But while Google is great for keeping tabs on your ex, modern-day search engines are turning out to be a poor match for today’s data deluge. In fact, recent research conducted by the Text Retrieval Conference (TREC) Legal Track, an international workshop that assesses various information retrieval approaches, reveals that Boolean keyword searches found only between 22% and 57% of the total number of relevant documents.

“What’s accepted as standard practice is actually pretty archaic,” warns Gordon Cormack, co-ordinator of TREC Legal Track and a professor of computer science at the University of Waterloo. “It’s typically a keyword-oriented search.”

Cormack hopes TREC Legal Track’s research will lead to a more precise search protocol. This year, the workshop’s dozen or so academics, lawyers and techies will perform mock discovery requests using documents from the Enron litigation. Participants will search nearly 500,000 e-mail messages and 800,000 online documents for topics such as “stock transactions.” Their findings will be submitted to TREC, which then measures the percentage of relevant documents each search methodology found.
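For readers who want to see what that measurement looks like, here is a minimal sketch in Python, not TREC's actual tooling. It runs a Boolean keyword search over a handful of made-up messages and reports the share of relevant ones it found, a figure usually called recall. The messages, the list of relevant ones and the function names `boolean_and_search` and `recall` are all invented for illustration.

```python
def boolean_and_search(documents, keywords):
    """Return the ids of documents containing every keyword (case-insensitive)."""
    terms = [k.lower() for k in keywords]
    hits = set()
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        if all(term in words for term in terms):
            hits.add(doc_id)
    return hits


def recall(retrieved, relevant):
    """Share of the relevant documents that the search actually found."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Invented toy collection; not drawn from any real data set.
documents = {
    "m1": "please approve the stock transactions before friday",
    "m2": "we sold the shares on tuesday as discussed",
    "m3": "lunch menu for the quarterly offsite",
    "m4": "stock options exercised and transactions logged",
}
# Hypothetical reviewer judgments of what is responsive to "stock transactions".
relevant = {"m1", "m2", "m4"}

found = boolean_and_search(documents, ["stock", "transactions"])
print(sorted(found), round(recall(found, relevant), 2))
```

In this toy case the search misses message m2, which talks about selling "shares" rather than "stock", so it finds only two of the three relevant messages. That is the kind of gap a keyword-only approach leaves behind.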

Meanwhile, the corporate world is sitting on a digital time bomb. “Think of a large company that has 35,000 employees,” says Susan Wortzman, a partner at Wortzman Nickle, a Toronto law firm specializing in e-discovery. “We’re talking billions of e-mails.”

But that’s not all. The inherent ambiguity of language, the co-mingling of text and images, the inaccuracy of optical character recognition software — they’re all variables making it increasingly difficult for companies to comply with court-ordered requests for relevant documents or, more important, find that smoking gun beneath mounds of data.

Seizing the initiative, many companies are turning to so-called litigation readiness solutions. Waterloo’s Open Text, for example, offers a suite of content management tools that promises to rein in unwieldy data in anticipation of litigation. “A little bit of upfront investment saves a lot of headaches down the road,” says Eugene Roman, Open Text’s chief technology officer.