Predictive coding at DESI V, the Oracle-EDI Study and other TAR sources

The Fifth DESI Workshop on Standards for Using Predictive Coding, Machine Learning, and Other Advanced Search and Review Methods in E‐Discovery takes place in Rome on 14 June. The Oracle-EDI Study on Predictive Coding will be published at the EDI Summit on 15-17 October. Time for a round-up of some of the predictive coding resources.

My aim here is to point you to a handful of papers, posts and articles which cover the predictive coding / technology-assisted review ground. If you are interested in reading your way into them, then the documents given here, plus their own links, will point you to almost everything worth reading. If what you want is a quick crib so that you go into the predictive coding demo, client meeting or discussion with opponents with a few references under your belt, then a skim of some of these will serve.

I can be sure that the sources given here plus those which they link to are pretty comprehensive because the first on my list is Rob Robinson’s deliberately detailed summary of writings about what he calls technology-assisted review (I put it like that because not the least of the problems with this subject is the failure to agree on what to call it – that does not matter to those in the know, but it is something of a barrier to new entrants).

Rob Robinson’s summary is called Technology-Assisted Review: From Expert Explanations to Mainstream Mentions. It lists articles of all kinds in reverse order of date from February 2012 when US Magistrate Judge Andrew Peck gave his Opinion in Da Silva Moore. I have not counted them (though I was vain enough to see that 24 of my own articles appear on the lists) but you can be sure that anything written on the subject which is worth reading is here.

Rob Robinson has also recently updated his Got Technology-Assisted Review? A Short List of Providers and Terms which complements his list of sources. For the shortest and most easily-understood recital of predictive coding functions, see the description by text analysis software provider Equivio of its predictive coding application Equivio Relevance – a model of succinct explanation by a company whose appreciation that “less is more” applies to its marketing materials as well as to its mission to eliminate data redundancy.

What actually set me writing this article, and what gave it the date range in its title, are a paper about Blair & Maron and the forthcoming Oracle–EDI study on predictive coding. Blair and Maron (like others, I use “and” for the people and “&” for their paper) wrote the 1985 paper which has been used ever since to support the argument that key words are not a satisfactory means of searching electronic data, yielding (so it is said) poor accuracy rates relative to modern technology like technology-assisted review.

[You will note that I tread delicately here, avoiding numbers and deliberately sticking to broad, vague terms like "accuracy rates" - I am steering you towards a discussion, not taking sides in it].

The paper, called A Re-Examination of Blair & Maron (1985), is written by Dan Regard and Tom Matzen of iDiscovery Solutions, Inc. for the Fifth DESI Workshop on Standards for Using Predictive Coding, Machine Learning, and Other Advanced Search and Review Methods in E‐Discovery which takes place in Rome on 14 June. At only 18 pages, this paper is a succinct and readable summary whose purpose (as I read it) is to urge lawyers and others concerned with the accuracy of search to consider all “techniques, technologies and processes” (as the authors put it) and to review and criticise them “on the merits of the actual results”. Quite apart from its own merits, the paper has extensive cross-links to other sources, making it a jumping-off point for comprehensive research into the subject.

Technology, and the thinking around its use, has moved on in 30 years, developing abilities unthought of when Blair and Maron were writing. However, as the authors put it, “you still have to exercise that ability” and they quote from the Sedona Conference Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery:

There is no magic to the science of search and retrieval: only mathematics, linguistics, and hard work.

This point is covered in another DESI paper called Variability in Technology Assisted Review and Implications for Standards by Amanda Jones and Jianlin Cheng of Xerox Litigation Services. Its theme is the extent to which aspects of the TAR process should be standardised and how. Lawyers ask constantly how they are to “trust” the results from a TAR process, as if the inputs and outputs were somehow nothing to do with them – a “black box” has arrogated that role, they claim. The Xerox article will show them that it is science and thoughtfulness, not the chance operation of an algorithm, which dictates both the accuracy of a TAR exercise and the ability to validate the results. There is a Xerox article about this paper here.

I had hoped to go to DESI in Rome, and reading this paper increases my regret that I cannot do so. You might ask why I picked on these two papers from the many interesting titles which appear on the DESI list. The answer is a pragmatic one – someone bothered to send them to me in good time so I had the chance to read them.

How do we judge “the merits of the actual results” of an eDiscovery / eDisclosure search as Dan Regard and Tom Matzen put it? What is required of lawyers varies from jurisdiction to jurisdiction, and rule changes actual (in England and Wales) and pending (in the US) will open new areas of discussion, particularly thanks to the express focus on proportionality in both jurisdictions. The study of “actual results” is what is planned in the pending Oracle-EDI study on predictive coding.

That first came to my attention in an article headed Updates to Oracle EDI Study on Predictive Coding which I have mentioned in a previous article – an LTN report of a seminar in early May sponsored by Equivio and featuring a progress report from Pallab Chakraborty, Director of eDiscovery at Oracle.

The article refers to a database provided by Oracle which several vendors are using to test human review against technology-assisted review. The study focuses on how the predictive coding process performs against traditional linear document review. Results are due to be delivered by the EDI Leadership Summit in Santa Monica on October 15-17. There is an EDI press release from January here.

If we are lucky, the result will combine serious academic study with explanations couched in terms which lawyers (who, on the whole, are not good at understanding or articulating arguments derived from science) will understand.


