Equivio continues its series of educational webinars under the overall title Predictive Coding Minus the Hype. The sixth webinar in the series, Advanced document clustering for eDiscovery, takes place on 16 December at 12:00pm ET and is given by Equivio’s Avi Elkoni.
Put as broadly as possible, clustering is technology that groups together documents with similar content and other characteristics. The user can drill down into each cluster to make decisions about the group as a whole or about individual documents, and can use “find similar” functionality to find others like them.
The EDRM definition is:
An Unsupervised Learning method in which Documents are segregated into categories or groups so that the Documents in any group are more similar to one another than to those in other groups. Clustering involves no human intervention, and the resulting categories may or may not reflect distinctions that are valuable for the purpose of a search or review effort.
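To make that definition concrete, here is a minimal sketch in Python of what "unsupervised" grouping can look like. It is purely illustrative and makes no claim about how Equivio or any other eDiscovery product works: the function names, the word-overlap (Jaccard) similarity measure, the 0.3 threshold and the sample documents are all my own assumptions. Real tools use far richer measures of similarity.

```python
import re

def tokens(text):
    """Return the set of lowercase words in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Share of words that two word sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering (an illustrative assumption, not a
    vendor's algorithm): each document joins the first cluster whose seed
    document is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (seed word set, [document indices])
    for i, doc in enumerate(docs):
        words = tokens(doc)
        for seed, members in clusters:
            if jaccard(seed, words) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append((words, [i]))
    return [members for _, members in clusters]

# Hypothetical sample documents; nobody has labelled them in advance.
docs = [
    "Quarterly sales forecast for the northern region",
    "Lunch menu for the office canteen this week",
    "Revised quarterly sales forecast for the southern region",
    "Office canteen lunch menu for next week",
]
groups = cluster(docs)
print(groups)
```

No one tells the code what the categories are; the two groups emerge from the wording alone. Equally, nothing guarantees that the groups it finds are the ones which matter for a particular review, which is the caveat in the second sentence of the EDRM definition.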
At an early stage in the eDiscovery / eDisclosure process, clustering is great for developing an overview. One of the characteristics of eDiscovery is that you often have no idea what is in the collection, and clustering helps with this. At a later stage, when perhaps a technology like predictive coding has been used to narrow the collection, clustering is useful for identifying classes of documents which should not be there: non-relevant documents in the relevant pile and vice versa.
Like predictive coding, clustering is a tool which can be used in conjunction with others, at least by those who understand what it can and cannot do. The webinar is advertised as being “a must for everyone who uses text analysis tools in eDiscovery”. I would go further and say that it is a must for those who need to understand what each of the different technology types can achieve, in order to inform future decision-making about the choice of tools.