Anyone who has ever seen an eDiscovery / eDisclosure demonstration will be aware of the Enron dataset, the very large collection of e-mail derived from the investigation into the collapse of Enron. It has been publicly available for some time, providing a test bed for a variety of search methodologies as well as an object-lesson in business practice.
Whilst it is hard to feel sympathy for the main players at Enron whose business practices were laid bare by this, it has recently emerged that personal information lay hidden in the dataset, including credit card numbers, social security or other identity numbers, dates of birth and other personal, medical, legal and contact information. There is a post here from BeyondRecognition about the discovery of all this personal information.
Investigative software provider Nuix, in conjunction with EDRM, have now republished the Enron PST Dataset after finding and removing more than 10,000 items of personal information from it. There is a press release about this exercise here, and an explanation here, with links to a case study and the cleaned-up dataset.
I have very recently published a story about the use of Nuix to investigate a very large source of information about concealed assets and financial transactions. Here is another example of Nuix’s processing power being applied to a major exercise for the public good.