Although it has never been pitted against IBM's Watson, DeepDive has gone up against a fleshier foe: the human being. The result: DeepDive beat, or at least equaled, humans in the time it took to complete an arduous cataloging task. These were no ordinary humans but expert catalogers, tackling the same task as DeepDive: reading technical journal articles and cataloging them based on an understanding of their content.
"We tested DeepDive against humans performing the same tasks and DeepDive came out ahead or at least equaled the efforts of the humans," professor Shanan Peters, who supervised the testing, told EE Times.
DeepDive is free and open-source, which was the idea of its primary programmer, Christopher Re.
"We started out as part of a machine reading project funded by DARPA in which Watson also participated," Re, a professor at the University of Wisconsin, told EE Times. "Watson is a question-answering engine (although now it seems to be much bigger). [In contrast] DeepDive's goal is to extract lots of structured data" from unstructured data sources.
DeepDive incorporates probability-based learning algorithms as well as open-source tools such as MADlib and Impala (from Cloudera), plus low-level techniques such as Hogwild, some of which have also been used in Microsoft's Project Adam. To build DeepDive into your application, you should be familiar with SQL and Python.
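For readers unfamiliar with Hogwild, the core idea is that several workers run stochastic gradient descent in parallel on one shared set of model weights without taking any locks, accepting that an update is occasionally overwritten. The sketch below is a minimal illustration of that idea applied to a toy least-squares problem; it is not DeepDive's code, and the function name `hogwild_sgd`, its parameters and the synthetic data are all hypothetical. Because of CPython's global interpreter lock the threads here interleave rather than run truly in parallel, so the sketch shows the lock-free structure, not the speedup.

```python
import random
import threading


def hogwild_sgd(data, n_features, n_threads=4, epochs=20, lr=0.05, seed=0):
    """Hypothetical Hogwild-style SGD for least-squares linear regression.

    All worker threads read and write the shared `weights` list with no lock;
    occasionally lost or stale updates are tolerated rather than prevented.
    """
    weights = [0.0] * n_features  # shared by every thread, never locked
    shards = [data[i::n_threads] for i in range(n_threads)]  # split the examples

    def worker(shard, worker_seed):
        rng = random.Random(worker_seed)
        order = list(range(len(shard)))
        for _ in range(epochs):
            rng.shuffle(order)
            for idx in order:
                x, y = shard[idx]
                # Read the shared weights, which other threads may be updating.
                pred = sum(w * xi for w, xi in zip(weights, x))
                err = pred - y
                for j, xj in enumerate(x):
                    if xj != 0.0:  # only touch coordinates this example uses
                        weights[j] -= lr * err * xj  # unsynchronized write

    threads = [
        threading.Thread(target=worker, args=(shard, seed + k))
        for k, shard in enumerate(shards)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return weights


# Usage: recover known weights from noiseless synthetic data.
rng = random.Random(42)
true_w = [2.0, -3.0, 0.5]
data = []
for _ in range(400):
    x = [rng.uniform(-1, 1) for _ in true_w]
    data.append((x, sum(a * b for a, b in zip(true_w, x))))

learned = hogwild_sgd(data, n_features=3)
print([round(w, 2) for w in learned])
```

Hogwild pays off most when each example touches only a few coordinates, so concurrent writes rarely collide; the `xj != 0.0` check hints at that sparse setting even though the toy data here is dense.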
"Underneath the covers, DeepDive is based on a probability model; this is a very principled, academic approach to build these systems, but the question for us was 'could it actually scale in practice?' Our biggest innovations in DeepDive have to do with giving it this ability to scale," Re told us.
Next, the team aims to prove DeepDive in other domains.
"We hope to have similar results in those domains soon, but it's too early to be very specific about our plans here," Re told us. "We use a RISC processor right now, we're trying to make a compiler and we think machine learning will let us make it much easier to program in the next generation of DeepDive. We also plan to get more data types into DeepDive: images, figures, tables, charts, spreadsheets, a sort of 'Data Omnivore' to borrow a line from Oren Etzioni."
Get all the details with the free download, which is currently being downloaded about 10,000 times per week.