Dr. Tim Furche leads the DIADEM lab at Oxford University (http://www.cs.ox.ac.uk/), UK, as a senior postdoc. DIADEM is an ERC advanced investigator grant on web data extraction recently awarded to Georg Gottlob. Dr. Furche's main research interests are information systems for the Web: linear evaluation of Web queries on graph data, reasoning, web extraction, web and object search, and streamed query evaluation.
Unsupervised, fully automated understanding of web pages designed for humans seemed an insurmountable challenge just a few years ago. In this talk, we will summarize recent advances in data extraction that combine noisy but large-scale entity recognition with high-level ontology constraints, and that have brought unsupervised, high-accuracy data extraction within our grasp. In the DIADEM ERC project, this trend is being used to develop the first integrated system and methodology for designing extraction ontologies that enable fully automated, high-accuracy data extraction in a given domain. The talk concludes with an outlook on how the increasing amount of available structured knowledge can help bootstrap the extraction of further knowledge from unstructured sources.