The promising research results achieved during the 1990s in the area of human-machine dialogue generated strong expectations regarding the imminent emergence of artificial agents able to engage in natural interactions with a human partner. However, more than ten years later, these so-called intelligent interactive systems are still not part of our everyday life. What really happened? What are the reasons for this discrepancy with respect to the initial roadmap? What was the actual content of the scientific advances compared to the requirements of effective operational systems? What are the technological breakthroughs yet to be achieved, and the underlying scientific challenges to address, especially in the field of knowledge processing, in order to bridge the gap toward interactive cognitive agents that meet end-user needs? This talk tackles these issues and provides some tentative answers.
Communications
The presentations of the CHIST-ERA Conference 2012 communications (keynote and short talks, posters) will be updated continuously over the course of June.
David Sadek is director of research at Institut Telecom (http://www.institut-telecom.fr/en_accueil.html). He is also chairman of the evaluation committee of the "Digital content and interactions" program of the French National Research Agency (ANR). Formerly, he was delegate director for research at Orange Labs. A doctor in computer science and a specialist in artificial intelligence, knowledge technologies and cognitive science, he created and led research teams on human-computer dialogue and intelligent agents for fifteen years. He has also conducted several programs of technology transfer and service deployment in these domains, and has been a major player in standardization initiatives for software agent technologies.
Keynote talk
Edouard Geoffrois has recently joined the French National Research Agency (ANR) as a program manager in the Department for Information and Communication Science and Technologies, while also being employed by the French National Defense Procurement Agency (DGA) in the division in charge of science and innovation policy. He is also the Technical Director of the Quaero program on multimedia content processing, a strategic research and innovation program supported by OSEO, the funding agency under the Ministry of Industry. After undergraduate scientific studies at the Ecole Polytechnique, he specialized through a Master's in Cognitive Science and a PhD in Automatic Speech Recognition at the University of Paris-Sud before joining DGA, where he conducted research in speech and image processing and managed various projects in the same field.
Inferring new knowledge from data requires interpreting these data using complex mathematical models that take into account contextual information and background knowledge. The goal of research in knowledge processing is thus to find the best possible models for performing a given "intelligent" task. The quality of a model can be evaluated by measuring how well it agrees with representative sample data sets for the task under study. Data are also needed to train the models through automatic learning procedures. For the sake of reproducibility of experiments, the data used to test a model should be made publicly and easily available. Furthermore, to avoid any bias when comparing different models, given their ability to learn, all models should be tested before any experimental result is released, which implies that all tests occur almost simultaneously. This leads to a specific organization, often called an evaluation campaign, in which several research teams harmonize their needs and coordinate their efforts around a synchronized common test. Producing the needed data and organizing evaluation campaigns thus constitute critical resources for the success of the associated research and must be given careful attention. Since they are not pure research activities, they should be carried out by dedicated teams working at the service of the research community at large, with strong public support. Such resources are becoming a strategic component of any ambitious research effort in the field, and offer a natural means to coordinate work at the international level.
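As a rough illustration of this shared train-and-test protocol, the Python sketch below trains a toy model on a common training set and scores it on a common held-out test set; the sample data, the counting-based "model" and the accuracy metric are illustrative assumptions, not part of any actual evaluation campaign.

```python
# Minimal sketch of the shared train/test protocol behind an evaluation
# campaign: every team trains on the same training set and is scored on
# the same held-out test set, so results are directly comparable.
# The data, the toy "model", and the accuracy metric are all illustrative
# assumptions, not part of any specific campaign.

from collections import Counter

# Shared, representative sample data for the task (here: trivial labels).
train_data = [("the weather is nice", "pos"), ("what a terrible day", "neg"),
              ("nice work", "pos"), ("terrible idea", "neg")]
test_data = [("nice weather today", "pos"), ("a terrible result", "neg")]

def train(examples):
    """Learn a word->label association by simple counting."""
    votes = {}
    for text, label in examples:
        for word in text.split():
            votes.setdefault(word, Counter())[label] += 1
    return votes

def predict(model, text):
    """Predict the label whose associated words dominate the input."""
    tally = Counter()
    for word in text.split():
        tally.update(model.get(word, Counter()))
    return tally.most_common(1)[0][0] if tally else "pos"

def evaluate(model, examples):
    """Score the model by its agreement with the reference labels."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)

model = train(train_data)
print(f"accuracy on the common test set: {evaluate(model, test_data):.2f}")
```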
Keynote talk
Attachment | Size |
---|---|
Presentation Edouard Geoffrois.pdf | 327.16 KB |
Kalina Bontcheva is a senior researcher in the Natural Language Processing Group, Department of Computer Science, University of Sheffield (http://nlp.shef.ac.uk). Her research interests include personalised summarisation of social media, mining information from patents, sentiment analysis, and collaborative environments for text annotation.
At present, 200 million Twitter users send 140 million tweets a day, Facebook has 750 million active users who spend over 700 billion minutes per month on the site, and knowledge is increasingly generated by harnessing the "wisdom of the crowd" on Wikipedia, Quora, and other similar sites. This unprecedented rise in the volume and importance of online textual content has left companies and individuals increasingly struggling with information overload, or, as Clay Shirky defines it, a filter failure.
This talk will discuss how text analytics and natural language processing can help address these issues, through the development of methods capable of extracting useful knowledge from noisy, contradictory content; inferring an individual's information-seeking goals; offering personalised information access; and making use of distributed human computation by harnessing the knowledge of a large number of humans. I will also touch upon the challenge of developing research infrastructures for experimentation with large-scale Text-to-Knowledge (T2K) analytics, at costs affordable for research teams and companies.
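As a simple, hypothetical illustration of the kind of shallow structure that can be recovered from noisy social media text before any deeper knowledge extraction, the sketch below pulls hashtags, user mentions and URLs out of a tweet; the regular expressions and the sample tweet are assumptions for illustration only and do not represent the methods discussed in the talk.

```python
# A minimal sketch of a first text-analytics step over noisy social media
# text: pulling out hashtags, user mentions, and URLs before any deeper
# knowledge extraction. The regular expressions and the sample tweet are
# illustrative assumptions, not the methods discussed in the talk.

import re

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")

def shallow_analyse(tweet: str) -> dict:
    """Extract easily recoverable structure from a single noisy tweet."""
    return {
        "hashtags": HASHTAG.findall(tweet),
        "mentions": MENTION.findall(tweet),
        "urls": URL.findall(tweet),
        # Everything left over is the noisy free text still to be analysed.
        "text": URL.sub("", MENTION.sub("", HASHTAG.sub("", tweet))).strip(),
    }

print(shallow_analyse("gr8 talk by @speaker on #textmining http://example.org !!"))
```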
Keynote talk
Attachment | Size |
---|---|
Presentation Kalina Bontcheva.pdf | 1.13 MB |
Lori Lamel is a senior research scientist at the CNRS, which she joined as a permanent researcher at the Computer Science Laboratory for Mechanics and Engineering Sciences (LIMSI - http://www.limsi.fr/index.en.html). Her current activities include research in speaker-independent, large-vocabulary continuous speech recognition; studies in acoustic-phonetics; lexical and phonological modeling; the design, analysis, and realization of large speech corpora (TIMIT, BREF, TED); and speaker and language identification.
This talk will highlight some open challenges in the automatic processing of spoken language. Automatic speech processing has witnessed substantial advances over the last decade, with growing interest in transcription systems for the automatic structuring of audio, supporting a variety of applications such as information archival and retrieval, media monitoring, automatic subtitling, question answering, speech translation and speech analytics. Much of the information on the web is not in a textual format and therefore escapes detection and categorization by text-based methods. Today's systems rely on large quantities of audio and textual data for model training, which often entails high development costs; research is therefore needed on generic recognition models and on automatic learning from unannotated data. Important future research will address keeping language models up to date, automatic topic identification, and enriched transcriptions providing annotations for speaker turns, language, acoustic conditions, etc. While the performance of speech recognition technology has improved dramatically for a number of 'dominant' languages, technologies for language and speech processing are available for only a small proportion of the world's languages. A major challenge is to achieve wider coverage of languages, allowing all citizens to interact in their own native language. With ever more connectivity and mobility, a related challenge is dealing with language (code) switching.
Keynote talk
Attachment | Size |
---|---|
Presentation Lori Lamel.pdf | 652.19 KB |
Prof. Stefan Decker is a professor at the National University of Ireland, Galway, director of the Digital Enterprise Research Institute (DERI - http://www.deri.ie/), and leader of the Semantic Web Cluster within the institute. His research activities include semantics for access to information and documents, collaborative systems, the future Internet, Web 2.0, and distributed systems.
The Web has significantly changed society and is currently evolving into a global distributed database and knowledge base. This network of knowledge is now emerging globally in many areas: government data, media, and business, as well as the sciences, among many others. The challenge for the future will be to turn this data into knowledge and to use it for systematic, human-centric access to knowledge and for solving today's problems, on individual, organisational and global levels, in order to develop new solutions and enable innovation. I will present the current state and show several research results which exemplify the new possibilities in search, document retrieval, collaboration and sensor networks.
Keynote talk
Attachment | Size |
---|---|
Presentation Stefan Decker.pdf | 4.82 MB |
Dr. Tim Furche leads the DIADEM lab at the University of Oxford (http://www.cs.ox.ac.uk/), UK, as a senior postdoc. DIADEM is an ERC advanced investigator grant on web data extraction recently awarded to Georg Gottlob. His main research interests are information systems for the Web: linear evaluation of Web queries on graph data, reasoning, web extraction, web and object search, and streamed query evaluation.
Unsupervised, fully automated understanding of web pages designed for humans seemed an insurmountable challenge just a few years ago. In this talk, we will summarize recent advances in data extraction based on the combination of noisy, but large-scale, entity recognition with high-level ontology constraints, which have brought high-accuracy unsupervised data extraction within our grasp. In the DIADEM ERC project, this trend is being used to develop the first integrated system and methodology for designing extraction ontologies that enable fully automated, high-accuracy data extraction in a given domain. The talk concludes with an outlook on how the increasing amounts of available structured knowledge can help, in the future, to bootstrap the extraction of further knowledge from unstructured sources.
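As a purely hypothetical sketch of the general idea of combining noisy entity recognition with ontology constraints, the Python snippet below scores candidate attribute values recognised on a page and keeps the best combination that a small set of domain constraints accepts; the real-estate domain, the constraints and the confidence scores are all assumptions for illustration and are not the actual DIADEM ontologies or algorithms.

```python
# A minimal sketch of the idea described in the talk: noisy, large-scale
# entity recognition proposes candidate attribute values, and high-level
# ontology constraints reject combinations that cannot form a valid record.
# The domain (real-estate listings), the constraints and the candidate
# scores are illustrative assumptions, not the actual DIADEM system.

from itertools import product

# Noisy candidates recognised on a web page: attribute -> [(value, confidence)].
candidates = {
    "price":    [("£350,000", 0.9), ("£1,200 pcm", 0.4)],
    "location": [("Oxford", 0.8)],
    "bedrooms": [("3", 0.7), ("2013", 0.2)],  # a year misread as a bedroom count
}

def satisfies_ontology(record):
    """Ontology constraints on a sale listing: a sale price, a plausible
    bedroom count, and a location must all be present."""
    return (record["price"].endswith("000")          # sale price, not monthly rent
            and record["bedrooms"].isdigit()
            and 1 <= int(record["bedrooms"]) <= 20   # rules out the misread year
            and bool(record["location"]))

def best_record(cands):
    """Pick the highest-confidence combination of candidates accepted by
    the ontology constraints."""
    attrs = list(cands)
    best, best_score = None, -1.0
    for combo in product(*(cands[a] for a in attrs)):
        record = {a: v for a, (v, _) in zip(attrs, combo)}
        score = sum(conf for _, conf in combo)
        if satisfies_ontology(record) and score > best_score:
            best, best_score = record, score
    return best

print(best_record(candidates))
```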
Keynote talk
Attachment | Size |
---|---|
Presentation Tim Furche.pdf | 25.71 MB |
Víctor Maojo García is a full professor and director of the Biomedical Informatics Group in the Artificial Intelligence Lab at the Universidad Politécnica de Madrid (UPM - http://www.dia.fi.upm.es/index.php?page=news&hl=en_US). His main research lines are: artificial intelligence in medicine; computerized methods for protocols and clinical practice guidelines; biomedical ontology; methods for the integration of clinical, -omics and nano-related information; text and data mining; and nanoinformatics.
Over the last decades, the explosion of biomedical data has led to different approaches for information extraction and knowledge discovery, using various data, text and web mining techniques. In the biomedical field, current approaches address a broad scope, ranging from public health to molecular biology. To manage such information in the context of the semantic web, many biomedical ontologies have been created. However, we have pointed out elsewhere the limitations of the current approaches adopted for biomedical ontologies. In my talk I will present the challenges of extending the current biomedical scope towards the nano level, including aspects linked to nanomedicine and nanotechnology. In addition, in this extended context, new "visual" ontologies are needed to manage graphical and visual information (e.g., shapes, structures, forms). I will present the work that has been carried out by my group at UPM on these topics, in the framework of various EC-funded projects, and current perspectives for future work.
Keynote talk
Attachment | Size |
---|---|
Presentation Victor Maojo Garcia.pdf | 4.76 MB |