
Technology

Pierre Lison

Robust Processing of Spoken Situated Dialogue

A Study in Human-Robot Interaction

ISBN: 978-3-8366-9113-0

Delivery within 5 to 8 working days.

EUR 58.00 - Free shipping within Germany


Product type: Book
Publisher: Diplomica Verlag
Publication date: March 2010
Edition: 1st
Pages: 202
Illustrations: 50
Language: English
Binding: Paperback

Contents

Recent years have witnessed a surge of interest in service robots endowed with communicative abilities. Such robots could take care of routine tasks in homes, offices, schools or hospitals, help disabled or mentally impaired persons, serve as social companions for the elderly, or simply entertain us. They would assist us in our daily activities. Since these robots are, by definition, meant to be deployed in social environments, their capacity to interact naturally with humans is a crucial factor. The development of such talking robots has led to the emergence of a new research field, Human-Robot Interaction (HRI), which draws on a wide range of scientific disciplines such as artificial intelligence, robotics, linguistics and cognitive science.

This work focuses on the issue of robust speech understanding - that is, how to process spoken dialogue automatically in order to extract the intended meaning. The book presents a new approach which combines linguistic resources with statistical techniques and context-sensitive interpretation to achieve both deep and robust spoken dialogue comprehension.

The first part of the book provides a general introduction to the field of human-robot interaction and details the major linguistic properties of spoken dialogue, as well as some grammatical formalisms used to analyse them. The second part describes the approach itself, devoting one chapter to context-sensitive speech recognition for HRI, and one chapter to the robust parsing of spoken inputs via grammar relaxation and statistical parse selection. All the algorithms presented are fully implemented and integrated into a distributed cognitive architecture for autonomous robots. A complete evaluation of the approach using Wizard-of-Oz experiments is also provided. The results demonstrate significant improvements in accuracy and robustness compared to the baseline.

Reading sample

Chapter 6, Robust Parsing of Spoken Dialogue: We present in this chapter the approach we developed for the robust parsing of spoken inputs. After a general overview, we start by describing the grammar relaxation mechanism devised to parse slightly ill-formed or misrecognised utterances. We then detail the discriminative model used to select the most likely parses among those allowed by the relaxed grammar. We explain what a discriminative model is and how it can be applied to our task, and then describe the learning algorithm, the training data, and the various features on which the discriminative model operates. We conclude the chapter by explaining two interesting extensions of our approach.

Parsing spoken inputs is a notoriously difficult task. The parser must be made robust to both speech recognition errors and ill-formed utterances, such as those including disfluencies (pauses, speech repairs, repetitions, corrections), ellipsis, fragments, and a-grammatical or extra-grammatical constructions. Three broad families of techniques are generally used in the literature to tackle this problem:

1. The first family includes the large set of shallow or partial parsing techniques, such as 'concept spotting'. In this approach, a small handcrafted, task-specific grammar is used to extract specific constituents and turn them into basic semantic concepts. These techniques are usually quite efficient, but they are also highly domain-specific and fragile, and require a lot of development and optimisation effort.

2. Statistical approaches are also widely used for robust parsing. They can take the form of either (1) flat models derived from Hidden Markov Models, or (2) structured models relying on stochastic parsing. In both cases, the possible parses of a given utterance are computed based on the selection of the most probable optimal coverage. Pure statistical techniques have the advantage of being inherently robust, and can be trained automatically from annotated corpus data.
Unfortunately, they are usually unable to deliver deep and detailed analyses, and they have a large search space. And of course, they are only applicable as long as training data (i.e. annotated corpora) is available for the task domain.

3. The third family of techniques relies on the controlled relaxation of grammar rules. Contrary to (pure) stochastic parsing, grammar relaxation approaches are able to provide deep syntactic analyses. They do, however, require more development time to build up the necessary grammatical resources. The relaxation mechanism must also be carefully controlled in order to avoid a combinatorial explosion in the number of parses.

The approach we present in this chapter belongs to the latter family of techniques and contains several improvements over the state of the art. It is based on a grammar relaxation mechanism coupled with a discriminative model that selects the most appropriate parse(s), a strategy borrowed from earlier work on parse selection. The approach relies on a hybrid symbolic/statistical architecture and integrates acoustic, semantic, syntactic and contextual knowledge into a unified model.

The grammar relaxation mechanism is implemented by introducing non-standard CCG rules that relax certain parts of the grammar, for example allowing for the insertion of missing words, the treatment of disfluencies, the combination of distinct discourse units, or the correction of common speech recognition errors. Grammar relaxation has the potential to significantly increase the grammar coverage, but at a cost: the multiplication of the number of alternative parses. A parse selection component (which we implement via a discriminative model) is therefore integrated into the system in order to discriminate the correct parses from the incorrect ones, by penalising the relaxed grammatical analyses to an appropriate extent.
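To make the idea concrete, a discriminative (log-linear) parse selection step of this general kind can be sketched as follows. Note that the candidate parses, feature names and weights below are invented purely for illustration and are not taken from the book; the actual model operates over a much richer set of acoustic, syntactic, semantic and contextual features.

```python
import math

# Hypothetical candidate parses for a noisy utterance. Each parse records
# how many relaxed (non-standard) rules were needed to derive it.
parses = [
    {"analysis": "take the ball",     "features": {"lex_score": 1.2, "relaxed_rules": 0}},
    {"analysis": "take the the ball", "features": {"lex_score": 0.9, "relaxed_rules": 1}},
    {"analysis": "take ball",         "features": {"lex_score": 0.7, "relaxed_rules": 2}},
]

# Illustrative learned weights: the negative weight on relaxed_rules
# penalises analyses that required grammar relaxation.
weights = {"lex_score": 1.0, "relaxed_rules": -2.0}

def score(parse):
    """Linear score w . f(parse) of a candidate parse."""
    return sum(weights.get(name, 0.0) * value
               for name, value in parse["features"].items())

def select(parses):
    """Rank parses by their conditional (softmax) probability."""
    scores = [score(p) for p in parses]
    z = sum(math.exp(s) for s in scores)  # partition function
    ranked = [(p["analysis"], math.exp(s) / z) for p, s in zip(parses, scores)]
    return sorted(ranked, key=lambda pair: -pair[1])

ranked = select(parses)
best, best_prob = ranked[0]
```

The key design point is that relaxed analyses are never ruled out outright: they simply pay a learned penalty, so an ill-formed utterance can still receive a deep analysis when no well-formed parse is available.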
It is also worth noting that the integration of a parse selection component has the added advantage of associating an explicit score (or probability) with each parse. This score is a valuable piece of information in its own right, which can be used in various ways: for instance, it could be used to trigger clarification requests if no parse reaches a sufficient score (relative to a given threshold), or if several parses end up with (quasi-)similar scores. It can also be used during incremental parsing to prune low-probability partial parses from the chart.

The rest of the chapter is structured as follows. We first present the grammar relaxation mechanism. We then proceed with a description of the discriminative model, detailing its formal properties, the learning algorithm and the associated training data. We also describe the set of linguistic and contextual features on which the model operates. We finally present two possible extensions of our approach.
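The clarification-triggering use of parse scores described above could be sketched as follows. The function name, thresholds and return values are invented for this illustration; the book does not prescribe a particular decision rule.

```python
def decide(ranked, min_prob=0.5, margin=0.1):
    """Accept the best parse or request clarification.

    ranked: list of (analysis, probability) pairs, sorted by descending
    probability (e.g. the output of a parse selection component).
    min_prob and margin are illustrative thresholds.
    """
    best, p_best = ranked[0]
    if p_best < min_prob:  # no parse is confident enough
        return "clarify: low confidence"
    if len(ranked) > 1 and p_best - ranked[1][1] < margin:
        return "clarify: ambiguous"  # several quasi-similar scores
    return best

print(decide([("take the ball", 0.90), ("take ball", 0.05)]))  # accepts
print(decide([("take the ball", 0.52), ("take ball", 0.48)]))  # ambiguous
```

The same probabilities can drive chart pruning during incremental parsing: partial analyses whose score falls below a beam threshold are simply discarded before they are extended.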

About the author

Pierre Lison is a researcher at the German Research Centre for Artificial Intelligence in Saarbrücken, Germany. He holds an M.Sc. in Computer Science & Engineering from the University of Louvain (Belgium) and an M.Sc. in Computational Linguistics from Saarland University (Germany). He is currently involved in several international projects in cognitive robotics and human-robot interaction. Since 2009, he has also been pursuing a Ph.D. on adaptive dialogue management for HRI, under the supervision of Dr. ir. Geert-Jan M. Kruijff.

