DYCI2: Overview

The project Creative Dynamics of Improvised Interaction focuses on conceiving, adapting, and bringing into play efficient models of artificial listening, learning, interaction, and generation of musical content. It aims at developing creative and autonomous digital musical agents able to take part in various human projects in an interactive and artistically credible way and, to this end, at contributing to the perceptive and communicational skills of embedded artificial intelligence. The areas concerned are live performance, production, pedagogy, and active listening.


The project foregrounds improvised interaction, at once as a cognitively inspired model of individual and collective action and decision, as a scheme for unsupervised discovery and learning, and as a discursive tool for exchange between humans and digital agents, within a perspective of modeling style and interaction.

Improvised interaction between humans and digital agents is a recent field arising from studies of artificial creativity. It rests on the observation that the vast majority of human interactions are improvised, and it brings together several very active research questions: interactive learning, whose models are built during the interaction itself and whose results in turn shape the conditions of that interaction; artificial perception, the basis of this interaction; and the modeling of social and expressive interaction between human and/or digital agents, in its anthropological, social, linguistic, and computational dimensions. This type of interaction brings the perception/action loop into play and recasts learning in a renewed conception in which an agent learns, in particular, from the reactions of other agents to its own productions. A learning model that integrates the improvisational approach is therefore both generative and reflexive.

Integrating artificial listening, learning of musical behaviors, temporal modeling of musical structures, and dynamics of creative interaction into an architecture allowing effective real-time experimentation is an ambitious challenge for the information and communication society. It carries many potential applications likely to change the relationship between humans and creative artificial agents in the cultural industries, notably audiovisual and musical production and post-production, video games, live performance, innovative formats renewing the distribution and active listening of music, and new narrative forms based on interaction.

The project articulates three major research questions around an experimental software environment, making the best use of the partners' expertise and of their interactions. These three themes, each of which is the object of collaboration and co-supervision by at least two partners, correspond to the main competences exercised in parallel, competitively and contributively, by a "creative" digital agent in a situation of improvised interaction with humans and other agents: informed listening, which analyzes the sound scene to extrapolate the musical structure by exploiting observed similarities and available a priori knowledge; adaptive learning of musical structures, which integrates formal sequence modeling and probabilistic approaches to better account for the complexity of musical discourse from necessarily limited data; and the dynamics of improvised interaction, which considers multi-agent architectures and models of knowledge and decision making it possible to concretely enact co-improvisation scenarios involving human and digital actors.


Scientific Program

The project is organized around three main tasks:

  1. creative informed listening (resp. Sylvain Marchand, UBO)
  2. interactive learning of musical structures (resp. Emmanuel Vincent, Inria)
  3. dynamics of improvised interactions (resp. Gérard Assayag, Ircam)


  1. Creative informed listening (resp. Sylvain Marchand, UBO)

aims to give a digital agent the means to analyze sound scenes in real time in order to extrapolate the musical structure, exploiting observed similarities and available a priori knowledge. This research falls within the framework of "informed musical decomposition" and seeks to go beyond the state of the art by starting from a complex live sound scene, for example one produced in mixed or live improvised music. Some a priori information is always available (scores of the written parts, rehearsal recordings, annotations), and partial information can also be inferred by learning from stylistically related corpora. The goal is therefore to recover the musical structure, notably its polyphonic decomposition, live.

  2. Interactive learning of musical structures (resp. Emmanuel Vincent, Inria)

aims, starting from the sequential data supplied by the listening processes, to learn symbolic models capable of capturing the high-level, multi-dimensional, multi-scale musical structures emerging in a given context of musical performance and improvisation. We propose a novel approach integrating formal sequence modeling (more specifically, the factor oracle) with Bayesian approaches: developing models and algorithms that better account for the complexity of musical discourse raises the problem of learning such models from a necessarily limited amount of data in the targeted scenario. The Bayesian approach offers a theoretical solution to this problem, whose effectiveness has been widely demonstrated for spoken language modeling. Integrating formal sequence models with Bayesian approaches in a musical context offers many perspectives unexplored to date. The challenge will be, in particular, to maintain the computational efficiency of the former while benefiting from the robustness to overfitting of the latter.

  3. Dynamics of improvised interaction (resp. Gérard Assayag, Ircam)

enables rich, creative improvised interaction between artificial and human agents, and raises the questions of the temporal and collective adaptation of the interaction at several scales. This task builds on the informed listening and on the analysis and learning capabilities of the first two, articulating an adaptive and anticipatory system for managing improvised interaction while addressing the agents' models of memory, knowledge, and internal control. The aim is to go beyond classical approaches, which are static and determined beforehand, and to adapt in real time the models, representations, and learning modalities of the interaction, taking into account its different time scales and the collective dynamics that may be attached to them.


Scientific life

DYCI2 is structured around a significant number of PhD and postdoc projects. The websites and blogs of the PhD students and post-doctoral researchers are living indicators of the scientific life of the project.

Jérôme Nika - (Download the thesis)

Abstract

Guiding human-computer music improvisation: introducing authoring and control with temporal scenarios

Ircam - EHESS

This thesis focuses on the introduction of authoring and controls in human-computer music improvisation through the use of temporal scenarios to guide or compose interactive performances, and addresses the dialectic between planning and reactivity in interactive music systems dedicated to improvisation.

An interactive system dedicated to music improvisation generates music on the fly, in relation to the musical context of a live performance. This work follows on research on machine improvisation seen as navigation through a musical memory: typically the music played by a human musician co-improvising with the system during a performance, or an offline corpus. That research was mainly dedicated to free improvisation; here we focus on pulsed and idiomatic music.

Within an idiomatic context, an improviser deals with issues of acceptability regarding the stylistic norms and aesthetic values implicitly carried by the musical idiom. This is also the case for an interactive music system that aims to play jazz, blues, or rock… without being limited to imperative rules that would not allow any kind of transgression or digression. Various repertoires of improvised music rely on a formalized and temporally structured object, for example a harmonic progression in jazz improvisation. In the same way, the models and architecture we developed rely on a formal temporal structure. This structure does not carry the narrative dimension of the improvisation, that is, its fundamentally aesthetic and non-explicit evolution, but is a sequence of formalized constraints for the machine improvisation. This thesis thus presents: a music generation model guided by a scenario, introducing mechanisms of anticipation; a framework to compose improvised interactive performances at the scenario level; an architecture combining anticipatory behavior with reactivity using mixed static/dynamic scheduling techniques; an audio rendering module to perform live re-injection of captured material in synchrony with a non-metronomic beat; and a study carried out with ten musicians through performances, work sessions, listening sessions, and interviews.

First, we propose a music generation model guided by a formal structure. In this framework, improvising means navigating through an indexed memory to collect contiguous or disconnected sequences matching the successive parts of a scenario guiding the improvisation (for example, a chord progression). The musical purpose of the scenario is to ensure that the improvisations generated by the machine conform to the idiom it carries, and to introduce anticipation mechanisms into the generation process, by analogy with a musician anticipating the resolution of a harmonic progression.
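As a toy illustration of this kind of navigation, the sketch below walks a scenario over a labeled memory, preferring contiguous continuations and jumping to another matching occurrence otherwise. The `guide` function, the `(label, content)` event encoding, and the first-match jump policy are illustrative assumptions, not the actual generation model described in the thesis.

```python
def guide(scenario, memory):
    """Collect memory events whose labels match the successive scenario
    steps, preferring contiguity in the memory over arbitrary jumps."""
    out, pos = [], -1
    for label in scenario:
        nxt = pos + 1
        if nxt < len(memory) and memory[nxt][0] == label:
            pos = nxt                      # continue the memory linearly
        else:
            matches = [i for i, (l, _) in enumerate(memory) if l == label]
            if not matches:
                out.append(None)           # no event for this scenario step
                continue
            pos = matches[0]               # jump to another occurrence
        out.append(memory[pos][1])
    return out

# A toy memory recorded over a ii-V-I, replayed against a new scenario.
memory = [("Dm7", "riff1"), ("G7", "riff2"), ("Cmaj7", "riff3")]
print(guide(["G7", "Cmaj7", "Dm7"], memory))  # → ['riff2', 'riff3', 'riff1']
```

Note how the second step ("Cmaj7") is taken contiguously after the jump to "G7", which is what preserves local musical coherence in this simplistic policy.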

Using the formal genericity of the scenario/memory pair, we sketch a protocol to compose improvisation sessions at the scenario level. Defining scenarios described with audio-musical descriptors, or with any user-defined alphabet, makes it possible to approach other dimensions of guided interactive improvisation. In this framework, musicians for whom the definition of a musical alphabet and the design of scenarios for improvisation are part of the creative process can be involved upstream, at the meta-level of composition consisting of designing the musical language of the machine.

This model can be used in a compositional workflow and is offline in the sense that one run produces a whole timed and structured musical gesture satisfying the designed scenario, which is then unfolded through time during performance. We then present a dynamic architecture embedding such generation processes with formal specifications, in order to combine anticipation and reactivity in a context of guided improvisation. In this context, a reaction of the system to the external environment, such as control interfaces or live players' input, cannot only be seen as a spontaneous instant response. Indeed, it has to take advantage of the knowledge of this temporal structure to benefit from anticipatory behavior. A reaction can be considered as a revision of mid-term anticipations, musical sequences previously generated by the system ahead of the time of the performance, in the light of new events or controls. To cope with the issue of combining long-term planning and reactivity, we therefore propose to model guided improvisation as dynamic calls to compositional processes, that is to say, to embed intrinsically offline generation models in a reactive architecture. In order to be able to play with the musicians, and with the sound of the musicians, this architecture includes a novel audio rendering module that makes it possible to improvise by re-injecting live audio material (processed and transformed online to match the scenario) in synchrony with a non-metronomic, fluctuating pulse.
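The "reaction as revision of mid-term anticipations" idea can be sketched as follows, with a placeholder `generate` callable standing in for the scenario-guided generation model. All names here are hypothetical, and the real scheduling of offline generation inside a reactive architecture is considerably more involved than this sketch.

```python
def make_anticipation(scenario, generate):
    # Offline run: produce the whole anticipated improvisation ahead of
    # performance time, one generated fragment per scenario step.
    return [generate(step) for step in scenario]

def react(anticipation, scenario, t, generate):
    # A reaction at time t is not an instant response: keep what was
    # already played, and revise the anticipations from t onward.
    return anticipation[:t] + [generate(step) for step in scenario[t:]]

scenario = ["Dm7", "G7", "Cmaj7", "A7"]
plan = make_anticipation(scenario, lambda c: ("memory_A", c))
# At beat 2, a control event switches the memory the generator draws from;
# the already-played head is kept, the anticipated tail is regenerated.
plan = react(plan, scenario, 2, lambda c: ("memory_B", c))
```

After the reaction, `plan` keeps its first two "memory_A" fragments and carries "memory_B" fragments for the remaining scenario steps.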

Finally, this work fully integrated the results of frequent interactions with expert musicians into the iterative design of the models and architectures. The latter are implemented in the interactive music system ImproteK, one of the offspring of the OMax system, which has been used on various occasions during live performances with improvisers. During these collaborations, work sessions were combined with listening sessions and interviews to gather the musicians' evaluations of the system, in order to validate and refine the scientific and technological choices.


Kevin Sanlaville

Subject

Thesis supported by the Labex SMART
Title: Temporal adaptation of interaction
Supervisors: C. Pelachaud, F. Bevilacqua and G. Assayag
UPMC - SMART, Ircam-STMS and Telecom ParisTech - LTCI

The objective of this thesis is to develop an adaptive and proactive interaction management system. The goal is to push back the limits of standard approaches, which are static and determined beforehand. The stakes are therefore to propose real-time adaptation of the representations of interaction, taking into account the different time scales of interactions. This means recognizing and adapting to short-term phenomena (reactions, synchronization) as well as long-term phenomena, by updating a multi-scale, hierarchical representation of interaction (involving cognitive or mental-state representations). The thesis will draw on knowledge, models, and systems that have so far been developed in separate areas: conversational agents, musical improvisation, and interaction ("autonomous creative agents").


Ken Déguernel

Subject

Title: Learning of musical structures in the context of improvisation

This PhD project considers the problem of improvised musical interaction between one or more digital agents and a human improviser. The software OMax is recognized both by researchers and by contemporary music creators. Such systems are nowadays capable of learning and generating very realistic musical sequences, but their evolution remains governed by a single musical dimension (typically, pitch) at a single scale (typically, one half-bar), independently of the reactions of the other improvisers.

The goal is to design, implement and evaluate a system for improvised musical interaction that will make it possible for each agent to capture the multi-dimensional and multi-scale structures arising in a given improvisation context and to learn from the reactions of other agents to its own output. We propose to learn the dependencies between multiple musical dimensions (duration, pitch, intensity, timbre, harmony) by integrating the "factor oracle" model with the Bayesian approach. We will then seek to increase or decrease the probability of generated sequences depending on the reactions of the other agents in a reinforcement learning framework. We will finally study a hierarchical version of the factor oracle, which will allow joint representation of the structural blocks and the inner structure of each block (e.g., “aaba”, “aabc” or “abac”). The success of the developed systems will be assessed by expert listeners and improvisers.
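For concreteness, the factor oracle mentioned above can be built online in linear time, following the classical construction by Allauzen, Crochemore, and Raffinot. The sketch below is a minimal textbook version on symbol sequences, not the DYCI2 or OMax implementation; in improvisation systems, the suffix links are what let a generator jump between passages sharing a common context.

```python
def factor_oracle(sequence):
    """Online factor-oracle construction: one state per symbol plus an
    initial state 0; returns forward transitions and suffix links."""
    n = len(sequence)
    trans = [{} for _ in range(n + 1)]   # trans[i][symbol] -> target state
    sfx = [-1] * (n + 1)                 # suffix links (sfx[0] = -1)
    for i in range(1, n + 1):
        sym = sequence[i - 1]
        trans[i - 1][sym] = i            # main (linear) transition
        k = sfx[i - 1]
        while k > -1 and sym not in trans[k]:
            trans[k][sym] = i            # forward links from the suffix chain
            k = sfx[k]
        sfx[i] = 0 if k == -1 else trans[k][sym]
    return trans, sfx

# On an "aaba"-like form, suffix links point back to earlier states that
# share a common context, e.g. the state after the second "aaba" below
# links back to the state after the first one.
trans, sfx = factor_oracle("aabaabac")
```

Running this on `"aabaabac"` gives `sfx[7] == 4`: after reading the second `aaba`, the oracle links back to state 4, the end of the first occurrence, which is exactly the kind of recombination point an improvising agent exploits.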

This PhD will be co-supervised by Inria and Ircam within the scope of the French National Research Agency project on "Creative dynamics of improvised interaction". The candidate will attend regular meetings and stay one month per year at Ircam. They will have access to the unique archive collection of the Montreux Jazz Festival, specially lent by EPFL.


Nathan Libermann

Subject

Deep learning for musical structure analysis and generation

(Inria Nancy and Inria Rennes)

Despite numerous studies on automatic music transcription and composition, the temporal structure of music pieces at various time scales remains difficult to model. Automatic music improvisation systems such as OMax and ImproteK assume that the structure is either predetermined (a chord chart) or completely free, which limits their use to specific musical styles. The concepts of semiotic structure and System & Contrast that we recently introduced helped define musical structure in a more general way. Yet they do not easily translate into a computational model, due to the large temporal horizon required and to the semantic gap with the observed musical signal or score. In the last few years, deep learning has emerged as the new state of the art in natural language processing (NLP), and it has already demonstrated its potential for modeling short-term musical structure.

The goal of this PhD is to exploit and adapt deep recurrent neural networks (RNNs) for modeling medium- and long-term musical structure. This involves the following tasks in particular:

  1. designing new RNN architectures for jointly modeling music at several time scales: tatum, beat, bar, structural block (e.g., chorus or verse), whole piece;
  2. training them on smaller amounts of data than in the field of NLP;
  3. evaluating their performance for musical structure estimation and automatic music improvisation.

This position is part of a funded project with Ircam, in which the successful candidate will have the opportunity to engage.


 


dyci2/project.txt · Last modified: 2017/03/02 15:39 by Gérard Assayag