We are proud to present the following keynote speakers at CBMI 2024.

Dr. Hannes Högni Vilhjálmsson

Being Multimodal: What Building Virtual Humans has Taught us about Multimodality

Abstract: Intelligent Virtual Agents (IVAs) are autonomous virtual humans that are meant to exhibit human-like traits when interacting with the world around them. When imbued with sufficient social skills, communicating face-to-face with them could feel just like communicating with real humans. The allure has been that such agents would revolutionize any domain of human-computer interaction where rich social interaction with an automated system could take things to the next level, such as tutoring systems, digital home assistants, personal trainers, online sales agents and even systems providing remote care for the elderly. However, replicating human face-to-face communication skills has proved a massive theoretical and technical challenge, not least because it involves the seamless and spontaneous coordination of multiple natural modalities, including spoken words, intonation, gesture, facial expressions, posture and gaze. What to many seemed superfluous body motion turned out to be a tightly woven fabric of multimodal signals, evolved as an effective system of communication between humans since the very dawn of their social existence. We are, it turns out, multimodal beings down to our core. In this talk I will start by taking us to the origins of the field of Embodied Conversational Agents (ECAs), a sub-field of Intelligent Virtual Agents that deals specifically with providing agents with face-to-face communication skills. I will review our attempts to capture, understand and analyze the multimodal nature of human communication, and how we have built and evaluated systems that engage in and support such communication. While I use communication as a particular case study in multimodality, I will explore how some of the underlying principles may have wider relevance to working with multimodal and multimedia content, and to the way we envision our data-driven future.


Speaker bio: Dr. Hannes Högni Vilhjálmsson is a Professor of Computer Science at Reykjavik University, where he leads the Socially Expressive Computing group at the Center for Analysis and Design of Intelligent Agents (CADIA), of which he was the director from 2013 to 2016. He has been doing research on the automatic generation of social and linguistic nonverbal behavior in autonomous agents and online avatars for nearly 30 years. His focus has been on making embodied communication in virtual environments both effective and intuitive, targeting primarily applications in training, education, healthcare and entertainment. Dr. Vilhjálmsson chaired Reykjavik University’s Research Council from 2016 to 2019, and is a member of a number of academic steering and organizing committees, as well as industrial advisory and directorial boards. Prior to joining Reykjavik University in 2006, Dr. Vilhjálmsson was the technical director of the Tactical Language and Culture Training project at the University of Southern California, which used social AI and advanced language technology to teach foreign languages and culturally appropriate behavior, earning the project DARPA’s Technical Achievement Award. Alongside his academic career, Dr. Vilhjálmsson has co-founded several companies that take advantage of virtual experiences, including Alelo Inc., which builds serious games for immersive language learning; MindGames, which released the first BCI mind-training games for the iPhone; and Envalys, which uses VR to assess the psychological impact of planned urban environments on prospective inhabitants before construction. He received his Ph.D. in Media Arts and Sciences from the MIT Media Lab in 2003.




Dr. Cynthia C. S. Liem

What does it mean to ‘work as intended’?

Abstract: While we can acquire and analyze more data than ever, this data is unstructured and messy, and measurement procedures may not have been optimal. More fundamentally, in many human-focused use cases we may not be able to fully articulate what and where to measure, even though we have a good intuitive sense of what an intended or unintended outcome is.
To make this even more challenging, multiple examples exist of techniques that were ‘working as intended’ according to traditional metrics and evaluation procedures, yet turned out not to in actual deployed systems, even causing societally problematic outcomes.
In this keynote presentation, I will discuss how my interest in this topic was triggered in the music domain, and illustrate how my team currently works on questions of validation and validity. Here, we seek to gain confidence in computational measurement procedures, drawing inspiration from psychometric validity, software testing, and metascientific and open-science practices. I will address both tensions and opportunities for connection between design, science and engineering methodologies, and argue why I feel these need to be addressed when working on societally impactful applications. At the same time, it is important to realize that current incentives and trends in academia do not trivially reward such endeavors. With this, I would like to invite the audience to reflect on core academic values and questions of integrity, and as such on whether we ourselves tend to work as intended.

Speaker bio: Dr. Cynthia C. S. Liem MMus is an Associate Professor in the Multimedia Computing Group of Delft University of Technology, and pianist of the Magma Duo. Her research interests are in trustworthy and responsible AI; here, she especially focuses on techniques that help people discover new interests and content that would not trivially be retrieved, and on questions of validation and validity in data-driven decision-making. Having started in music information retrieval, her research today considers broader public-interest domains with high societal impact.