The 6th International Workshop on Affective Social Multimedia Computing (ASMMC 2021)

October 18-22, 2021, Montreal, Canada

Call for Papers

Affective social multimedia computing is an emergent research topic for both the affective computing and multimedia research communities. Social multimedia is fundamentally changing how we communicate, interact, and collaborate with other people in our daily lives. Compared with well-organized broadcast news and professionally made videos such as commercials, TV shows, and movies, social multimedia computing poses great challenges to research communities. Social multimedia contains much affective information, and effective extraction of that information can greatly help social multimedia computing (e.g., processing, indexing, retrieval, and understanding). Although much progress has been made in traditional multimedia research on content analysis, indexing, and retrieval based on subjective concepts such as emotion, aesthetics, and preference, affective social multimedia computing is a new research area. It aims to process affective information from social multimedia. For massive and heterogeneous social media data, the research requires a multidisciplinary understanding of content and perceptual cues from social multimedia. From the multimedia perspective, the research relies on theoretical and technological findings in affective computing, machine learning, pattern recognition, signal/multimedia processing, computer vision, speech processing, and behavioral and social psychology. Affective analysis of social multimedia and interaction is attracting growing attention from industry and from businesses that provide social networking sites and content-sharing services, distribute and host media, and support social interaction with artificial agents.
This workshop focuses on the analysis of affective signals in interaction (multimodal analyses enabling artificial agents in human-machine interaction, social interaction with artificial agents) and social multimedia (e.g., Twitter, WeChat, Weibo, YouTube, and Facebook).

The first five ASMMC workshops were successfully held in Xi'an, China (September 21, 2015); Seattle, USA (July 15, 2016); Stockholm, Sweden (August 25, 2017); Seoul, Korea (October 26, 2018); and on July 2, 2019. The 6th ASMMC comes to ACM ICMI 2021, returning again to Affective Computing & Intelligent Interaction to investigate how affective computing technology can become available and accessible in education, health, transport, cities, home, and entertainment.

The workshop seeks contributions on various aspects of affective computing in interaction and social multimedia, covering related theory, methodology, algorithms, techniques, and applications. Topics of interest include:

  • Affective human-machine interaction or human-human interaction
  • Affective/Emotional content analysis of images, videos, speech, music, metadata (text, symbols, etc.)
  • Affective indexing, ranking, and retrieval on big social media data
  • Affective computing in social multimedia by multimodal integration (facial expression, gesture, posture, speech, text/language)
  • Emotional implicit tagging and interactive systems
  • User interests and behaviour modeling in social multimedia
  • Video and image summarization based on affect
  • Affective analysis of social media and harvesting the affective response of crowds
  • Affective generation in social multimedia, expressive text-to-speech and expressive language translation
  • Zero/One/Few-shot learning for emotion recognition
  • Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction
  • Social Interaction with Artificial Agents
  • Applications of affective social multimedia computing

Workshop General Chair

  • Youjun XIONG
    UBTech Robotics Corp

Workshop Chairs

  • Dongyan HUANG
    Senior Member, IEEE, Principal Scientist, Text-to-Speech Synthesis Lab
    Audio, Speech and Language Technology, UBTech Robotics Corp

  • Björn SCHULLER
    Fellow of the IEEE, Senior Member ACM
    Full Professor & Head of the ZD.B Chair of Embedded Intelligence for Health Care & Wellbeing, University of Augsburg, Germany
    Reader (Associate Professor) in Machine Learning, Department of Computing, Imperial College London, London/U.K
    Chief Executive Officer (CEO) and Co-Founder, audEERING GmbH, Gilching/Germany

  • Jianhua TAO
    Full Professor & Deputy Director, National Laboratory of Pattern Recognition (NLPR)
    Institute of Automation, Chinese Academy of Sciences

  • Lei XIE
    Senior Member, IEEE
    Dr./ Professor
    Shaanxi Provincial Key Lab of Speech and Image Information Processing (SAIIP)
    School of Computer Science
    Northwestern Polytechnical University, Xi'an, China

  • Jie YANG
    Fellow, IEEE
    Program director
    Division of Information and Intelligent Systems
    National Science Foundation (NSF) of USA

Organizing Committee

  • Shih-Fu CHANG
    Columbia University, USA
  • Stephen COX
    University of East Anglia, UK
  • Minghui DONG
    Institute for Infocomm Research, Singapore
  • Wolfgang HUERST
    Utrecht University, Netherlands
  • Qiang JI
    Rensselaer Polytechnic Institute, USA
  • Jia JIA
    Tsinghua University, China
  • Qin JIN
    Renmin University of China, China
  • Dongmei JIANG
    Northwestern Polytechnical University, Xi'an, China
  • Bo LI
    Google, USA
  • Haizhou LI
    Institute for Infocomm Research, Singapore
  • Ming LI
    Duke-Kunshan University, China
  • Jiebo LUO
    University of Rochester, USA
  • Hichem SAHLI
    Vrije Universiteit Brussel, Belgium
  • Vidhyasaharan SETHU
    The University of New South Wales, Australia
    Karlsruhe Institute of Technology, USA
  • Yan TONG
    University of South Carolina, USA
  • Jia-Ching WANG
    National Central University, Taiwan
  • Chung-Hsien WU
    National Cheng Kung University, Taiwan
  • Zhiyong WU
    Tsinghua University, China
  • Zhizheng WU
    Edinburgh University, UK
  • Changsheng XU
    Chinese Academy of Sciences, China
  • Shuicheng YAN
    National University of Singapore, Singapore
  • Yanning ZHANG
    Northwestern Polytechnical University, China
  • Peng ZHANG
    Northwestern Polytechnical University, China
  • Yuexian ZOU
    Peking University, China
  • Zhongfei ZHANG
    Binghamton University, USA
  • Xuan ZHU
    Samsung R&D Institute of China, China


Qin Jin is a professor in the School of Information at Renmin University of China (RUC), where she leads the AI·M³ lab. She received her Ph.D. degree from Carnegie Mellon University in 2007. Before joining RUC in 2013, she was a research faculty member at Carnegie Mellon University (2007-2012) and a research scientist at IBM China Research Lab (2012). Her research interests are in intelligent multimedia computing and human-computer interaction. Her team’s recent work on video understanding and multimodal affective analysis has won various awards in international challenge evaluations, including the CVPR ActivityNet Dense Video Captioning challenge, the NIST TrecVID VTT evaluation, and the ACM Multimedia Audio-Visual Emotion Challenge.

Talk Title: Multimodal Emotion Recognition

Abstract: Understanding human emotions is one of the fundamental steps in establishing natural human-computer interaction systems that can perceive emotion. The behavioral signals of human emotion expression are multimodal, including voice, facial expression, body language, and bio-signals, and interactive scenarios such as conversational dialogues are natural settings in which emotions are elicited and expressed. Our research focuses on integrating multimodal information for robust emotion perception in natural dialogues, which involves several major challenges, including modeling contextual information in a dialogue, handling missing modalities at inference time, and learning robust emotion feature representations across cultures. This talk will present our recent work addressing these challenges in multimodal emotion recognition.

Erik Cambria is the Founder of SenticNet, a Singapore-based company offering B2B sentiment analysis services, and an Associate Professor at NTU, where he also holds the appointment of Provost Chair in Computer Science and Engineering. Prior to joining NTU, he worked at Microsoft Research Asia (Beijing) and HP Labs India (Bangalore) and earned his PhD through a joint programme between the University of Stirling and MIT Media Lab. His research focuses on the ensemble application of symbolic and subsymbolic AI to natural language processing tasks such as sentiment analysis, dialogue systems, and financial forecasting. Erik is the recipient of several awards, e.g., the 2019 IEEE Outstanding Early Career Award; he was also listed among the 2018 AI's 10 to Watch and was featured in Forbes as one of the 5 People Building Our AI Future. He is Associate Editor of many top AI journals, e.g., INFFUS, IEEE CIM, and KBS, Special Content Editor of FGCS, Department Editor of IEEE Intelligent Systems, and is involved in many international conferences as program chair and invited speaker.

Talk Title: Neurosymbolic AI for Affective Computing and Sentiment Analysis

Abstract: With the recent developments in deep learning, AI research has gained new vigor and prominence. However, machine learning still faces three big challenges: (1) it requires a lot of training data and is domain-dependent; (2) different types of training or parameter tweaking lead to inconsistent results; (3) the use of black-box algorithms makes the reasoning process uninterpretable. At SenticNet, we address such issues in the context of NLP via sentic computing, a multidisciplinary approach that aims to bridge the gap between statistical NLP and the many other disciplines necessary for understanding human language, such as linguistics, commonsense reasoning, and affective computing. Sentic computing is both top-down and bottom-up: top-down because it leverages symbolic models such as semantic networks and conceptual dependency representations to encode meaning; bottom-up because it uses subsymbolic methods such as deep neural networks and multiple kernel learning to infer syntactic patterns from data.

Important Dates

Submission of Manuscripts: August 31st, 2021
Notification of Acceptance/Rejection: September 9th, 2021
Submission of Camera-ready Papers and Presenting Author's Registration: September 15th, 2021
Date of the Workshop: October 18-22, 2021

Technical Program

Date: October 18th, 2021 (Montreal Time/UTC-4)

  • PART 1: Opening
    Session Chair: Dong-Yan Huang
  • 08:00 - 08:10
    Opening statement
    Dr. Youjun Xiong

  • PART 2: Keynote Ⅰ
    Session Chair: Björn Schuller
  • 08:10 - 09:10
    Title: Neurosymbolic AI for Affective Computing and Sentiment Analysis
    Erik Cambria

  • PART 3: Sentiment, Micro-expression and Paralinguistic analysis
    Session Chair: Lei Xie
  • 09:10 - 09:30
    BERT Based Cross-Task Sentiment Analysis with Adversarial Learning
    Zhiwei He, Xiangmin Xu, Xiaofen Xing and Yirong Chen
  • 09:30 - 09:50
    Facial Micro-Expression Recognition Based on Multi-Scale Temporal and Spatial Features
    Hao Zhang, Bin Liu, Jianhua Tao and Zhao Lv
  • 09:50 - 10:10
    Aspect based sentiment analysis is a branch of sentiment analysis
    Yingtao Huo, Dongmei Jiang and Hichem Sahli
  • 10:10 - 10:30
    Call For Help Detection In Emergent Situations Using Keyword Spotting And Paralinguistic Analysis
    Huangrui Chu, Yechen Wang, Ran Ju, Yan Jia, Haoxu Wang, Ming Li and Qi Deng

  • PART 4: Keynote Ⅱ
    Session Chair: Dong-Yan Huang
  • 10:30 - 11:30
    Title: Multimodal Emotion Recognition
    Qin Jin

  • PART 5: Multimodal Emotion Recognition
    Session Chair: Jianhua Tao
  • 11:30 - 11:50
    FER by Modeling the Conditional Independence between the Spatial Cues and the Spatial Attention Distributions
    Wan Ding, Dongyan Huang, Jingjun Liang, Jinlong Jiao and Zhiping Zhao
  • 11:50 - 12:10
    Efficient Gradient-based Neural Architecture Search for end-to-end ASR
    Xian Shi, Pan Zhou, Wei Chen and Lei Xie
  • 12:10 - 12:30
    Temporal Attentive Adversarial Domain Adaption for Cross Cultural Affect Recognition
    Haifeng Chen, Yifan Deng and Dongmei Jiang
  • 12:30 - 12:50
    A Multimodal Dynamic Neural Network for Call for Help Recognition in Elevators
    Ran Ju, Huangrui Chu, Yechen Wang, Qi Deng, Ming Cheng and Ming Li
  • 12:50 - 13:10
    A Web-Based Longitudinal Mental Health Monitoring System
    Zhiwei Chen, Weizhao Yang, Jinrong Li, Jiale Wang, Shuai Li, Ziwen Wang and Lei Xie

  • PART 6: Multimodal Emotion Synthesis
    Session Chair: Jie Yang
  • 13:20 - 13:40
    Semantic and Acoustic-Prosodic Entrainment of Dialogues in Service Scenarios
    Liu Yuning, Jianwu Dang, Aijun Li and Di Zhou
  • 13:40 - 14:00
    Improving Model Stability and Training Efficiency in Fast Speed High Quality Expressive Voice Conversion System
    Zhiyuan Zhao, Jingjun Liang, Zehong Zheng, Linhuang Yan, Zhiyong Yang, Wan Ding and Dongyan Huang
  • 14:00 - 14:20
    TeNC: Low Bit-Rate Speech Coding with VQ-VAE and GAN
    Yi Chen, Shan Yang, Na Hu, Lei Xie and Dan Su
  • 14:20 - 14:40
    Noise Robust Singing Voice Synthesis Using Gaussian Mixture Variational Autoencoder
    Heyang Xue, Xiao Zhang, Jie Wu, Jian Luan, Yujun Wang and Lei Xie

  • PART 7: Panel discussion and closing remarks
  • 14:40 - 15:00
    Dong-Yan Huang

Submission Instructions

All accepted workshop papers will be published in the adjunct proceedings of the 23rd ACM International Conference on Multimodal Interaction (ICMI 2021). We invite submissions in the following categories:

  • Long papers/Short papers relevant to the themes of the workshop
  • 1.5-page abstracts relevant to the themes of the workshop
Papers and abstracts should be submitted through


Please note that ICMI 2021 will use a new ACM Publication System (TAPS) process. This means submitted papers should follow the instructions and use templates given in the following link:
All authors should submit manuscripts for review in a single-column format instead of the previous two-column format. We invite submissions of long papers, short papers, and extended abstracts formatted according to the new ICMI guidelines. This means:

  • Long papers: 8 pages in two-column format -> 13-14 pages in single-column format, plus references
  • Short papers: 4 pages in two-column format -> 7 pages in single-column format, plus references
  • Extended abstracts: 1 page in two-column format -> 1.5 pages in single-column format. Abstracts can contain figures and/or tables.


Please refer to the instructions on the

  • (Word template) Write your paper using the Submission Template. Follow the embedded instructions to apply the paragraph styles to your various text elements. The text is in single-column format at this stage, and no additional formatting is required at this point.
  • (LaTeX template) Please use the latest version of the Primary Article Template - LaTeX (1.78; published May 25, 2021) to create your paper submission. Use the "manuscript" call to create a single-column format.


Proceedings will be published as adjunct proceedings to the ACM International Conference on Multimodal Interaction 2021.

Paper submission and acceptance deadlines

Workshop papers or abstracts due: August 16th, 2021 (extended to August 31st, 2021)
Notification of acceptance: September 9th, 2021
Camera-ready paper: September 15th, 2021