Keynote Speakers
The IEEE ICME 2024 Committee is pleased to announce that the following speakers have confirmed their participation in the Conference. More speakers will be confirmed soon.
Professor Chang Wen Chen
Hong Kong Polytechnic University
Contemporary Visual Computing: From Structured SGG to Semantic Communications
This talk shall focus on recent advances in visual computing with critical implications for next-generation 6G semantic communications. Semantic communication was first proposed by Weaver and Shannon more than 70 years ago, when they outlined the classical definition of three levels of communication: the technical problem, the semantic problem, and the effectiveness problem. Up to and including 5G, communication researchers and practitioners have worked primarily on the first problem, at the technical level. For the upcoming 6G, semantic communication becomes necessary to handle the overwhelming volume of visual data that dominates IP traffic. We believe a paradigm-shifting framework needs to be designed to transport such volumetric visual data under the 6G mobile communication architecture. We show that recent technical advances in contemporary visual computing bear great potential for 6G semantic communication. Among the volumetric visual data, a significant portion is actually acquired for machine intelligence purposes. Therefore, structured extraction and representation of the semantics in these visual data are desired to facilitate 6G semantic communication. In contemporary visual computing, well-structured scene graph generation (SGG) approaches have been shown to be capable of compactly representing the logical relationships among the subjects and objects detected in visual data. We shall show that this unique capability of structured SGG can be applied to 6G semantic communication, paving the way for future advances in integrating visual computing with 6G.
Speaker’s Biography
Chang Wen Chen is currently Chair Professor of Visual Computing at The Hong Kong Polytechnic University. Before his current position, he served as Dean of the School of Science and Engineering at The Chinese University of Hong Kong, Shenzhen, from 2017 to 2020, and concurrently as Deputy Director of Peng Cheng Laboratory from 2018 to 2021. Before that, he was an Empire Innovation Professor at the State University of New York at Buffalo (SUNY) from 2008 to 2021 and the Allan Henry Endowed Chair Professor at the Florida Institute of Technology from 2003 to 2007. He received his BS degree from the University of Science and Technology of China in 1983, his MS degree from the University of Southern California in 1986, and his PhD degree from the University of Illinois at Urbana-Champaign (UIUC) in 1992.
He has served as Editor-in-Chief of the IEEE Transactions on Multimedia (2014-2016) and the IEEE Transactions on Circuits and Systems for Video Technology (2006-2009). He has received many professional achievement awards, including ten Best Paper or Best Student Paper Awards in premier publication venues, the prestigious Alexander von Humboldt Award in 2010, the SUNY Chancellor’s Award for Excellence in Scholarship and Creative Activities in 2016, and the UIUC ECE Distinguished Alumni Award in 2019. He is an IEEE Fellow (2005), a SPIE Fellow (2007), and a Member of Academia Europaea (2021).
Professor Jay Kuo
University of Southern California
Toward Interpretable and Sustainable AI via Green Learning
Rapid advances in artificial intelligence (AI) have been attributed to the wide application of deep learning (DL) networks. There are, however, long-standing concerns with the DL methodology: mathematical opaqueness, vulnerability to adversarial attacks, and a high carbon footprint. I have been researching alternative AI technologies since 2015. The new learning paradigm is named “green learning” (GL) for its low carbon footprint, attributable to smaller model sizes and lower computational complexity, and for its mathematical transparency.
GL models differ entirely from DL models: they contain no neurons or networks. They are modularized and trained in a feedforward manner without backpropagation. The pipeline comprises three main modules: 1) representation learning, 2) feature learning, and 3) decision learning. All intermediate results are explicit and explainable. GL models offer energy-efficient AI solutions in cloud centers and/or on mobile/edge devices. They have been successfully applied in a variety of applications, and I will use several examples to demonstrate their effectiveness and efficiency.
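As a rough, hypothetical illustration of such a three-module feedforward pipeline (not the speaker's actual GL implementation, which relies on techniques such as the Saab transform), the following Python sketch chains an unsupervised representation stage, a supervised feature-selection stage, and a lightweight non-neural classifier, each fitted once in sequence with no backpropagation:

# Hypothetical sketch of a green-learning-style feedforward pipeline.
# Stand-in components only: PCA plays the role of a Saab-like transform;
# the three stages mirror representation -> feature -> decision learning.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([
    # 1) Representation learning: unsupervised, closed-form transform.
    ("representation", PCA(n_components=32)),
    # 2) Feature learning: keep the most discriminant dimensions.
    ("features", SelectKBest(f_classif, k=16)),
    # 3) Decision learning: a lightweight, non-neural classifier.
    ("decision", GradientBoostingClassifier(n_estimators=50)),
])

pipeline.fit(X_train, y_train)   # each stage is fitted once, feedforward
print("test accuracy:", pipeline.score(X_test, y_test))

Each stage's output is an explicit feature matrix that can be inspected directly, which mirrors the transparency claim made in the abstract.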
Speaker’s Biography
Dr. C.-C. Jay Kuo received his Ph.D. from the Massachusetts Institute of Technology in 1987. He is now with the University of Southern California (USC) as the Ming Hsieh Chair Professor, a Distinguished Professor of Electrical and Computer Engineering and Computer Science, and the Director of the Media Communications Laboratory. His research interests are in visual computing and communication. He is a Fellow of AAAS, ACM, IEEE, NAI, and SPIE and an Academician of Academia Sinica. Dr. Kuo has received numerous awards for his research contributions, including the 2010 Electronic Imaging Scientist of the Year Award, the 2010-11 Fulbright-Nokia Distinguished Chair in Information and Communications Technologies, the 2019 IEEE Computer Society Edward J. McCluskey Technical Achievement Award, the 2019 IEEE Signal Processing Society Claude Shannon-Harry Nyquist Technical Achievement Award, the 72nd annual Technology and Engineering Emmy Award (2020), and the 2021 IEEE Circuits and Systems Society Charles A. Desoer Technical Achievement Award. Dr. Kuo was the Editor-in-Chief of the IEEE Transactions on Information Forensics and Security (2012-2014) and the Journal of Visual Communication and Image Representation (1997-2011). He is currently the Editor-in-Chief of the APSIPA Transactions on Signal and Information Processing (2022-2023). He has guided 175 students to their Ph.D. degrees and supervised 31 postdoctoral research fellows.
Professor Lu Jiang
ByteDance
Unleashing the Power of Transformers for Visual Generation: A Personal Journey
In this talk, I will discuss a personal research journey through published works on using transformers for visual generation (image and video). Despite the initial enthusiasm sparked by DALL-E, this approach quickly lost popularity but has recently regained momentum thanks to collaborative efforts within our research community, such as DiT, Sora, and VideoPoet. I will share a series of works our group has carried out to advance progress in this area, including 1) non-autoregressive transformer approaches such as MaskGIT and MUSE for image generation and MAGVIT V1 and V2 for video generation, 2) the diffusion-based transformer WALT for video generation, and 3) autoregressive, LLM-based approaches to video generation such as VideoPoet. Finally, I will offer my perspective on the next generation of truly multimodal foundation models capable of natively understanding, generating, and reasoning about our visual world.
Speaker’s Biography
Lu Jiang is currently a research lead at ByteDance USA, spearheading video generation efforts within the foundation model named SEED. He also holds a position as an adjunct faculty member at Carnegie Mellon University. Prior to this, he served as a staff research scientist and manager at Google. His research has been integral to multiple Google products, such as YouTube, Cloud, AutoML, Ads, Waymo, and Translate, impacting the daily lives of billions of users worldwide. His research interests lie in the interdisciplinary field of multimedia and machine learning, with a focus on video creation and multimodal foundation models. His work has been nominated for best paper awards at top conferences in natural language processing (ACL) and computer vision (CVPR). Lu is an active member of the research community, serving as an AI panelist for America’s Seed Fund (NSF SBIR). He regularly acts as an area chair for conferences like CV