Have you ever seen birds flying across the sky in shifting, mesmerizing patterns? Or ants using their own bodies to form a ...
Abstract: The increasing ability of deep learning models to produce realistic-sounding synthetic speech poses serious problems for privacy, public trust, and digital security. To counter this danger, ...
Abstract: With only video-level event labels, this paper targets the task of weakly-supervised audio-visual event perception (WS-AVEP), which aims to temporally localize and categorize events that ...