Advancing Acoustic Howling Suppression Through Recursive Training of Neural Networks

Hao Zhang1*, Yixuan Zhang2*, Meng Yu1, Dong Yu1

1. Tencent AI Lab, Bellevue, WA, USA

2. The Ohio State University, Columbus, OH, USA

*Equal contribution

In this paper, we introduce a novel training framework designed to comprehensively address the acoustic howling issue by examining how howling forms. During training, this framework places a neural network (NN) module inside the closed-loop system and generates its input signals recursively on the fly, closely mimicking the streaming process of acoustic howling suppression (AHS). The proposed recursive training strategy bridges the gap between training and real-world inference. It departs from previous NN-based methods, which typically treat AHS as either noise suppression or acoustic echo cancellation. Within this framework, we explore two methodologies: one relying exclusively on an NN, and the other combining an NN with the traditional Kalman filter. We also propose strategies, including howling detection and initialization from pre-trained offline models, to improve trainability and speed up training. Experimental results show that this framework substantially improves on previous methods for acoustic howling suppression.
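The closed-loop mechanism behind howling can be illustrated with a minimal simulation. It also shows why a suppressor trained on pre-recorded inputs sees different signals than at inference: each frame's microphone signal depends on the suppressor's own earlier outputs. This is a sketch under simplifying assumptions: a single channel, a time-invariant feedback path `rir`, a loudspeaker gain `gain`, and a playback delay `delay` of at least one frame. The function `simulate_closed_loop` and its parameters are hypothetical and are not the paper's code. In recursive training, `suppressor` would be the NN (or NN plus Kalman filter) being trained.

```python
import numpy as np


def simulate_closed_loop(target, rir, gain, delay, frame_len, suppressor):
    """Frame-by-frame simulation of a single-channel acoustic feedback loop.

    Hypothetical model (an assumption, not the paper's exact setup): the
    loudspeaker plays the suppressor output amplified by `gain` and delayed
    by `delay` samples; the microphone picks up the target signal plus that
    playback convolved with the feedback path `rir`.
    """
    if delay < frame_len:
        raise ValueError("delay must be >= frame_len so feedback uses only past frames")
    target = np.asarray(target, dtype=float)
    rir = np.asarray(rir, dtype=float)
    n = len(target)
    mic = np.zeros(n)
    out = np.zeros(n)
    playback = np.zeros(n)
    for start in range(0, n, frame_len):
        end = min(start + frame_len, n)
        for t in range(start, end):
            # Loudspeaker signal: amplified output from `delay` samples ago,
            # which always lies in an already-processed frame.
            playback[t] = gain * out[t - delay] if t >= delay else 0.0
            # Feedback: sum_k rir[k] * playback[t - k] (causal convolution).
            past = playback[max(0, t - len(rir) + 1): t + 1][::-1]
            mic[t] = target[t] + np.dot(rir[: len(past)], past)
        # The suppressor sees only the current frame. Its output is what gets
        # fed back, so the next frames depend on how well it suppressed.
        out[start:end] = suppressor(mic[start:end])
    return mic, out


# Usage: an impulse passed through the loop with no suppression (identity).
impulse = np.zeros(200)
impulse[0] = 1.0
_, howl = simulate_closed_loop(impulse, [0.9], gain=2.0, delay=10,
                               frame_len=10, suppressor=lambda x: x)
_, stable = simulate_closed_loop(impulse, [0.9], gain=0.5, delay=10,
                                 frame_len=10, suppressor=lambda x: x)
mic0, _ = simulate_closed_loop(impulse, [0.9], gain=0.0, delay=10,
                               frame_len=10, suppressor=lambda x: x)
```

With an identity `suppressor`, a loop gain of 0.9 × 2.0 = 1.8 makes the signal grow on every round trip (howling), while 0.9 × 0.5 = 0.45 makes it decay. Here `gain` is assumed to play a role similar to the gain G used in the demos. A real suppressor would break this growth, and because its output is fed back, training it inside the loop exposes it to the signals it would actually encounter when streaming.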

This page provides audio demos for the paper above, which is available at https://arxiv.org/abs/2309.16048/.

Fig. Spectrograms of: (a) target signal, (b) no AHS, (c) Kalman filter, (d) DeepMFC, (e) HybridAHS, (f) Proposed NN, (g) Proposed Hybrid (RM), and (h) Proposed Hybrid.




Because it could harm listeners' hearing, we have omitted the 'no AHS' audio. The Kalman filter outputs are included, but we do not recommend listening to them in full.

Waveforms
Moderate Howling (G = 1.5) Severe Howling (G = 3)
Target signal
no AHS
Kalman filter
DeepMFC
HybridAHS
Proposed NN
Proposed Hybrid (RM)
Proposed Hybrid




Evaluations in real environments

Fig. Processed recordings in real environments. Spectrograms of: (a) target signal, (b) no AHS, (c) Kalman filter, (d) HybridAHS, and (e) Proposed Hybrid.



Waveforms
Moderate Howling (G = 1.5)
Target signal
no AHS
Kalman filter
HybridAHS
Proposed Hybrid