Enhanced Acoustic Howling Suppression via Hybrid Kalman Filter and Deep Learning Models
Hao Zhang1, Yixuan Zhang2, Meng Yu1, Dong Yu1
1. Tencent AI Lab, Bellevue, WA, USA
2. The Ohio State University, Columbus, OH, USA
Abstract: This paper presents a comprehensive study addressing the challenging problem of acoustic howling suppression (AHS) through the fusion of Kalman filter and deep learning techniques. We introduce two integration approaches: HybridAHS, which cascades a Kalman filter with neural networks (NN), and NeuralKalmanAHS, where NN modules are embedded inside the Kalman filter for signal and parameter estimation. For HybridAHS, we explore two implementations: one is trained offline on pre-processed signals, incurring a light training burden, while the other employs a recursive training strategy whose training signals are generated adaptively; the offline model then serves as the initialization for recursively training the other. With NeuralKalmanAHS, we harness NN modules to refine the reference signal and to improve estimation of the covariance matrices in the Kalman filter, resulting in enhanced feedback suppression. Our methods capitalize on the strengths of both traditional and deep learning-based AHS techniques. We explore different variants of combining the Kalman filter with NNs and systematically compare their howling-suppression performance, providing users with versatile solutions for AHS. Furthermore, the proposed recursive training effectively mitigates the mismatch issues that have plagued previous NN-based AHS methods. Extensive experimental results show the superiority of our approach over baseline techniques.
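To illustrate the Kalman-filter component that both proposed systems build on, the sketch below implements a standard time-domain Kalman adaptive filter for feedback-path identification. This is a simplified stand-in, not the paper's implementation: the fixed scalars `q` and `r` (process and observation noise levels) mark exactly where NeuralKalmanAHS's NN modules would supply learned covariance estimates, and `x_buf` is the raw loudspeaker reference that the NN-based reference refinement would replace.

```python
# Hypothetical sketch (not the paper's implementation): a time-domain
# Kalman adaptive filter that identifies the acoustic feedback path and
# subtracts the predicted feedback from the microphone signal.
import numpy as np

def kalman_ahs_step(w, P, x_buf, y_mic, q=1e-4, r=1e-2):
    """One Kalman update of the feedback-path estimate w (length-L FIR).

    x_buf: last L loudspeaker samples (reference, newest first)
    y_mic: current microphone sample
    Returns updated (w, P) and the residual e = y_mic - predicted feedback,
    i.e. the feedback-suppressed signal sent back to the loudspeaker.
    """
    P = P + q * np.eye(len(w))         # predict: random-walk model of the path
    e = y_mic - x_buf @ w              # innovation (residual after cancellation)
    S = x_buf @ P @ x_buf + r          # innovation variance
    k = (P @ x_buf) / S                # Kalman gain
    w = w + k * e                      # update path estimate
    P = P - np.outer(k, x_buf @ P)     # update error covariance
    return w, P, e

# Toy run: identify a known 8-tap feedback path from a white-noise reference.
rng = np.random.default_rng(0)
L = 8
h = 0.1 * rng.standard_normal(L)       # "true" feedback path
x = rng.standard_normal(5000)          # loudspeaker (reference) signal
w, P = np.zeros(L), np.eye(L)
sq_err = []
for n in range(L, len(x)):
    x_buf = x[n - L:n][::-1]
    y_mic = x_buf @ h                  # mic picks up feedback only (no target)
    w, P, e = kalman_ahs_step(w, P, x_buf, y_mic)
    sq_err.append(e ** 2)
```

In this toy setting the residual energy decays as `w` converges toward the true path `h`; in the actual AHS loop, the residual would also contain the target speech, and the suppressed signal would be re-amplified and played back, closing the feedback loop.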
This page provides sound demos for the paper. The paper can be accessed through [Link will be provided later.].
Fig. Spectrograms of an utterance tested under G = 2: (a) target signal, (b) Kalman filter, (c) Kalman filter with covariance matrices estimation, (d) Kalman filter with reference signal estimation, (e) Proposed NeuralKalmanAHS.
Although the output from the Kalman filter has been included, listening to the entire audio is not recommended.
Waveforms
G = 2
Target signal
Kalman filter
NeuralKalmanAHS with covariance matrices estimation
NeuralKalmanAHS with reference signal estimation
NeuralKalmanAHS
Fig. Spectrograms of a test utterance at two different G levels: (a) target signal, (b) no AHS, (c) Kalman filter, (d) DeepMFC, (e) DeepAHS, (f) NNAFC, (g) HybridAHS v1, (h) HybridAHS v2, (i) HybridAHS v2 (cRM2), and (j) NeuralKalmanAHS.
Due to potential risks to the auditory system, we have omitted the 'no AHS' audio. Although we have included the outputs from the Kalman filter, listening to the entire audio is not recommended.
Waveforms
Moderate Howling (G = 1.5)
Severe Howling (G = 3)
Target signal
no AHS
Kalman filter
DeepMFC
DeepAHS
NNAFC
HybridAHS v1
HybridAHS v2
HybridAHS v2 (cRM2)
NeuralKalmanAHS
Fig. Spectrograms of an utterance tested using real-world recordings: (a) target signal, (b) no AHS, (c) Kalman filter, (d) HybridAHS v1, (e) HybridAHS v2, (f) HybridAHS v2 (cRM2), and (g) NeuralKalmanAHS.