From Undercomplete to Sparse Overcomplete Autoencoders to Improve LF-MMI Speech Recognition

Type of publication:	Conference paper
Citation:	Kabil_INTERSPEECH_2022
Publication status:	Accepted
Booktitle:	Proceedings of Interspeech Conference
Year:	2022
Abstract:	Starting from a strong Lattice-Free Maximum Mutual Information (LF-MMI) baseline system, we explore different autoencoder configurations to enhance Mel-Frequency Cepstral Coefficients (MFCC) features. Autoencoders are expected to generate new MFCC features that can be used in our LF-MMI based baseline system (with or without retraining) to improve speech recognition. Starting from shallow undercomplete autoencoders, and their known equivalence with Principal Component Analysis (PCA), we move to deeper or sparser architectures. In the spirit of kernel-based learning methods, we explore alternatives where the autoencoder first goes overcomplete (i.e., expands the representation space) in a nonlinear way, and then we restrict the autoencoder by means of a subsequent bottleneck layer. Finally, as a third solution, we use sparse overcomplete autoencoders where a sparsity constraint is imposed on the higher-dimensional encoding layer. Experimental results are provided on the Augmented Multiparty Interaction (AMI) dataset, where we show that all aforementioned architectures improve speech recognition performance, with a clear advantage for sparse overcomplete autoencoders on both close-talk and far-field speech sets.
Keywords:	bottleneck, chain models, PCA, sparse overcomplete autoencoder, speech recognition
Projects	SHISSM
Authors	Kabil, Selen Hande; Bourlard, Hervé
Attachments
Kabil_INTERSPEECH_2022.pdf
Notes