ABSTRACT
Laryngectomy patients, and others with larynx and voice box deficiencies, generally cannot produce anything more than hoarse whispers without voice prostheses or techniques such as oesophageal speech, tracheo-oesophageal puncture (TEP) and the electrolarynx. However, each of these has particular disadvantages, ranging from clumsy usage to infection risk, and, most importantly, all suffer from distinctly robotic-sounding output. This has recently prompted new work on non-surgical, non-invasive alternatives for such patients. An engineering approach to reconstructing natural-sounding speech from the whisper-like speech produced by patients with vocal tract lesions affecting the glottis aims to meet the long-term goal of speech therapists: the rehabilitation of normal-sounding speech without recourse to surgery.
This paper presents a solution for converting whispers to normal-sounding, fully phonated speech through the use of a modified CELP codec. We present a novel method for spectral enhancement and formant smoothing during the reconstruction process, which uses a probability mass-density function to identify reliable formant trajectories in whispers and applies spectral modifications accordingly. The method relies upon the observation that, whilst the pitch generation mechanism of patients with larynx damage is typically unusable, the remaining components of the speech production apparatus may be largely unaffected. The approach outlined here allows patients to regain the ability to speak with a more natural-sounding voice than alternative methods provide, by whispering into an external prosthesis which then recreates and outputs reconstructed speech.
Keywords: Bionic voice, CELP codec, Laryngectomy, Rehabilitation, Speech processing.