Experiment I: Autoregressive wave generation conditioned on mel-spectrogram
We obtain high-fidelity synthesized speech by training an autoregressive WaveNet with a single Gaussian output distribution.
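To make the phrase "single Gaussian output distribution" concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the network predicts a mean and a log standard deviation for each waveform sample; the function names `gaussian_nll` and `sample_gaussian` are hypothetical and chosen only for illustration.

```python
import math
import numpy as np

def gaussian_nll(x, mean, log_scale):
    """Negative log-likelihood of x under N(mean, exp(log_scale)**2), per sample.

    Assumption: the model outputs `mean` and `log_scale` for each time step.
    """
    x = np.asarray(x, dtype=np.float64)
    mean = np.asarray(mean, dtype=np.float64)
    log_scale = np.asarray(log_scale, dtype=np.float64)
    # Standardize the residual, then apply the Gaussian log-density formula.
    z = (x - mean) * np.exp(-log_scale)
    return 0.5 * math.log(2.0 * math.pi) + log_scale + 0.5 * z ** 2

def sample_gaussian(mean, log_scale, rng):
    """Draw one value per time step from the predicted Gaussian."""
    mean = np.asarray(mean, dtype=np.float64)
    log_scale = np.asarray(log_scale, dtype=np.float64)
    return mean + np.exp(log_scale) * rng.standard_normal(mean.shape)
```

In an autoregressive model, a loss of this form would be averaged over all time steps during training, and `sample_gaussian` would be called one step at a time during generation.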
Ground-truth
Single Gaussian
Mixture of Gaussians (k = 10)
Mixture of Logistics (k = 10)
Softmax (channel = 2048)
1: Others are students or workers involved in some way with agriculture.
2: It is the purpose of antitrust law to look to the future.
3: May I reserve a deck chair, please?
4: But bullies are like termites.
5: Of course, once I became a full time musician, I discovered that many of those hard working, dedicated professionals also happened to be miscreant winos.
Experiment II: Parallel wave generation conditioned on mel-spectrogram
We propose a parallel wave generation method based on Gaussian inverse autoregressive flow (IAF). We distill a parallel student-net from an autoregressive teacher-net. Our method generates all samples of an audio waveform in parallel.
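The idea behind parallel generation with a Gaussian IAF can be sketched as follows. This is a toy illustration under stated assumptions, not the student-net itself: the helper `toy_causal_stats` is a hypothetical stand-in for a network that predicts a shift and a log-scale at each time step from earlier noise values only. Because the statistics depend on the noise `z` rather than on previously generated samples, every output sample can be computed in a single vectorized pass.

```python
import numpy as np

def toy_causal_stats(z, w_mu=0.5, w_s=-0.1):
    """Hypothetical stand-in for the student network.

    Returns a shift mu_t and log-scale log_sigma_t that depend only on
    z_{t-1}, so the transformation stays strictly causal in z.
    """
    z_prev = np.concatenate(([0.0], z[:-1]))  # shift noise right by one step
    mu = w_mu * z_prev
    log_sigma = w_s * np.abs(z_prev)
    return mu, log_sigma

def gaussian_iaf(z):
    """One Gaussian IAF step: x_t = z_t * sigma_t(z_<t) + mu_t(z_<t).

    All time steps are computed at once; there is no sequential loop.
    """
    mu, log_sigma = toy_causal_stats(z)
    return z * np.exp(log_sigma) + mu

rng = np.random.default_rng(0)
z = rng.standard_normal(8)  # white noise input, one value per output sample
x = gaussian_iaf(z)         # all 8 output samples produced in parallel
```

Given `z`, each `x_t` is Gaussian with mean `mu_t` and standard deviation `sigma_t`, which is what makes the flow "Gaussian" in this sketch.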
Student-Net-1 (Reverse KLreg + STFT-loss)
Student-Net-1 (Forward KLreg + STFT-loss)
Student-Net-2 (Reverse KLreg + STFT-loss)
1: Others are students or workers involved in some way with agriculture.
2: It is the purpose of antitrust law to look to the future.
3: May I reserve a deck chair, please?
4: But bullies are like termites.
5: Of course, once I became a full time musician, I discovered that many of those hard working, dedicated professionals also happened to be miscreant winos.
Experiment III: End-to-End Text-to-Wave Model
We propose the first text-to-wave model for speech synthesis, which is fully convolutional and enables fast end-to-end training from scratch. We also successfully distill a parallel waveform synthesizer conditioned on the hidden representation in this end-to-end model.
Text-to-Wave Teacher
Text-to-Wave Student
1: Please call Stella.
2: Ask her to bring these things with her from the store.
3: Some have accepted it as a miracle without physical explanation.
4: The rainbow is a division of white light into many beautiful colors.
5: Throughout the centuries people have explained the rainbow in various ways.
Extension: ClariNet for Mandarin Chinese
We also extend ClariNet with a linguistic conditioner for Mandarin Chinese.