Sound demos for "ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech"

ICLR 2019: paper link

Authors: Wei Ping, Kainan Peng, Jitong Chen. (equal contribution.)

A photo of a clarinet.

Experiment I: Autoregressive wave generation conditioned on mel-spectrogram

We obtain high-fidelity synthesized speech by training an autoregressive WaveNet with a single Gaussian output distribution.
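To make the single Gaussian output concrete, here is a minimal sketch of the per-sample maximum-likelihood loss, assuming the network predicts a mean and a log-scale for each waveform sample. The function names and the `min_log_sigma` floor are illustrative assumptions, not taken from the paper's code.

```python
import math

def gaussian_nll(x, mu, log_sigma, min_log_sigma=-7.0):
    """Negative log-likelihood of x under N(mu, exp(log_sigma)^2)."""
    # Clamp the log-scale for numerical stability (the floor value is an assumption).
    log_sigma = max(log_sigma, min_log_sigma)
    z = (x - mu) * math.exp(-log_sigma)
    return 0.5 * math.log(2.0 * math.pi) + log_sigma + 0.5 * z * z

def sequence_nll(samples, mus, log_sigmas):
    """Average NLL over a waveform, given per-step predicted means and log-scales."""
    total = sum(gaussian_nll(x, m, s) for x, m, s in zip(samples, mus, log_sigmas))
    return total / len(samples)
```

In an autoregressive model, `mus[t]` and `log_sigmas[t]` would be predicted from the samples before step `t` and the mel-spectrogram conditioning. Here they are passed in directly.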

Ground-truth | Single Gaussian | Mixture of Gaussian (k = 10) | Mixture of Logistic (k = 10) | Softmax (channel = 2048)
1: Others are students or workers involved in some way with agriculture.
2: It is the purpose of antitrust law to look to the future.
3: May I reserve a deck chair, please?
4: But bullies are like termites.
5: Of course, once I became a full time musician, I discovered that many of those hard working, dedicated professionals also happened to be miscreant winos.


Experiment II: Parallel wave generation conditioned on mel-spectrogram

We propose a parallel wave generation method based on Gaussian inverse autoregressive flow (IAF). We distill a parallel student-net from an autoregressive teacher-net. Our method generates all samples of an audio waveform in parallel.
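The two ingredients above can be sketched in a few lines, under simplifying assumptions. `iaf_step` applies one Gaussian IAF transform, x_t = z_t * sigma_t + mu_t, where the shift and scale at step t depend only on the earlier noise values z_{<t}. `gaussian_kl` is the standard closed-form KL divergence between two univariate Gaussians, the kind of quantity a Gaussian student and teacher can be compared with during distillation. `toy_shift_scale` is a hypothetical stand-in for the student network, not the architecture used in the paper.

```python
import math

def iaf_step(z, shift_scale_fn):
    """One Gaussian IAF layer over a noise sequence z.

    Each output depends only on z, which is fully known up front, so every
    time step could be computed independently. The plain loop here stands in
    for parallel hardware.
    """
    out = []
    for t in range(len(z)):
        mu, log_sigma = shift_scale_fn(z[:t])  # uses z_{<t} only
        out.append(z[t] * math.exp(log_sigma) + mu)
    return out

def toy_shift_scale(prefix):
    """Hypothetical stand-in for the student network: returns (mu, log_sigma)."""
    prev = prefix[-1] if prefix else 0.0
    return 0.5 * prev, -1.0 + 0.1 * math.tanh(prev)

def gaussian_kl(mu_q, log_sigma_q, mu_p, log_sigma_p):
    """Closed-form KL(q || p) for univariate Gaussians q and p."""
    var_q = math.exp(2.0 * log_sigma_q)
    var_p = math.exp(2.0 * log_sigma_p)
    return (log_sigma_p - log_sigma_q) + (var_q + (mu_q - mu_p) ** 2) / (2.0 * var_p) - 0.5
```

For example, `iaf_step([1.0, 2.0], toy_shift_scale)` maps two noise values to two output samples without feeding any generated sample back into the model, which is what distinguishes it from autoregressive sampling.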

Student-Net-1 (Reverse KLreg + STFT-loss) | Student-Net-1 (Forward KLreg + STFT-loss) | Student-Net-2 (Reverse KLreg + STFT-loss)
1: Others are students or workers involved in some way with agriculture.
2: It is the purpose of antitrust law to look to the future.
3: May I reserve a deck chair, please?
4: But bullies are like termites.
5: Of course, once I became a full time musician, I discovered that many of those hard working, dedicated professionals also happened to be miscreant winos.


Experiment III: End-to-End Text-to-Wave Model

We propose the first text-to-wave model for speech synthesis, which is fully convolutional and enables fast end-to-end training from scratch. We also successfully distill a parallel waveform synthesizer conditioned on the hidden representation in this end-to-end model.

Text-to-Wave Teacher | Text-to-Wave Student
1: Please call Stella.
2: Ask her to bring these things with her from the store.
3: Some have accepted it as a miracle without physical explanation.
4: The rainbow is a division of white light into many beautiful colors.
5: Throughout the centuries people have explained the rainbow in various ways.


Extension: ClariNet for Mandarin Chinese

We also extend ClariNet with a linguistic conditioner for Mandarin Chinese.

Gaussian WaveNet Teacher | Gaussian IAF Student