This is a ~50M parameter diffusion model trained on LibriSpeech train-clean-360. It can roughly be reproduced by running the commands below: the first for 265K iterations, the second for an additional 164K iterations.

    python3 -u train_diffusion.py \
        --predictor unet \
        --base-channels 64 \
        --grad-checkpoint \
        --batch 4 \
        --ema-rate 0.9999 \
        $DATA/LibriSpeech/train-clean-360

    python3 -u train_diffusion.py \
        --predictor unet \
        --base-channels 64 \
        --grad-checkpoint \
        --batch 4 \
        --lr 3e-5 \
        --ema-rate 0.9999 \
        $DATA/LibriSpeech/train-clean-360

The final quartile losses for the model were:

    q0=0.02067
    q1=0.00118
    q2=0.00008
    q3=0.00001

For evaluations, I sampled with this command:

    python3 sample_diffusion.py \
        --checkpoint-path model_diffusion_ema.pt \
        --sample-steps 50 \
        --schedule 'lambda x: x**2' \
        --constrain \
        --num-samples 10000 \
        --batch-size 16 \
        --sample-path samples-50step-sqschedule

Results for the final checkpoint:

    Class score:   69.0
    Frechet score: 1834

I also evaluated the model before running the second command. I expected it to be worse, but instead found that its Frechet score was much better; perhaps the model was overfitting by the second checkpoint.

    Class score:   51.5
    Frechet score: 855

I have put the early-stopped models in early_stopped/
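
For context on the --ema-rate 0.9999 flag: training scripts like this typically keep an exponential moving average of the model weights and save it as the *_ema.pt checkpoint sampled from above. A minimal sketch of that update, assuming the standard EMA formulation (the function and variable names here are illustrative, not the script's actual internals):

```python
# Sketch of an EMA weight update as implied by --ema-rate 0.9999.
# ema_params and model_params stand in for flat dicts of tensors/floats;
# this is an assumed formulation, not code from train_diffusion.py.
def ema_update(ema_params, model_params, rate=0.9999):
    """Blend the current weights into the running average in place."""
    for k in ema_params:
        ema_params[k] = rate * ema_params[k] + (1.0 - rate) * model_params[k]
    return ema_params

# Toy usage with scalar "weights" and an exaggerated rate of 0.9:
ema = {"w": 0.0}
for _ in range(3):
    ema_update(ema, {"w": 1.0}, rate=0.9)
print(round(ema["w"], 3))  # -> 0.271
```

At a rate of 0.9999 the average moves very slowly, which is why the EMA checkpoint is usually smoother (and often better for sampling) than the raw weights.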
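
A note on --schedule 'lambda x: x**2': my reading (an assumption, not verified against sample_diffusion.py) is that the lambda remaps the 50 uniformly spaced step fractions in [0, 1] before they are turned into diffusion times, so squaring spends more of the step budget near t=0:

```python
# Sketch of how a --schedule lambda could reshape the sampling steps.
# The interpretation of the flag is an assumption; only the lambda itself
# is taken verbatim from the command above.
schedule = lambda x: x ** 2  # the value passed via --schedule

sample_steps = 50
fractions = [i / (sample_steps - 1) for i in range(sample_steps)]
times = [schedule(x) for x in fractions]

# Squaring concentrates steps near t=0 (the low-noise end of the
# reverse process), while still covering the full [0, 1] range.
print(times[0], times[-1])  # -> 0.0 1.0
```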