hyperconformer_8M on librispeech train-clean-100 does not perform as well as expected #2955
Unanswered · Mattias421 asked this question in Q&A · Replies: 1 comment, 4 replies
Hello, have you found the problem and the solution? I also used hyperconformer_8M to reproduce Table 2 in the paper. I further modified the attention_type to get conformer_8M, and its WER is 15.57.
Hi all,
I'm trying to train hyperconformer_8M as a baseline for my experiments, training only on train-clean-100. The architecture certainly takes less memory and is faster to train, but the WER is higher than expected. I followed the train.py script, using hparams/hyperconformer_8M.yaml.
Table 2 from the paper suggests the result should be 6.76% WER on test-other, but I get 16.74% WER.
Does anyone know what might be causing this?
The only causes I can think of for this discrepancy are a different decoding or LM-scoring strategy, or corrupted data on my server.
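For context on how sensitive the reported number is to decoding: WER is the word-level edit distance divided by the reference length, so any decoding change that inserts, drops, or substitutes words inflates it directly. A minimal illustrative sketch (not SpeechBrain's own scorer, which also reports insertions, deletions, and substitutions separately):

```python
# Minimal word error rate (WER) via Levenshtein distance over words.
# Illustrative only -- assumes whitespace tokenization.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return 100.0 * dp[len(ref)][len(hyp)] / len(ref)

# A single dropped word out of six costs ~16.67% WER.
print(round(wer("the cat sat on the mat", "the cat sat on mat"), 2))  # 16.67
```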
The full metrics for the final checkpoint were:
I trained on an RTX A4500 (20 GB).