dm.cs.tu-dortmund.de/mlbits/neural-nlp-decoders/
Decoder Models – Lecture Notes
2978–2988.
[FaLeDa18]
Fan, A., Lewis, M. and Dauphin, Y.N. 2018. Hierarchical neural story generation. Proc. Assoc. Computational linguistics, ACL (2018), 889–898.
[KFPV25]
Kamath, A. et al. 2025. Gemma 3 technical [...] linguistics, ACL (2023), 27–34.
[RaNa18]
Radford, A. and Narasimhan, K. 2018. Improving language understanding by generative pre-training. (2018).
[RWCL19]
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D. …