---
library_name: RAT
language:
- en
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
tags:
- efficient architecture
- recurrence
- attention
- pretraining
metrics:
- perplexity
- accuracy
---
## Description
Models trained as described in the [RAT paper](https://arxiv.org/abs/2507.04416).
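Since `library_name` points to the custom `RAT` codebase rather than a standard `transformers` architecture, below is a minimal, hypothetical sketch for fetching the checkpoint files from the Hugging Face Hub with `huggingface_hub`; the repo id is a placeholder, not a confirmed repository name:

```
# Hypothetical usage sketch: download RAT checkpoint files from the Hugging Face Hub.
# The repo id below is a placeholder, not a confirmed repository name.
from huggingface_hub import snapshot_download

# Fetch all files in the model repo into a local cache directory.
local_dir = snapshot_download(repo_id="<user-or-org>/<rat-checkpoint>")
print(f"Checkpoint files downloaded to: {local_dir}")
```
The downloaded files can then presumably be loaded by the scripts in the paper's codebase from `local_dir`.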
## Citation
If you find this work useful, please consider citing the paper:
```
@article{wei2025rat,
  title={RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling},
  author={Wei, Xiuying and Yadav, Anunay and Pascanu, Razvan and Gulcehre, Caglar},
  journal={arXiv preprint arXiv:2507.04416},
  year={2025}
}
```