Exploring the limits of transfer learning with a unified text-to-text transformer C Raffel, N Shazeer, A Roberts, K Lee, S Narang, M Matena, Y Zhou, W Li, ... The Journal of Machine Learning Research 21 (1), 5485-5551, 2020 | 11647 | 2020 |
Exploring the limits of transfer learning with a unified text-to-text transformer A Roberts, C Raffel, K Lee, M Matena, N Shazeer, PJ Liu, S Narang, W Li, ... | 234 | 2019 |
Exploring the limits of transfer learning with a unified text-to-text transformer (2019) C Raffel, N Shazeer, A Roberts, K Lee, S Narang, M Matena, Y Zhou, W Li, ... arXiv preprint arXiv:1910.10683, 2019 | 129 | 2019 |
Do transformer modifications transfer across implementations and applications? S Narang, HW Chung, Y Tay, W Fedus, T Fevry, M Matena, K Malkan, ... arXiv preprint arXiv:2102.11972, 2021 | 73 | 2021 |
Merging models with Fisher-weighted averaging MS Matena, CA Raffel Advances in Neural Information Processing Systems 35, 17703-17716, 2022 | 72 | 2022 |
NPEFF: Non-Negative Per-Example Fisher Factorization M Matena, C Raffel arXiv preprint arXiv:2310.04649, 2023 | | 2023 |
A Combinatorial Perspective on the Optimization of Shallow ReLU Networks MS Matena, CA Raffel Advances in Neural Information Processing Systems 35, 22187-22198, 2022 | | 2022 |