E Voita, D Talbot, F Moiseev, R Sennrich, I Titov. Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned. arXiv preprint arXiv:1905.09418, 2019. Cited by 1035.
G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, et al. Gemini: A family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023. Cited by 503.
F Moiseev, Z Dong, E Alfonseca, M Jaggi. SKILL: Structured knowledge infusion for large language models. arXiv preprint arXiv:2205.08184, 2022. Cited by 61.
F Moiseev, GH Abrego, P Dornbach, I Zitouni, E Alfonseca, Z Dong. SamToNe: Improving contrastive loss for dual encoder retrieval models with same tower negatives. arXiv preprint arXiv:2306.02516, 2023. Cited by 2.