QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension. AW Yu, D Dohan, MT Luong, R Zhao, K Chen, M Norouzi, QV Le. ICLR 2018. Cited by 1119*.
Finetuned language models are zero-shot learners. J Wei, M Bosma, VY Zhao, K Guu, AW Yu, B Lester, N Du, AM Dai, QV Le. arXiv preprint arXiv:2109.01652, 2021. Cited by 579.
SimVLM: Simple visual language model pretraining with weak supervision. Z Wang, J Yu, AW Yu, Z Dai, Y Tsvetkov, Y Cao. arXiv preprint arXiv:2108.10904, 2021. Cited by 357.
Scaling instruction-finetuned language models. HW Chung, L Hou, S Longpre, B Zoph, Y Tay, W Fedus, E Li, X Wang, et al. arXiv preprint arXiv:2210.11416, 2022. Cited by 266.
Orthogonal weight normalization: Solution to optimization over multiple dependent Stiefel manifolds in deep neural networks. L Huang, X Liu, B Lang, AW Yu, B Li. AAAI 2018, 2017. Cited by 187.
GLaM: Efficient scaling of language models with mixture-of-experts. N Du, Y Huang, AM Dai, S Tong, D Lepikhin, Y Xu, M Krikun, Y Zhou, et al. International Conference on Machine Learning, 5547-5569, 2022. Cited by 175*.
Learning to skim text. AW Yu, H Lee, QV Le. ACL 2017. Cited by 143.
Combined scaling for zero-shot transfer learning. H Pham, Z Dai, G Ghiasi, H Liu, AW Yu, MT Luong, M Tan, QV Le. arXiv preprint arXiv:2111.10050, 2021. Cited by 107*.
DeepFusion: Lidar-camera deep fusion for multi-modal 3D object detection. Y Li, AW Yu, T Meng, B Caine, J Ngiam, D Peng, J Shen, Y Lu, D Zhou, et al. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022. Cited by 99.
Neural symbolic reader: Scalable integration of distributed and symbolic representations for reading comprehension. X Chen, C Liang, AW Yu, D Zhou, D Song, QV Le. International Conference on Learning Representations, 2020. Cited by 86.
AdaDelay: Delay adaptive distributed stochastic convex optimization. S Sra, AW Yu, M Li, AJ Smola. AISTATS 2016. Cited by 75*.
Compositional generalization via neural-symbolic stack machines. X Chen, C Liang, AW Yu, D Song, D Zhou. Advances in Neural Information Processing Systems 33, 1690-1701, 2020. Cited by 65.
On computationally tractable selection of experiments in measurement-constrained regression models. Y Wang, AW Yu, A Singh. The Journal of Machine Learning Research 18 (1), 5238-5278, 2017. Cited by 61*.
An improved gap-dependency analysis of the noisy power method. MF Balcan, SS Du, Y Wang, AW Yu. Conference on Learning Theory, 284-309, 2016. Cited by 61.
AutoHAS: Efficient hyperparameter and architecture search. X Dong, M Tan, AW Yu, D Peng, B Gabrys, QV Le. arXiv preprint arXiv:2006.03656, 2020. Cited by 53*.
DSCOVR: Randomized primal-dual block coordinate algorithms for asynchronous distributed optimization. L Xiao, AW Yu, Q Lin, W Chen. The Journal of Machine Learning Research 20 (1), 1634-1691, 2019. Cited by 47.
Towards zero-label language learning. Z Wang, AW Yu, O Firat, Y Cao. arXiv preprint arXiv:2109.09193, 2021. Cited by 45.
Block-normalized gradient method: An empirical study for training deep neural network. AW Yu, L Huang, Q Lin, R Salakhutdinov, J Carbonell. 2018. Cited by 42*.
Doubly stochastic primal-dual coordinate method for bilinear saddle-point problem. AW Yu, Q Lin, T Yang. arXiv preprint arXiv:1508.03390, 2015. Cited by 36*.
Reverse top-k search using random walk with restart. AW Yu, N Mamoulis, H Su. VLDB 2014. Cited by 36.