Cheng Li
Black Forest Labs
Verified email at blackforestlabs.ai
Title
Cited by
Year
Sirius: An open end-to-end voice and vision personal assistant and its implications for future warehouse scale computers
J Hauswald, MA Laurenzano, Y Zhang, C Li, A Rovinski, A Khurana, ...
Proceedings of the Twentieth International Conference on Architectural …, 2015
Cited by: 344; Year: 2015
Stochastic circuits for real-time image-processing applications
A Alaghi, C Li, JP Hayes
Proceedings of the 50th Annual Design Automation Conference, 1-6, 2013
Cited by: 318; Year: 2013
DeepSpeed-Inference: Enabling efficient inference of transformer models at unprecedented scale
RY Aminabadi, S Rajbhandari, AA Awan, C Li, D Li, E Zheng, O Ruwase, ...
SC22: International Conference for High Performance Computing, Networking …, 2022
Cited by: 279; Year: 2022
DjiNN and Tonic: DNN as a service and its implications for future warehouse scale computers
J Hauswald, Y Kang, MA Laurenzano, Q Chen, C Li, T Mudge, ...
ACM SIGARCH Computer Architecture News 43 (3S), 27-40, 2015
Cited by: 205; Year: 2015
Accelerating reduction and scan using tensor core units
A Dakkak, C Li, J Xiong, I Gelado, W Hwu
Proceedings of the ACM International Conference on Supercomputing, 46-57, 2019
Cited by: 104; Year: 2019
KLAP: Kernel launch aggregation and promotion for optimizing dynamic parallelism
I El Hajj, J Gómez-Luna, C Li, LW Chang, D Milojicic, W Hwu
2016 49th Annual IEEE/ACM International Symposium on Microarchitecture …, 2016
Cited by: 50; Year: 2016
Evaluating characteristics of CUDA communication primitives on high-bandwidth interconnects
C Pearson, A Dakkak, S Hashash, C Li, IH Chung, J Xiong, WM Hwu
Proceedings of the 2019 ACM/SPEC International Conference on Performance …, 2019
Cited by: 44; Year: 2019
ZeroQuant-V2: Exploring post-training quantization in LLMs from comprehensive study to low rank compensation
Z Yao, X Wu, C Li, S Youn, Y He
arXiv preprint arXiv:2303.08302, 2023
Cited by: 41; Year: 2023
A comprehensive study on post-training quantization for large language models
Z Yao, C Li, X Wu, S Youn, Y He
arXiv preprint arXiv:2303.08302, 2023
Cited by: 36; Year: 2023
XSP: Across-stack profiling and analysis of machine learning models on GPUs
C Li, A Dakkak, J Xiong, W Wei, L Xu, W Hwu
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2020
Cited by: 36*; Year: 2020
Designing future warehouse-scale computers for Sirius, an end-to-end voice and vision personal assistant
J Hauswald, MA Laurenzano, Y Zhang, H Yang, Y Kang, C Li, A Rovinski, ...
ACM Transactions on Computer Systems (TOCS) 34 (1), 1-32, 2016
Cited by: 34; Year: 2016
TrIMS: Transparent and isolated model sharing for low latency deep learning inference in function-as-a-service
A Dakkak, C Li, SG De Gonzalo, J Xiong, W Hwu
2019 IEEE 12th International Conference on Cloud Computing (CLOUD), 372-382, 2019
Cited by: 31; Year: 2019
AI Matrix: A deep learning benchmark for Alibaba data centers
W Zhang, W Wei, L Xu, L Jin, C Li
arXiv preprint arXiv:1909.10562, 2019
Cited by: 24; Year: 2019
Understanding INT4 quantization for transformer models: Latency speedup, composability, and failure cases
X Wu, C Li, RY Aminabadi, Z Yao, Y He
arXiv preprint arXiv:2301.12017, 2023
Cited by: 23; Year: 2023
DeepSpeed Data Efficiency: Improving deep learning model quality and training efficiency via efficient data sampling and routing
C Li, Z Yao, X Wu, M Zhang, C Holmes, C Li, Y He
Proceedings of the AAAI Conference on Artificial Intelligence 38 (16), 18490 …, 2024
Cited by: 17; Year: 2024
Understanding INT4 quantization for language models: Latency speedup, composability, and failure cases
X Wu, C Li, RY Aminabadi, Z Yao, Y He
International Conference on Machine Learning, 37524-37539, 2023
Cited by: 16; Year: 2023
MPress: Democratizing billion-scale model training on multi-GPU servers via memory-saving inter-operator parallelism
Q Zhou, H Wang, X Yu, C Li, Y Bai, F Yan, Y Xu
2023 IEEE International Symposium on High-Performance Computer Architecture …, 2023
Cited by: 16; Year: 2023
Frustrated with replicating claims of a shared model? A solution
A Dakkak, C Li, J Xiong, WM Hwu
arXiv preprint arXiv:1811.09737, 2018
Cited by: 16*; Year: 2018
Matrix factorization on GPUs with memory optimization and approximate computing
W Tan, S Chang, L Fong, C Li, Z Wang, L Cao
Proceedings of the 47th International Conference on Parallel Processing, 1-10, 2018
Cited by: 16; Year: 2018
ACM
Y Wang, W Feng, Y Chen, H Yu, M Huang, PS Yu
Visual Domain Adaptation with Manifold Embedded Distribution Alignment, 402-410, 2018
Cited by: 15; Year: 2018