SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities D Zhang, S Li, X Zhang, J Zhan, P Wang, Y Zhou, X Qiu The 2023 Conference on Empirical Methods in Natural Language Processing …, 2023 | 297 | 2023 |
SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models X Zhang, D Zhang, S Li, Y Zhou, X Qiu The Twelfth International Conference on Learning Representations, 2024 | 128* | 2024 |
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling J Zhan, J Dai, J Ye, Y Zhou, D Zhang, Z Liu, X Zhang, R Yuan, G Zhang, ... ACL, 2024 | 109 | 2024 |
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation D Zhang*, X Zhang*, J Zhan, S Li, Y Zhou, X Qiu arXiv preprint arXiv:2401.13527, 2024 | 22 | 2024 |
SpeechAlign: Aligning Speech Generation to Human Preferences D Zhang, Z Li, S Li, X Zhang, P Wang, Y Zhou, X Qiu The Thirty-Eighth Annual Conference on Neural Information Processing Systems …, 2024 | 15 | 2024 |
SpeechAgents: Human-Communication Simulation with Multi-modal Multi-agent Systems D Zhang, Z Li, P Wang, X Zhang, Y Zhou, X Qiu arXiv preprint arXiv:2401.03945, 2024 | 9 | 2024 |
IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities X Zhang, X Lyu, Z Du, Q Chen, D Zhang, H Hu, C Tan, T Zhao, Y Wang, ... arXiv preprint arXiv:2410.08035, 2024 | 2 | 2024 |