Yupan Huang
Microsoft Research
Verified email at microsoft.com
Title
Cited by
Year
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Y Huang, T Lv, L Cui, Y Lu, F Wei
Proceedings of the 30th ACM International Conference on Multimedia, 2022
Cited by: 314, Year: 2022
Seeing out of the box: End-to-end pre-training for vision-language representation learning
Z Huang*, Z Zeng*, Y Huang*, B Liu, D Fu, J Fu
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021
Cited by: 266, Year: 2021
Probing inter-modality: Visual parsing with self-attention for vision-and-language pre-training
H Xue, Y Huang, B Liu, H Peng, J Fu, H Li, J Luo
Advances in Neural Information Processing Systems 34, 4514-4528, 2021
Cited by: 83, Year: 2021
Unifying multimodal transformer for bi-directional image and text generation
Y Huang, H Xue, B Liu, Y Lu
Proceedings of the 29th ACM International Conference on Multimedia, 1138-1147, 2021
Cited by: 57, Year: 2021
Decoupling localization and classification in single shot temporal action detection
Y Huang, Q Dai, Y Lu
2019 IEEE International Conference on Multimedia and Expo (ICME), 1288-1293, 2019
Cited by: 57, Year: 2019
TextDiffuser: Diffusion models as text painters
J Chen*, Y Huang*, T Lv, L Cui, Q Chen, F Wei
Advances in Neural Information Processing Systems 36, 2024
Cited by: 26, Year: 2024
Kosmos-2.5: A Multimodal Literate Model
T Lv*, Y Huang*, J Chen*, L Cui*, S Ma, Y Chang, S Huang, W Wang, ...
arXiv preprint arXiv:2309.11419, 2023
Cited by: 24, Year: 2023
Reinforced short-length hashing
X Liu, X Nie, Q Dai, Y Huang, L Lian, Y Yin
IEEE Transactions on Circuits and Systems for Video Technology 31 (9), 3655-3668, 2020
Cited by: 23, Year: 2020
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
Y Huang, Z Meng, F Liu, Y Su, N Collier, Y Lu
arXiv preprint arXiv:2308.16463, 2023
Cited by: 14, Year: 2023
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
J Chen, Y Huang, T Lv, L Cui, Q Chen, F Wei
arXiv preprint arXiv:2311.16465, 2023
Cited by: 8, Year: 2023
A picture is worth a thousand words: A unified system for diverse captions and rich images generation
Y Huang, B Liu, J Fu, Y Lu
Proceedings of the 29th ACM International Conference on Multimedia, 2792-2794, 2021
Cited by: 8, Year: 2021
Be specific, be clear: Bridging machine and human captions by scene-guided transformer
Y Huang, Z Zeng, Y Lu
Proceedings of the 2021 Workshop on Multi-Modal Pre-Training for Multimedia …, 2021
Cited by: 6, Year: 2021