SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks
Yilun Zhao, Kaiyan Zhang, Tiansheng Hu, Sihong Wu, Ronan Le Bras, Taira Anderson, Jonathan Bragg, Joseph Chee Chang, Jesse Dodge, Matt Latzke, Yixin Liu, Charles McGrady, Xiangru Tang, Zihang Wang, Chen Zhao, Hannaneh Hajishirzi, Doug Downey, Arman Cohan
Preprint [code/data] [blog] [platform]
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Yilun Zhao, Lujing Xie, Haowei Zhang, Guo Gan, Yitao Long, Zhiyuan Hu, Tongyan Hu, Weiyuan Chen, Chuhan Li, Junyang Song, Zhijian Xu, Chengye Wang, Weifeng Pan, Ziyao Shangguan, Xiangru Tang, Zhenwen Liang, Yixin Liu, Chen Zhao, Arman Cohan
CVPR 2025 [code/data]
FinanceMath: Knowledge-Intensive Math Word Problem Solving in Finance Domains
Yilun Zhao*, Hongjun Liu*, Yitao Long, Rui Zhang, Chen Zhao, Arman Cohan
ACL 2024 (Oral) [code/data]
DocMath-Eval: Evaluating Numerical Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents
Yilun Zhao*, Yitao Long*, Hongjun Liu, Linyong Nan, Lyuhao Chen, Ryo Kamoi, Yixin Liu, Xiangru Tang, Rui Zhang, Arman Cohan
ACL 2024 (Oral) [code/data]
TaPERA: A Modular Framework for Long-form Table Question Answering
Yilun Zhao, Lyuhao Chen, Arman Cohan, Chen Zhao
ACL 2024
LoFT: Enhancing Faithfulness and Diversity for Table-to-Text Generation via Logic Form Control
Yilun Zhao*, Zhenting Qi*, Linyong Nan, LJ Yu Flores, Dragomir Radev
EACL 2023 (Oral) [code/data]
ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples
Yilun Zhao, Linyong Nan, Zhenting Qi, Rui Zhang, Dragomir Radev
EMNLP 2022 [code/data]