About Me

I am SHEN Haiyang, a Ph.D. candidate at the School of Computer Science, Peking University. I am advised by Assistant Professor Yun Ma.

My research began in software systems, moved toward the intersection of software and AI, and has since evolved into my current focus on LLM-based agents. My long-term goal is to integrate AI more deeply with existing software systems, so that AI can reliably use tools, interact with applications, and improve real-world software workflows.

Within LLM-based agents, I am specifically interested in:

  • Search & Web Agents: building intelligent agents for deep information seeking and web interaction.
  • Coding Agents: developing benchmarks and systems for automated software engineering.
  • Financial Agents: exploring how agents can trade stocks and generate returns in financial markets.
  • Mathematical Reasoning: synthesizing high-quality reasoning data for frontier-level mathematical problem solving.
  • LLM Inference on Edge Devices: efficient LLM deployment on web and mobile platforms.

Across these directions, I am the first or co-first author of 17 papers and have contributed to 33 papers in total.

My research group is affiliated with the Data Space Technology and Systems Research Center, led by Academician Hong Mei and Professor Gang Huang, with faculty members including Xuanzhe Liu, Xin Jin, and Yun Ma.

The center is one of China's leading research groups in machine learning systems, software engineering, and computer systems. It has a deep talent pool and strong research foundations in industrial-scale machine learning systems; edge computing (including satellite, mobile, and embodied computing); software engineering; and artificial intelligence.

Academic and Project Experience

Publications

* Co-first author or project leader. Corresponding author.

Search & Web Agents

  1. Sixiong Xie*, Zhuofan Shi*, Haiyang Shen*, Jiuzheng Wang, Siqi Zhong, Chongyang Pan, Mugeng Liu, Peilun Jia, Baoqing Sun, Xiang Jing, Yun Ma. DeepWeb-Bench: A Deep Research Benchmark Demanding Massive Cross-Source Evidence and Long-Horizon Derivation. 2026. NeurIPS Submission.

  2. Haiyang Shen*, Jiuzheng Wang*, Taian Guo, Mugeng Liu, Wenchun Jing, Weichen Bi, Zhiyang Chen, Yudong Han, Xiaoying Bai, Yun Ma. QuestBench: A Course Curated Benchmark for Expert-Level Cross-Domain Deep Search in Language Models. 2026. NeurIPS Submission.

  3. Ningyuan Li*, Haiyang Shen*, Mugeng Liu, Yudong Han, Zhuofan Shi, Sixiong Xie, Yun Ma. SGR-Bench: Benchmarking Search Agents on State-Gated Retrieval. 2026. NeurIPS Submission.

  4. Zhengwei Tao*, Haiyang Shen*, Baixuan Li*, Wenbiao Yin, Jialong Wu, Kuan Li, Zhongwang Zhang, Huifeng Yin, Rui Ye, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou, Wentao Zhang, Yun Ma, Zhiqiang Gao. Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking. The Fourteenth International Conference on Learning Representations (ICLR). 2026. Top Conference on Machine Learning.
  5. Haiyang Shen*, Hang Yan*, Zhongshi Xing, Mugeng Liu, Yue Li, Zhiyang Chen, Yuxiang Wang, Jiuzheng Wang, Yun Ma. DRAGON: Domain-specific Robust Automatic Data Generation for RAG Optimization. Findings of the Association for Computational Linguistics: EACL 2026. 2026. Top Conference on NLP.
  6. Tongyi DeepResearch Team, Baixuan Li, Bo Zhang, Dingchu Zhang, …, Haiyang Shen, Xinyu Geng, Yuning Wu, Zijian Li, Yong Jiang. Tongyi DeepResearch Technical Report. arXiv preprint arXiv:2510.24701. 2025.
  7. Zhengwei Tao*, Jialong Wu*, Wenbiao Yin, Pu Wu, Junkai Zhang, Baixuan Li, Haiyang Shen, Kuan Li, Liwen Zhang, Xinyu Wang, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou, Wentao Zhang. WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization. The Fourteenth International Conference on Learning Representations (ICLR). 2026. Top Conference on Machine Learning.
  8. Zhuofan Shi, Peilun Jia, Baoqing Sun, Haiyang Shen, Sixiong Xie, Yun Ma, Xiang Jing. ViDR: Grounding Multimodal Deep Research Reports in Source Visual Evidence. 2026. NeurIPS Submission.

  9. Mugeng Liu, Shuoqi Li, Qi Yang, Siqi Zhong, Chongyang Pan, Haiyang Shen, Yun Ma. WEBACT: Test-Time Learning of Verifiable Action Interfaces for Web Agents. 2026. NeurIPS Submission.

  10. Wenchun Jing, Haiyang Shen, Haoran Wang, Qi Liu, Ningyuan Li, Chaoran Luo, Ning Zhang, Yun Ma. MCP-Focus: Leveraging Function-Oriented Document Enhancement for MCP Server Retrieval. The ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 2026.

  11. Baixuan Li*, Dingchu Zhang*, Jialong Wu*, Wenbiao Yin, Zhengwei Tao, Yida Zhao, Liwen Zhang, Haiyang Shen, Runnan Fang, Pengjun Xie, Jingren Zhou, Yong Jiang. ParallelMuse: Agentic Parallel Thinking for Deep Information Seeking. arXiv preprint arXiv:2510.24698. 2025.
  12. Qi Yang, Weichen Bi, Haiyang Shen, Yaoqi Guo, Yun Ma. PixelWeb: The First Web GUI Dataset with Pixel-Wise Labels. arXiv preprint arXiv:2504.16419. 2025.

Coding Agents

  1. Haiyang Shen*, Xuanzhong Chen*, Wendong Xu*, Yun Ma, Liang Chen, Kuan Li. EvoCodeBench: Evaluating Coding Agents in Multi-Turn Iterative Interactions. 2026. NeurIPS Submission.
  2. Xinbo Xu, Ruihan Yang, Haiyang Shen, Wendong Xu, Bofei Gao, Ruoyu Wu, Kean Shi, Weichu Xie, Xuanzhong Chen, Ming Wu, Jason Zeng, Michael Heinrich, Liang Chen, Kuan Li, Baobao Chang. RoadmapBench: Evaluating Long-Horizon Agentic Software Development Across Version Upgrades. 2026. NeurIPS Submission.
  3. Haiyang Shen*, Wendong Xu*, Xuanzhong Chen*, Yun Ma, Liang Chen, Kuan Li. Genesis: Coding Task Synthesis via Iterative Multi-Agent Coordination. 2026. NeurIPS Submission.

  4. Haiyang Shen*, Xinbo Xu*, Xuanzhong Chen, Wendong Xu, Elvis Zhang, Kaiyuan Chen, Xiaobo Hu, Rui Wang, Yang Liu, Yixin Ren, Yuan Gong, Liang Chen, Kuan Li. Monthly-SWEBench: A Living, Rigorously Verified Benchmark for Real-World Software Engineering. 2026. Benchmark.
  5. Haiyang Shen, Yue Li, Desong Meng, Dongqi Cai, Sheng Qi, Li Zhang, Mengwei Xu, Yun Ma. ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents. The Thirteenth International Conference on Learning Representations (ICLR). 2025. Top Conference on Machine Learning.
  6. Haiyang Shen, Yue Li, Zhiyang Chen, Yun Ma. EasIPA: Enhancing LLM’s Ability to Select APIs for IPA. International Conference on Service Science. 2025.

  7. Haiyang Shen, Yun Ma, Yue Li, Xiaoling Wang, Deyu Tian, Tong Jia, Tengfei He, Shenghua Luo. ADPal: Automatic Detection of Troubled Users in Online Service Systems via Page Access Logs. 2023 IEEE International Conference on Web Services (ICWS). 2023. Top Conference on Service Computing.
  8. Zhuofan Shi, Hubao A, Yufei Shao, Dongliang Huang, Hongxu An, Chunxiao Xin, Haiyang Shen, Zhenyu Wang, Yunshan Na, Gang Huang, Xiang Jing. MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics. arXiv preprint arXiv:2601.02075. 2026.
  9. Guoqing Wang, Zeyu Sun, Yizhou Chen, Yifan Zhao, Haiyang Shen, Qingyuan Liang, Dan Hao. Beyond the Sum of Parts: Leveraging Entanglement for Bug Inducing Commit Localization. IEEE Transactions on Software Engineering. 2025. Top Journal in Software Engineering.

Financial Agents

  1. Taian Guo*, Haiyang Shen*, Junyu Luo, Zhongshi Xing, Hanchun Lian, Jinsheng Huang, Binqi Chen, Luchen Liu, Yun Ma, Ming Zhang. MEME: Modeling the Evolutionary Modes of Financial Markets. arXiv preprint arXiv:2602.11918. 2026.
  2. Taian Guo*, Haiyang Shen*, Junyu Luo, Binqi Chen, Hongjun Ding, Jinsheng Huang, Luchen Liu, Yun Ma, Ming Zhang. AlphaPROBE: Alpha Mining via Principled Retrieval and On-graph Biased Evolution. arXiv preprint arXiv:2602.11917. 2026.
  3. Taian Guo*, Haiyang Shen*, Jinsheng Huang, Zhengyang Mao, Junyu Luo, Binqi Chen, Zhuoru Chen, Luchen Liu, Bingyu Xia, Yun Ma, Ming Zhang. MASS: Multi-Agent Simulation Scaling for Portfolio Construction. arXiv preprint arXiv:2505.10278. 2025.

Mathematical Reasoning

  1. Haiyang Shen*, Taian Guo*, Xuanzhong Chen*, Mugeng Liu, Sixiong Xie, Zhuofan Shi, Chongyang Pan, Siqi Zhong, Guoqing Wang, Ming Zhang, Yun Ma. MindLoom: Composing Thought Modes for Frontier-Level Reasoning Data Synthesis. 2026. NeurIPS Submission.

LLM Inference on Edge Devices

  1. Siqi Zhong, Mugeng Liu, Haiyang Shen, Chongyang Pan, Yun Ma. LaTune: Lightweight and Adaptive Configuration Tuning for LLM Inference on Edge Devices. Proceedings of the ACM Web Conference 2026. 2026. Top Conference on Web.

  2. Zhiyang Chen, Daliang Xu, Haiyang Shen, Chiheng Lou, Mengwei Xu, Shangguang Wang, Xin Jin, Yun Ma. Accelerating Mobile Language Model via Speculative Decoding and NPU-Coordinated Execution. arXiv preprint arXiv:2510.15312. 2025.
  3. Zhiyang Chen, Yun Ma, Haiyang Shen, Mugeng Liu. WeInfer: Unleashing the Power of WebGPU on LLM Inference in Web Browsers. Proceedings of the ACM on Web Conference 2025. 2025. Top Conference on Web.
  4. Mugeng Liu, Haiyang Shen, Yixuan Zhang, Hong Mei, Yun Ma. WebAssembly for Container Runtime: Are We There Yet? ACM Transactions on Software Engineering and Methodology. 2025. Top Journal in Software Engineering.
  5. Deyu Tian, Haiyang Shen, Yun Ma. Parallelizing DNN Inference in Mobile Web Browsers on Heterogeneous Hardware. Proceedings of the 20th Annual International Conference on Mobile Systems, Applications and Services (MobiSys). 2022. Top Conference on Mobile Computing.

Others

  1. Liang Chen*, Weichu Xie*, Yiyan Liang*, Hongfeng He*, Hans Zhao*, …, Haiyang Shen, Yixin Ren, Yang Liu, Yuan Gong, Kuan Li. BabyVision: Visual Reasoning Beyond Language. The Forty-third International Conference on Machine Learning (ICML). 2026. Top Conference on Machine Learning.

  2. Zijian Shao*, Haiyang Shen*, Mugeng Liu, Guangyu Fu, Yaoqi Guo, Yuxiang Wang, Yun Ma. Rethinking Explainable Disease Prediction: Synergizing Accuracy and Reliability via Reflective Cognitive Architecture. arXiv preprint arXiv:2509.21266. 2025.
  3. Haiyang Shen, Yun Ma. Characterizing the Developer Groups for Metaverse Services in Roblox. 2024 IEEE International Conference on Software Services Engineering (SSE). 2024.

Correspondence