We propose AC-DiT, an end-to-end vision-language-action (VLA) framework for mobile manipulation. It features a two-stage action generation mechanism (coarse prediction followed by diffusion-based refinement) and significantly outperforms existing methods on multiple benchmarks and in real-robot experiments.
@inproceedings{qian2025acdit,
  title={AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation},
  author={Qian, Siyuan and others},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2025},
}
RSS
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation
We propose RoboMIND, a multi-embodiment robot teleoperation dataset covering 107K demonstration trajectories, 479 tasks, 96 object categories, and 4 robot embodiments, and including failure cases and digital-twin environments. Experiments show that it significantly improves the success rates and generalization of VLA models, making it one of the largest and highest-quality datasets of its kind.
@inproceedings{robomind2025,
  title={RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation},
  author={Qian, Siyuan and others},
  booktitle={Robotics: Science and Systems (RSS)},
  year={2025},
}
2024
VLM
Chain of Thought Prompt Tuning in Vision Language Models
We propose a Chain-of-Thought (CoT) prompt tuning method that introduces CoT into vision-language models by jointly leveraging visual and textual embeddings. It significantly improves generalization and transfer in image classification, and demonstrates stronger reasoning in image-text retrieval and visual question answering, marking the first successful application of CoT prompt tuning to visual tasks.
@article{qian2024cot,
  title={Chain of Thought Prompt Tuning in Vision Language Models},
  author={Qian, Siyuan and others},
  year={2024},
}
2023
Nat. Comput. Sci.
Implicit Neural Image Field for Biological Microscopy Image Compression
We propose an adaptive compression pipeline based on implicit neural representations (INRs) that supports arbitrarily shaped images and pixel-level decompression. It achieves controllable, high compression ratios (up to 512x) and proves effective on a variety of real biological microscopy images, significantly reducing storage and sharing burden while preserving the information critical for analysis.
@article{qian2023inr,
  title={Implicit Neural Image Field for Biological Microscopy Image Compression},
  author={Qian, Siyuan and others},
  journal={Nature Computational Science},
  year={2023},
}