Yanlin Wang's Homepage

Assistant Professor
School of Software Engineering
Sun Yat-sen University
E-mail: yanlin-wang AT outlook DOT com
Chinese homepage

About me

I joined the School of Software Engineering at Sun Yat-sen University as an assistant professor in July 2022. Before that, I was a senior researcher in the Data, Knowledge, and Intelligence (DKI) group at Microsoft Research Asia (MSRA), where I was very fortunate to work under the supervision of Dr. Dongmei Zhang and Shi Han. I received my B.S. degree from Zhejiang University and my PhD degree from the University of Hong Kong under the supervision of Prof. Bruno C. d. S. Oliveira.

My research interests include large language models, software engineering, deep learning, and programming languages, with a particular focus on intelligent software engineering. Our group is always recruiting undergraduate and master's students. If you are interested in these topics, please contact me with your CV attached.

Recent news

Publications

  1. When to Stop? Towards Efficient Code Generation in LLMs with Excess Token Prevention (ISSTA 2024, CCF-A)
    Lianghong Guo, Yanlin Wang*, Ensheng Shi, Wanjun Zhong, Hongyu Zhang, Jiachi Chen, Ruikai Zhang, Yuchi Ma, Zibin Zheng.

  2. RLCoder: Reinforcement Learning for Repository-Level Code Completion (ICSE 2025, CCF-A)
    Yanlin Wang, Yanli Wang, Daya Guo, Jiachi Chen, Ruikai Zhang, Yuchi Ma, Zibin Zheng.

  3. Identifying Smart Contract Security Issues in Code Snippets from Stack Overflow (ISSTA 2024, CCF-A)
    Jiachi Chen, Chong Chen, Jiang Hu, John Grundy, Yanlin Wang, Ting Chen, Zibin Zheng

  4. Beyond Functional Correctness: Investigating Coding Style Inconsistencies in Large Language Models [preprint]
    Yanlin Wang, Tianyue Jiang, Mingwei Liu, Jiachi Chen, Zibin Zheng

  5. Demystifying and Detecting Cryptographic Defects in Ethereum Smart Contracts (ICSE 2025, CCF-A)
    Jiashuo Zhang, Yiming Shen, Jiachi Chen, Jianzhong Su, Yanlin Wang, Ting Chen, Jianbo Gao, Zhong Chen

  6. Hyperion: Unveiling DApp Inconsistencies using LLM and Dataflow-Guided Symbolic Execution (ICSE 2025, CCF-A)
    Shuo Yang, Xingwei Lin, Jiachi Chen, Qingyuan Zhong, Lei Xiao, Renke Huang, Yanlin Wang, Zibin Zheng

  7. CoSQA+: Enhancing Code Search Dataset with Matching Code [preprint]
    Jing Gong, Yanghui Wu, Linxi Liang, Zibin Zheng, Yanlin Wang

  8. YODA: Teacher-Student Progressive Learning for Language Models [preprint]
    Jianqiao Lu, Wanjun Zhong, Yufei Wang, Zhijiang Guo, Qi Zhu, Wenyong Huang, Yanlin Wang, Fei Mi, Baojun Wang, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu

  9. Tackling Long Code Search with Splitting, Encoding, and Aggregating.
    Fan Hu, Yanlin Wang*, Lun Du, Hongyu Zhang, Dongmei Zhang and Xirong Li
    International Conference on Computational Linguistics, Language Resources and Evaluation. (LREC-COLING 2024, CCF-B)

  10. MoonBit: Explore the Design of an AI-Friendly Programming Language. In LLM4Code Workshop 2024.
    Haoxiang Fei, Yu Zhang, Hongbo Zhang, Yanlin Wang, Qing Liu

  11. An Empirical Study on Low Code Programming using Traditional vs Large Language Model Support. [preprint]
    Yongkun Liu, Jiachi Chen, Tingting Bi, John Grundy, Yanlin Wang, Ting Chen, Yutian Tang, Zibin Zheng

  12. KADEL: Knowledge-Aware Denoising Learning for Commit Message Generation. [pdf]
    Wei Tao, Yucheng Zhou, Yanlin Wang*, Hongyu Zhang, Haofen Wang, Wenqiang Zhang*
    ACM Transactions on Software Engineering and Methodology 2024. (TOSEM 2024, CCF-A)

  13. SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization. [pdf]
    Yanlin Wang, Yanxian Huang, Daya Guo, Hongyu Zhang and Zibin Zheng
    The IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2024, CCF-B)

  14. Adaptive-Solver Framework for Dynamic Strategy Selection in Large Language Model Reasoning [preprint]
    Jianpeng Zhou, Wanjun Zhong, Yanlin Wang, Jiahai Wang

  15. SoTaNa: The Open-Source Software Development Assistant [preprint]
    Ensheng Shi, Fengji Zhang, Yanlin Wang, Bei Chen, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun

  16. A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends [preprint]
    Zibin Zheng, Kaiwen Ning, Yanlin Wang, Jingwen Zhang, Dewu Zheng, Mingxi Ye, Jiachi Chen

  17. Towards an Understanding of Large Language Models in Software Engineering Tasks [preprint]
    Zibin Zheng, Kaiwen Ning, Jiachi Chen, Wenqing Chen, Lianghong Guo, Weicheng Wang, Yanlin Wang

  18. When ChatGPT Meets Smart Contract Vulnerability Detection: How Far Are We? [preprint]
    Chong Chen, Jianzhong Su, Jiachi Chen, Yanlin Wang, Tingting Bi, Yanli Wang, Xingwei Lin, Ting Chen, Zibin Zheng

  19. AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models [preprint]
    Wanjun Zhong, Ruixiang Cui, Yiduo Guo, Yaobo Liang, Shuai Lu, Yanlin Wang, Amin Saied, Weizhu Chen, Nan Duan

  20. MemoryBank: Enhancing Large Language Models with Long-Term Memory [preprint]
    Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, Yanlin Wang
    In Proceedings of the 38th AAAI Conference on Artificial Intelligence. (AAAI 2024).

  21. Code Search Debiasing: Improve Search Results beyond Overall Ranking Performance
    Sheng Zhang, Hui Li, Yanlin Wang, Zhao Wei, Yong Xu, Juhong Wang, Rongrong Ji (EMNLP 2023 Findings, CCF-B).

  22. Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond [pdf]
    Ensheng Shi, Yanlin Wang, Hongyu Zhang, Lun Du, Shi Han, Dongmei Zhang, Hongbin Sun
    In The ACM SIGSOFT International Symposium on Software Testing and Analysis. (ISSTA 2023, CCF-A).

  23. Re2BERT: A Two-stage Pre-trained Framework for Automatic Rename Refactoring [pdf]
    Hao Liu, Yanlin Wang, Zhao Wei, Yong Xu, Juhong Wang, Hui Li, Rongrong Ji
    In The ACM SIGSOFT International Symposium on Software Testing and Analysis. (ISSTA 2023, CCF-A).

  24. DeFiTainter: Detecting Price Manipulation Vulnerabilities in DeFi Protocols [pdf]
    Queping Kong, Jiachi Chen, Yanlin Wang, Zigui Jiang, Zibin Zheng
    In The ACM SIGSOFT International Symposium on Software Testing and Analysis. (ISSTA 2023, CCF-A).

  25. Toward Automated Detecting Unanticipated Price Feed in Smart Contract [pdf]
    Yifan Mo, Jiachi Chen, Yanlin Wang, Zibin Zheng
    In The ACM SIGSOFT International Symposium on Software Testing and Analysis. (ISSTA 2023, CCF-A).

  26. Snippet Comment Generation Based on Code Context Expansion [pdf]
    Hanyang Guo, Xiangping Chen, Yuan Huang, Yanlin Wang, Xi Ding, Zibin Zheng, Xiaocong Zhou, Hong-Ning Dai
    In ACM Transactions on Software Engineering and Methodology (TOSEM 2023, CCF-A)

  27. CoCoAST: Representing Source Code via Hierarchical Splitting and Reconstruction of Abstract Syntax Trees [pdf]
    Ensheng Shi#, Yanlin Wang#, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun
    In Empirical Software Engineering (EMSE 2023, CCF-B)

  28. You Augment Me: Exploring ChatGPT-based Data Augmentation for Semantic Code Search [pdf]
    Yanlin Wang, Lianghong Guo, Ensheng Shi, Wenqing Chen, Jiachi Chen, Wanjun Zhong, Menghan Wang, Hui Li, Ziyu Lyu, Hongyu Zhang and Zibin Zheng
    In 39th IEEE International Conference on Software Maintenance and Evolution. (ICSME 2023, CCF-B)

  29. PrivateRec: Differentially Private Model Training and Online Serving for Federated News Recommendation [pdf]
    Ruixuan Liu, Yang Cao, Yanlin Wang, Lingjuan Lyu, Yun Chen, Hong Chen
    KDD 2023 Applied Data Science (KDD 2023).

  30. Unveiling the Black Box of PLMs with Semantic Anchors: Towards Interpretable Neural Semantic Parsing [pdf]
    Lunyiu Nie, Jiuding Sun, Yanlin Wang, Lun Du, Lei Hou, Juanzi Li, Shi Han, Dongmei Zhang, Jidong Zhai
    In Proceedings of the 37th AAAI Conference on Artificial Intelligence. (AAAI 2023).

  31. CoCoSoDa: Effective Contrastive Learning for Code Search [pdf]
    Ensheng Shi, Yanlin Wang, Wenchao Gu, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun
    In Proceedings of the 45th IEEE/ACM International Conference on Software Engineering (ICSE 2023).

  32. A large-scale empirical study of commit message generation: models, datasets and evaluation
    Wei Tao, Yanlin Wang, Ensheng Shi, Lun Du, Shi Han, Hongyu Zhang, Dongmei Zhang, Wenqiang Zhang
    In Empirical Software Engineering. (EMSE 2022).
    [pdf]

  33. Revisiting Code Search in a Two-Stage Paradigm
    Fan Hu, Yanlin Wang, Lun Du, Xirong Li, Hongyu Zhang, Shi Han, Dongmei Zhang
    In 15th ACM International WSDM Conference. (WSDM 2023).
    [pdf]

  34. RACE: Retrieval-Augmented Commit Message Generation
    Ensheng Shi, Yanlin Wang, Wei Tao, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang and Hongbin Sun
    In The 2022 Conference on Empirical Methods in Natural Language Processing. (EMNLP 2022).
    [pdf]

  35. Exploring Representation-level Augmentation for Code Search
    Haochen Li, Chunyan Miao, Yanxian Huang, Yuan Huang, Hongyu Zhang and Yanlin Wang
    In The 2022 Conference on Empirical Methods in Natural Language Processing. (EMNLP 2022).
    [pdf]

  36. An overview of Web3.0 Technology: Infrastructure, Applications, and Popularity [preprint]
    Renke Huang, Jiachi Chen, Yanlin Wang, Tingting Bi, Zibin Zheng

  37. No One Left Behind: Inclusive Federated Learning over Heterogeneous Devices
    Ruixuan Liu, Fangzhao Wu, Chuhan Wu, Yanlin Wang, Lingjuan Lyu, Hong Chen, Xing Xie
    In ACM SIGKDD 2022 Applied Data Science Track. (KDD 2022).
    [pdf]

  38. Accelerating Code Search with Deep Hashing and Code Classification
    Wenchao Gu, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Michael Lyu
    In 60th Annual Meeting of the Association for Computational Linguistics. (ACL 2022).
    [pdf]

  39. UniXcoder: Unified Cross-Modal Pre-training for Code Representation
    Daya Guo, Shuai Lu, Nan Duan, Yanlin Wang, Ming Zhou, Jian Yin
    In 60th Annual Meeting of the Association for Computational Linguistics. (ACL 2022).
    [pdf] [code]

  40. On the Evaluation of Neural Code Summarization
    Ensheng Shi, Yanlin Wang, Lun Du, Junjie Chen, Shi Han, Hongyu Zhang, Dongmei Zhang, Hongbin Sun
    In International Conference on Software Engineering. (ICSE 2022).
    [pdf] [code]

  41. LibDB: An Effective and Efficient Framework for Detecting Third-Party Libraries in Binaries
    Wei Tang, Yanlin Wang, Hongyu Zhang, Shi Han, Ping Luo, Dongmei Zhang
    In Mining Software Repositories 2022. (MSR 2022).
    [pdf] [code]

  42. CAST: Enhancing Code Summarization with Hierarchical Splitting and Reconstruction of Abstract Syntax Trees
    Ensheng Shi#, Yanlin Wang#, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun
    In 2021 Conference on Empirical Methods in Natural Language Processing. (EMNLP 2021).
    [pdf] [code]

  43. Is a Single Model Enough? MuCoS: A Multi-Model Ensemble Learning for Semantic Code Search
    Lun Du, Xiaozhou Shi, Yanlin Wang, Ensheng Shi, Shi Han, Dongmei Zhang
    In 30th ACM International Conference on Information and Knowledge Management. (CIKM 2021).
    [pdf] [code]

  44. On the Evaluation of Commit Message Generation Models: An Experimental Study
    Wei Tao, Yanlin Wang, Ensheng Shi, Lun Du, Shi Han, Hongyu Zhang, Dongmei Zhang and Wenqiang Zhang
    In 37th International Conference on Software Maintenance and Evolution. (ICSME 2021).
    [pdf] [code]

  45. Code Completion by Modeling Flattened Abstract Syntax Trees as Graphs.
    Yanlin Wang and Hui Li
    In Proceedings of the 35th AAAI Conference on Artificial Intelligence. (AAAI 2021).
    [pdf]

  46. CoCoSUM: Contextual Code Summarization with Multi-Relational Graph Neural Network
    Yanlin Wang, Ensheng Shi, Lun Du, Xiaodi Yang, Yuxuan Hu, Shi Han, Hongyu Zhang, Dongmei Zhang
    [pdf]

  47. Multi-task Learning for Recommendation over Heterogeneous Information Network.
    Hui Li, Yanlin Wang, Ziyu Lyu and Jieming Shi
    In IEEE Transactions on Knowledge and Data Engineering (TKDE 2020).
    [link]

  48. FHJ: A Formal Model for Hierarchical Dispatching and Overriding.
    Yanlin Wang, Haoyuan Zhang, Bruno C. d. S. Oliveira and Marco Servetto
    In Proceedings of the 32nd European Conference on Object-Oriented Programming. (ECOOP 2018).
    [pdf]

  49. Classless Java.
    Yanlin Wang, Haoyuan Zhang, Marco Servetto and Bruno C. d. S. Oliveira
    In International Conference on Generative Programming: Concepts and Experiences. (GPCE 2016).
    [pdf] [code]

  50. The Expression Problem, Trivially!
    Yanlin Wang, Bruno C. d. S. Oliveira
    In Proceedings of the 15th International Conference on Modularity. (Modularity 2016, Best Paper Award).
    [link] [pdf]

  51. Product Lines of Interpreters Using Truffle with Object Algebras.
    Yanlin Wang, Tomas Tauber and Bruno C. d. S. Oliveira
    In Proceedings of the 1st Truffle/Graal Languages Workshop, 29th European Conference on Object-Oriented Programming. (Truffle@ECOOP 2015).

Services

SCI Group

SCI stands for Source Code Intelligence. I am very fortunate to work with my talented students and collaborators.

Teaching

Useful Links

Deadlines: ddl-all
CCF list: ccf.atom.im