GNN-Based Multi-Turn Dialogue Representation for Retrieval Augmented Generation


Planned Submission to ACL 2024, 1st Author, Supervised by Prof. Xiaofan Zhang, Shanghai AI Lab

  • Identified a new research task in the retrieval stage of retrieval-augmented generation (RAG): using multi-turn dialogues as queries to find the most relevant document in a domain-specific database.
  • Built a new dataset for the task: collected medical multi-turn dialogues, used them to retrieve candidate documents from the lab’s medical database, and ranked the candidates by perplexity under medical InternLM (20B).
  • Proposed a Graph Convolutional Network (GCN) approach with a new graph structure that captures key information from the dialogues: dialogue turns are nodes, and edges are derived from syntactic trees parsed with N-LTP. Optimized the web search vector and computed the similarity between the enhanced vector and BERT-derived document vectors.
  • Achieved an MRR of 0.71 and an NDCG of 0.73 as a baseline on 7,500 manually annotated dialogues, indicating that the model reliably retrieves the most suitable document from the database.
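
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how MRR and NDCG are commonly computed for a ranked list of retrieved documents. It is not the project's evaluation code: the function names, the document IDs, and the graded-gain dictionary are all hypothetical, and the sketch uses the standard log2-discounted form of DCG.

```python
import math


def reciprocal_rank(ranked_ids, relevant_id):
    """Return 1/rank of the relevant document, or 0.0 if it was not retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average the reciprocal rank over (ranked_ids, relevant_id) pairs."""
    return sum(reciprocal_rank(ids, rel) for ids, rel in runs) / len(runs)


def dcg(gains):
    """Discounted cumulative gain: position i (0-based) is discounted by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(ranked_ids, gain_by_id, k=None):
    """DCG of the predicted ranking divided by the DCG of the ideal ranking."""
    gains = [gain_by_id.get(doc_id, 0.0) for doc_id in ranked_ids][:k]
    ideal = sorted(gain_by_id.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical example: two dialogue queries with one relevant document each.
runs = [(["d2", "d1", "d3"], "d1"), (["d5", "d4"], "d5")]
mrr = mean_reciprocal_rank(runs)
```

Here each dialogue contributes the reciprocal of the rank at which its correct document appears, so a model that always ranks the correct document first scores an MRR of 1.0. NDCG additionally rewards placing higher-gain documents earlier when relevance is graded rather than binary.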
Zelin Li

My research interests include Natural Language Processing, Large Language Models and Data Science.