RAG Application On The RING-GOCAD Meeting Proceedings Archives And Evaluation

Laure-Oceane Bodolec, Guillaume Caumon, Christine Fay-Varnier and Jairo Cugliari. (2025)
in: 2025 {RING} meeting, pages 100--114, ASGA

Abstract

Natural Language Processing (NLP) tools have proven valuable for information retrieval and linkage. Large Language Models (LLMs), trained on extensive datasets, have recently been widely deployed for their reasoning and analysis abilities across diverse domains. However, their generality makes them ill-suited to small, specialized corpora of documents (e.g., disciplinary, proprietary, or both). We specialize a local quantized LLM (Mistral-7B-Instruct-v0.2-GPTQ) in geological modeling using Retrieval-Augmented Generation (RAG) on the 35-year archive of RING GOCAD Meeting proceedings. This approach aims to enhance domain-specific knowledge integration and reduce model hallucinations when searching and analyzing the literature, while minimizing computational costs. Evaluating this model requires addressing the limitations of published open-domain and NLP-focused benchmarks and metrics, as well as time and computational resources. To ensure effective quality assessment, we define a methodology for evaluating local RAG applications in specialized and confidential fields, consisting of quantitative and qualitative performance assessment on a benchmark. Quantitative evaluation shows 60% LLM accuracy across 5 geomodelling-related benchmark themes, and a 30-point degradation of RAG accuracy compared to the language model alone. This RAG underperformance leads to several hypotheses on the system behavior (RAG inheriting biases from the LLM; semantic biases in document retrieval creating a distracting context for the LLM), and in turn to system improvement directions (e.g., context length reduction, model configuration adaptation).
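The core RAG mechanism summarized above (retrieve the archive passages most similar to a query, then prepend them as context to the LLM prompt) can be sketched with a toy bag-of-words retriever. The corpus snippets, prompt template, and function names below are illustrative assumptions, not the authors' actual pipeline, which uses a quantized Mistral model and a real embedding-based retriever:

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words term counts for a lowercased token stream."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, corpus, k=2):
    """Return the k corpus chunks most similar to the query."""
    q = bow(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, bow(doc)), reverse=True)
    return ranked[:k]

def build_prompt(query, corpus, k=2):
    """Assemble a RAG prompt: retrieved context followed by the question."""
    context = "\n".join(retrieve(query, corpus, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical proceedings chunks standing in for the indexed archive.
corpus = [
    "Implicit structural modeling interpolates a scalar field from horizon data.",
    "Meshing strategies for faulted geological domains.",
    "Seismic velocity inversion with full waveform methods.",
]
print(build_prompt("implicit modeling of horizon surfaces", corpus, k=1))
```

The prompt string would then be sent to the local LLM; the paper's hypothesis that retrieval can inject a distracting context corresponds here to irrelevant chunks outranking relevant ones, which motivates the suggested context-length reduction.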


BibTeX Reference

@inproceedings{Bodolec2025RM,
 abstract = {Natural Language Processing (NLP) tools have proven valuable for information retrieval and linkage. Large Language Models (LLMs), trained on extensive datasets, have recently been widely deployed for their reasoning and analysis abilities across diverse domains. However, their generality makes them ill-suited to small, specialized corpora of documents (e.g., disciplinary, proprietary, or both). We specialize a local quantized LLM (Mistral-7B-Instruct-v0.2-GPTQ) in geological modeling using Retrieval-Augmented Generation (RAG) on the 35-year archive of RING GOCAD Meeting proceedings. This approach aims to enhance domain-specific knowledge integration and reduce model hallucinations when searching and analyzing the literature, while minimizing computational costs. Evaluating this model requires addressing the limitations of published open-domain and NLP-focused benchmarks and metrics, as well as time and computational resources. To ensure effective quality assessment, we define a methodology for evaluating local RAG applications in specialized and confidential fields, consisting of quantitative and qualitative performance assessment on a benchmark. Quantitative evaluation shows 60\% LLM accuracy across 5 geomodelling-related benchmark themes, and a 30-point degradation of RAG accuracy compared to the language model alone. This RAG underperformance leads to several hypotheses on the system behavior (RAG inheriting biases from the LLM; semantic biases in document retrieval creating a distracting context for the LLM), and in turn to system improvement directions (e.g., context length reduction, model configuration adaptation).},
 author = {Bodolec, Laure-Oceane and Caumon, Guillaume and Fay-Varnier, Christine and Cugliari, Jairo},
 booktitle = {2025 {RING} meeting},
 language = {en},
 pages = {100--114},
 publisher = {ASGA},
 title = {{RAG} {Application} {On} {The} {RING}-{GOCAD} {Meeting} {Proceedings} {Archives} {And} {Evaluation}},
 year = {2025}
}