RAG Application On The RING-GOCAD Meeting Proceedings Archives And Evaluation

Laure-Oceane Bodolec, Guillaume Caumon, Christine Fay-Varnier and Jairo Cugliari. (2025)
in: 2025 {RING} meeting, pages 100--114, ASGA

Abstract

Natural Language Processing (NLP) tools have proven valuable for information retrieval and linkage. Large Language Models (LLMs), trained on extensive datasets, have recently been widely deployed for their reasoning and analysis abilities across diverse domains. However, their generality makes them ill-suited to small, specialized corpora of documents (e.g., disciplinary, proprietary, or both). We specialize a local quantized LLM (Mistral-7B-Instruct-v0.2-GPTQ) in geological modeling using Retrieval-Augmented Generation (RAG) on the 35-year archive of RING GOCAD Meeting proceedings. This approach aims to enhance domain-specific knowledge integration and reduce model hallucinations when searching and analyzing the literature, while minimizing computational costs. Evaluating this model requires addressing the limitations of published open-domain and NLP-focused benchmarks and metrics, as well as time and computational resources. To ensure effective quality assessment, we define a methodology for evaluating local RAG applications in specialized and confidential fields, consisting of quantitative and qualitative performance assessment on a benchmark. Quantitative evaluation shows 60% LLM accuracy across 5 geomodelling-related benchmark themes, and a 30-point degradation of RAG accuracy compared to the language model alone. This RAG underperformance leads to several hypotheses on the system behavior (RAG inheriting biases from the LLM; semantic biases in document retrieval creating a distracting context for the LLM), and in turn to system improvement directions (e.g., context length reduction, model configuration adaptation).
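The core RAG mechanism summarized above (retrieve the archive passages most similar to a query, then prepend them as context to the LLM prompt) can be sketched with a toy bag-of-words retriever. The corpus snippets, prompt template, and function names below are illustrative assumptions, not the authors' actual pipeline, which uses a quantized Mistral model and a real embedding-based retriever:

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words term counts for a lowercased token stream."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, corpus, k=2):
    """Return the k corpus chunks most similar to the query."""
    q = bow(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, bow(doc)), reverse=True)
    return ranked[:k]

def build_prompt(query, corpus, k=2):
    """Assemble a RAG prompt: retrieved context followed by the question."""
    context = "\n".join(retrieve(query, corpus, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical proceedings chunks standing in for the indexed archive.
corpus = [
    "Implicit structural modeling interpolates a scalar field from horizon data.",
    "Meshing strategies for faulted geological domains.",
    "Seismic velocity inversion with full waveform methods.",
]
print(build_prompt("implicit modeling of horizon surfaces", corpus, k=1))
```

The prompt string would then be sent to the local LLM; the paper's hypothesis that retrieval can inject a distracting context corresponds here to irrelevant chunks outranking relevant ones, which motivates the suggested context-length reduction.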


BibTeX Reference

@inproceedings{Bodolec2025RM,
 abstract = {Natural Language Processing (NLP) tools have proven valuable for information retrieval and linkage. Large Language Models (LLMs), trained on extensive datasets, have recently been widely deployed for their reasoning and analysis abilities across diverse domains. However, their generality makes them ill-suited to small, specialized corpora of documents (e.g., disciplinary, proprietary, or both). We specialize a local quantized LLM (Mistral-7B-Instruct-v0.2-GPTQ) in geological modeling using Retrieval-Augmented Generation (RAG) on the 35-year archive of RING GOCAD Meeting proceedings. This approach aims to enhance domain-specific knowledge integration and reduce model hallucinations when searching and analyzing the literature, while minimizing computational costs. Evaluating this model requires addressing the limitations of published open-domain and NLP-focused benchmarks and metrics, as well as time and computational resources. To ensure effective quality assessment, we define a methodology for evaluating local RAG applications in specialized and confidential fields, consisting of quantitative and qualitative performance assessment on a benchmark. Quantitative evaluation shows 60\% LLM accuracy across 5 geomodelling-related benchmark themes, and a 30-point degradation of RAG accuracy compared to the language model alone. This RAG underperformance leads to several hypotheses on the system behavior (RAG inheriting biases from the LLM; semantic biases in document retrieval creating a distracting context for the LLM), and in turn to system improvement directions (e.g., context length reduction, model configuration adaptation).},
 author = {Bodolec, Laure-Oceane and Caumon, Guillaume and Fay-Varnier, Christine and Cugliari, Jairo},
 booktitle = {2025 {RING} meeting},
 language = {en},
 pages = {100--114},
 publisher = {ASGA},
 title = {{RAG} {Application} {On} {The} {RING}-{GOCAD} {Meeting} {Proceedings} {Archives} {And} {Evaluation}},
 year = {2025}
}