Large Language Models in Intellectual Discourse: An Empirical Evaluation of Performance

Authors

  • Abdennacer Elbasri, School of Engineering | Mundiapolis University | Morocco

DOI:

https://doi.org/10.26389/AJSRP.N050525

Keywords:

Logical Reasoning, Context Retention, Intellectual Dialogues, Large Language Models, Model Performance Evaluation

Abstract

Large language models (LLMs) have undergone a qualitative leap that enables them to generate long, coherent texts with advanced contextual understanding and reasoning. Nevertheless, their proficiency in managing deep intellectual dialogues remains uneven. This study compares the performance of 24 closed- and open-source models (each sub-release is treated as a separate model). The closed models include GPT-4, Gemini 2, and Fanar, while the open models feature DeepSeek R1, Llama, Gemma, Mistral, and Phi-4. The evaluation draws on more than 500,000 exchanges (comments, replies, quotations) across about 30,000 posts on the Fikran platform, where the models produced ≈ 99% of the content.

Assessment relied on four main criteria: (1) the quality of philosophical and logical reasoning, (2) coherence of ideas throughout long conversations, (3) accuracy of Arabic usage, and (4) the rate of context loss and information repetition. Results show that closed models excel in logical analysis, scoring higher in reasoning quality (averaging over 85%, versus roughly 60-70% for open models), but they tend to avoid controversial topics and suffer from customization and accessibility constraints. Fanar delivers Arabic linguistic accuracy comparable to larger models yet shows relative weakness in sustaining context over extended dialogues. Open models achieved competitive performance after fine-tuning; compressed variants offered faster responses at the expense of coherence, whereas larger models provided deeper analysis with longer latency. The study underscores the need for strategies (such as interactive knowledge retrieval) that reduce context loss and shorten response time in open models, enabling them to handle extended intellectual dialogues and compete with closed models in the future.
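
To make the four-criterion comparison concrete, below is a minimal, hypothetical sketch (in Python) of how per-criterion scores might be combined into a single model score. The criterion names, the equal weighting, and all numbers are illustrative assumptions, not values taken from the study.

    from statistics import mean

    # Hypothetical per-criterion scores on a 0-100 scale, one slot per
    # evaluation criterion named in the abstract. Equal weighting is an
    # assumption; the paper does not specify how criteria were combined.
    CRITERIA = [
        "reasoning_quality",     # (1) philosophical and logical reasoning
        "coherence",             # (2) coherence across long conversations
        "arabic_accuracy",       # (3) accuracy of Arabic usage
        "context_retention",     # (4) inverse of context loss / repetition
    ]

    def composite_score(scores: dict) -> float:
        """Unweighted mean over the four criteria."""
        return mean(scores[c] for c in CRITERIA)

    # Illustrative numbers only: a closed model averaging above 85%
    # versus an open model in the 60-70% band reported in the abstract.
    closed_model = {"reasoning_quality": 90, "coherence": 86,
                    "arabic_accuracy": 84, "context_retention": 83}
    open_model = {"reasoning_quality": 68, "coherence": 63,
                  "arabic_accuracy": 66, "context_retention": 61}

    print(composite_score(closed_model))  # 85.75
    print(composite_score(open_model))    # 64.5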

Author Biography

  • Abdennacer Elbasri, School of Engineering | Mundiapolis University | Morocco

Published

2025-06-15

Issue

Vol. 9 No. 2 (2025)

Section

Content

How to Cite

Elbasri, A. (2025). Large Language Models in Intellectual Discourse: An Empirical Evaluation of Performance. Journal of Engineering Sciences and Information Technology, 9(2), 26-41. https://doi.org/10.26389/AJSRP.N050525