Large Language Models in Intellectual Discourse: An Empirical Evaluation of Performance
DOI:
https://doi.org/10.26389/AJSRP.N050525

Keywords:
Logical Reasoning, Context Retention, Intellectual Dialogues, Large Language Models, Model Performance Evaluation

Abstract
Large language models (LLMs) have made a qualitative leap, enabling them to generate long, coherent texts with advanced contextual understanding and reasoning. Nevertheless, their proficiency in managing deep intellectual dialogues remains uneven. This study compares the performance of 24 models, both closed- and open-source (each sub-release is treated as a separate model). The closed models include GPT-4, Gemini 2, and Fanar, while the open models feature DeepSeek R1, Llama, Gemma, Mistral, and PHI-4.
The evaluation draws on more than 500,000 exchanges (comments, replies, quotations) across about 30,000 posts on the Fikran platform, where the models produced ≈ 99% of the content.
Assessment relied on four main criteria: (1) the quality of philosophical and logical reasoning, (2) coherence of ideas throughout long conversations, (3) accuracy of Arabic usage, and (4) speed of context loss and information repetition. Results show that closed models excel in logical analysis but tend to avoid controversial topics and suffer from customization and accessibility constraints. Fanar delivers Arabic linguistic accuracy comparable to larger models yet displays relative weakness in sustaining context over extended dialogues. Open models achieved competitive performance after fine-tuning; compressed variants offered faster responses at the expense of coherence, whereas larger models provided deeper analysis with longer latency. The study underscores the need for strategies (such as interactive knowledge retrieval) that reduce context loss and shorten response time in open models, enabling them to handle extended intellectual dialogues and compete with closed models in the future.
Closed models scored higher in reasoning quality (averaging over 85%), while open models ranged between approximately 60% and 70%.
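The group-level comparison reported above can be expressed as a simple per-group aggregation of per-model scores. A minimal sketch follows; the individual model scores are illustrative placeholders chosen to be consistent with the reported group ranges, not figures from the study:

```python
# Sketch of the group-level aggregation described in the abstract.
# All per-model scores below are hypothetical, chosen only to fall
# within the reported ranges (closed > 85%, open roughly 60-70%).
from statistics import mean

reasoning_quality = {
    "closed": {"GPT-4": 88, "Gemini 2": 87, "Fanar": 86},
    "open": {"DeepSeek R1": 70, "Llama": 65, "Gemma": 63,
             "Mistral": 62, "PHI-4": 68},
}

# Average reasoning-quality score per group
group_means = {group: mean(scores.values())
               for group, scores in reasoning_quality.items()}
print(group_means)
```

The same aggregation pattern would apply to the other three criteria (coherence, Arabic accuracy, context retention), yielding one mean per group per criterion.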
License
Copyright (c) 2025 Arab Institute of Sciences & Research Publishing - AISRP

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.