Application of Distributional Semantics in the Qur'an, A Case Study of the Root-word "Farah"

Document Type : Research Article

Authors
1 Ph.D. Student in Sciences of Qur’an and Hadith, Tarbiat Modares University, Tehran, Iran
2 Associate Professor, Department of Sciences of Qur’an and Hadith, Tarbiat Modares University, Tehran, Iran
Abstract
Distributional semantics is a neo-structuralist method that focuses on the contextual use of words in real texts to determine the approximate meaning of a word relative to other words. Since one of the goals of applying semantics to the Qur'an is to ascertain the meaning of words according to their actual context of use, this method is important for Qur'anic studies. To introduce the application of distributional semantics to the Qur'an, this research uses the descriptive-analytical method to explain the implementation steps, the challenges, and the solutions for overcoming them, by examining a case study. The steps are: determining comparable words for the main target word; determining distributional features (surah, equivalent to document, and phrase); linguistic pre-processing of the Qur'anic text and removal of stopwords; forming the context vector and co-occurrence matrix and weighting its elements; and finding and analyzing the semantic similarity of words. The most important challenges of using this method in the Qur'an are the small volume of the Qur'anic text, the lack of suitable software for calculations within the Qur'anic context, and the great difference in the length of surahs in the word-document pattern. Solutions to overcome some of these challenges include: paying closer attention to the basis of the distributional method (the distributional hypothesis); avoiding the use of this method for very low-frequency words; and comparing the results obtained from the word-document and word-phrase patterns. For the root-word Farah, it is found that the meaning derived from the Qur'anic context (closer to the meaning of pride) is different from the meaning mentioned in standard dictionaries (joy).
 
1. Introduction
Distributional semantics, as one of the neo-structuralist methods in semantics, identifies the meaning of words by examining their use in natural texts. This method is based on the distributional hypothesis, which states: ‘Words with similar distributional properties have similar meanings.’ Applying this method to the Quranic corpus helps in understanding the meaning of Quranic words by examining their context of use, especially for words whose meaning is disputed by philologists. By comparing different words in terms of their distribution in the Quran (including co-occurrence with other words or use in different Surahs) and measuring their distributional similarity, the method obtains the semantic distance between words and thereby identifies the scope of meaning of the disputed word. Because of the characteristics of the Arabic language and the Quranic corpus, applying the distributional method to the Quran requires adapting the method and overcoming its possible challenges. The implementation is demonstrated in practice using the case of the root word "Farah", the meaning of which is disputed.
Research Question(s)
This research addresses two main questions about implementing distributional semantics in the Quran:
1- What are the practical steps for implementing distributional semantics in the Quran, considering its linguistic and textual properties?
2- What are the challenges of applying the distributional method to the Quran, and how can they be overcome?
A further question arises from the practical implementation of the method in the Quran:
3- What does the use of distributional semantics reveal about the meaning of the root word ‘Farah’ in the Quran?
2. Literature Review
Many works have been written in the field of distributional semantics, but we have not found a work in Persian that fully introduces this method and specifically implements it in the Quran. However, Ayub Turkian, the Persian translator of the two books Machine Learning for Text (Aggarwal, 2018) and Networks (Newman, 2018), added the appendices "Practical Text Mining Training" (Turkian, 2019a) and "Text Network" (Turkian, 2019b) to them, respectively. These appendices also examine the semantic similarity of verses of the Holy Quran and the clustering of nouns in the Quran. These two cases can be considered the closest to the subject of this research.
 
3. Methodology
The present study can be divided into two main parts. The first part explains how to implement distributional semantics in the Quran and examines its challenges; here the research method is descriptive-analytical and the data collection method is documentary. The second part implements distributional semantics on a case study in the Quran. First, a number of words are selected for comparison with the main word "farah"; based on the views presented in dictionaries, these fall into the two semantic domains of 'joy' and 'arrogance'. Then, two models of the distributional method are implemented to compare the words. In the word-phrase model, with a 15-word context window (n=15), a number of phrases are selected as features and words are compared with each other based on how they co-occur. In the word-document model, surahs are treated as documents (distributional features), and the occurrence of words across surahs is compared. Finally, the results of the two models are compared to obtain the final result.
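The two counting schemes described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy token lists, and the reading of the window as `n` tokens on each side of the target are all assumptions, and single context tokens stand in for the phrase features used in the study.

```python
from collections import Counter, defaultdict

def window_cooccurrence(tokens, targets, n=15):
    """Word-phrase sketch: count context tokens within n positions of each target.

    Assumption: the window is symmetric (n tokens before and n after).
    """
    counts = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        if tok not in targets:
            continue
        lo, hi = max(0, i - n), min(len(tokens), i + n + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tok][tokens[j]] += 1
    return counts

def word_document_counts(documents, targets):
    """Word-document sketch: count how often each target occurs in each document (surah)."""
    counts = defaultdict(Counter)
    for doc_id, tokens in documents.items():
        for tok in tokens:
            if tok in targets:
                counts[tok][doc_id] += 1
    return counts

# Made-up placeholder tokens (not Qur'anic data), already pre-processed.
docs = {
    "doc_a": ["w1", "x", "y", "w2", "x"],
    "doc_b": ["w1", "y", "z"],
}
targets = {"w1", "w2"}

phrase_counts = window_cooccurrence(docs["doc_a"], targets, n=2)
doc_counts = word_document_counts(docs, targets)
print(dict(phrase_counts["w1"]))
print(dict(doc_counts["w1"]))
```

Each target word ends up with one count vector per model, which is the raw material for the weighting and similarity steps.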
 
4. Results
Distributional semantics is applied to the Quranic corpus with the aim of identifying the approximate meaning, i.e., determining the semantic domain of the disputed words. The implementation of this method in the present study includes the following steps:
1. Finding words to compare with the original word based on dictionaries;
2. Determining the distributional features, including documents (Surahs) and phrases that co-occur with the word (by grammatical or non-grammatical relations);
3. Linguistic pre-processing of words, including finding roots and effective forms of each root, disambiguation and merging words;
4. Removing stopwords;
5. Forming the co-occurrence matrix and weighting it (in the word-surah model with the TF-IDF rule and in the word-phrase model with the PPMI rule);
6. Calculating the similarity of words by the cosine of the angle between the weighted context vectors;
7. Interpreting the similarity between words using dictionaries and semantic relations between sentences.
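Steps 5 and 6 for the word-phrase model can be sketched as below. This is a hedged illustration only: the source names the PPMI rule and cosine similarity but gives no formulas, so the standard definition PPMI = max(0, log2(P(w,f) / (P(w)P(f)))) is assumed, and the words, features, and counts are invented placeholders.

```python
import math

def ppmi(counts):
    """Weight raw co-occurrence counts {word: {feature: count}} with positive PMI."""
    total = sum(c for feats in counts.values() for c in feats.values())
    word_tot = {w: sum(feats.values()) for w, feats in counts.items()}
    feat_tot = {}
    for feats in counts.values():
        for f, c in feats.items():
            feat_tot[f] = feat_tot.get(f, 0) + c
    weighted = {}
    for w, feats in counts.items():
        weighted[w] = {}
        for f, c in feats.items():
            # PMI = log2( P(w, f) / (P(w) * P(f)) ), clipped at zero.
            pmi = math.log2((c * total) / (word_tot[w] * feat_tot[f]))
            weighted[w][f] = max(0.0, pmi)
    return weighted

def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(u[f] * v[f] for f in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Invented counts for one target word and two candidate comparison words.
counts = {
    "target": {"f1": 4, "f2": 1},
    "cand_a": {"f1": 3, "f2": 1},
    "cand_b": {"f2": 4, "f3": 2},
}
vectors = ppmi(counts)
sim_a = cosine(vectors["target"], vectors["cand_a"])
sim_b = cosine(vectors["target"], vectors["cand_b"])
print(sim_a, sim_b)
```

In this toy setup the target shares its only positively weighted feature with `cand_a` and none with `cand_b`, so the first similarity is high and the second is zero. With real data, such a ranking would still need the interpretation described in step 7.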
The challenges of applying distributional semantics, especially in the Quran, and some ways to overcome them include:
1. General challenges of distributional semantics, including the problem of automatic language preprocessing, interpretation of word similarity, and speaker intention recognition.
2. Challenges specific to the Quran, including the small size of the Quranic corpus for statistical calculations, the lack of appropriate software for effective root extraction, disambiguation, and stopword recognition, and the large variation in the length of surahs in the word-surah model. Some solutions to overcome these challenges include: paying more attention to the distributional hypothesis as the basis of the method, which, given the wisdom of the speaker, carries greater credibility in the Quran; and correcting the error caused by the difference in the length of the documents in the word-surah model by comparing the results with the word-phrase model.
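The surah-length effect in the word-surah model can be illustrated with a small sketch. It does not reproduce the authors' procedure: the source names only the "TFIDF rule", so the idf form log(N/df), the optional division of counts by surah length, and all names and numbers below are assumptions chosen for illustration. The length-normalized variant is shown only for contrast; the remedy reported in this study is comparison with the word-phrase model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length lists of weights."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def tfidf_rows(counts, surah_lengths, normalize_length):
    """Weight word-by-surah counts with tf * idf.

    tf is the raw count, or count / surah length when normalize_length is True.
    idf is log(N / df), where df is the number of surahs containing the word.
    """
    n_surahs = len(surah_lengths)
    rows = {}
    for word, row in counts.items():
        df = sum(1 for c in row if c > 0)
        idf = math.log(n_surahs / df) if df else 0.0
        if normalize_length:
            tf = [c / length for c, length in zip(row, surah_lengths)]
        else:
            tf = list(row)
        rows[word] = [t * idf for t in tf]
    return rows

# Invented data: one long document and two short ones (lengths in tokens).
surah_lengths = [1000, 100, 100]
counts = {"word_a": [10, 1, 0], "word_b": [10, 0, 1]}

raw = tfidf_rows(counts, surah_lengths, normalize_length=False)
rel = tfidf_rows(counts, surah_lengths, normalize_length=True)
raw_sim = cosine(raw["word_a"], raw["word_b"])
rel_sim = cosine(rel["word_a"], rel["word_b"])
print(raw_sim, rel_sim)
```

With raw counts, the shared long document dominates both vectors and the two words look almost identical. Once counts are taken relative to document length, the short documents in which the words differ carry equal weight and the similarity drops.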
Applying the distributional method to the root-word Farah in the Quran shows that the meaning obtained from the Quranic corpus (in the semantic domain of arrogance) differs from the dominant meaning mentioned in dictionaries, especially modern dictionaries (joy); this difference reflects the semantic evolution of the word.


·       آذرنوش، آ. (1383).  فرهنگ معاصر عربی ـ  فارسی. تهران: نی.
·       ابن درید، م. (1988). جمهرة اللغة. بیروت: دار العلم للملایین‏.
·       ابن قتیبه، ع. (1411ق). تفسیر غریب القرآن. بیروت: دار و مکتبة الهلال.
·       ابوعبیده، م. (1381ق). مجاز القرآن. قاهره: مکتبة الخانجی.
·       ازهرى، م. (1421ق). تهذیب اللغة. بیروت: دار احیاء التراث العربی.
·       افراشی، آ. (1395). مبانی معناشناسی شناختی. تهران: پژوهشگاه علوم اسلامی و مطالعات فرهنگی.
·       ایزوتسو، ت. (1966). مفاهیم اخلاقی ـ  دینی در قرآن مجید. ترجمۀ ف. بدره‌ای (1378). تهران: فرزان.
·       ترکیان، ا. (1398الف). آموزش عملی متن کاوی، پیوست الفِ متن کاوی یادگیری ماشین. چ. آگاروال، ترجمة ا. ترکیان (صص 617 ـ  659). چاپ دوم، تهران: نیاز دانش.
·       ترکیان، ا. (1398ب). شبکة متن، پیوستِ شبکه‌ها. م. نیومن، ترجمة ا. ترکیان (صص 433 ـ 450). تهران: نیاز دانش.
·       جوهرى، ا. (1376‏ق). الصحاح (تاج اللغة و صحاح العربیة). بیروت‏: دارالعلم للملایین.
·       ذوعلم، آ. (1397). معناشناسی فرح در قرآن و روایات. پایان‌نامة کارشناسی ارشد. دانشکدة الهیات دانشگاه تهران.
·       ذوعلم، آ.، فائز، ق.، و خوش‌منش، ا. (1399). معناشناسی فرح در قرآن کریم؛ پاسخی به شبهة نکوهش شادی در قرآن. تحقیقات علوم قرآن و حدیث، 47 (3)، 1 ـ  34.
·       راغب اصفهانى، ح. (1412‏‏ق). مفردات ألفاظ القرآن‏. بیروت: دارالقلم‏.
·       صفوی، ک. (1399). درآمدی بر معنی‌شناسی. چاپ ششم، تهران: پژوهشکدۀ فرهنگ و هنر اسلامی.
·       طبرسى، ف. (1372). مجمع البیان فی تفسیر القرآن. تهران: ناصر خسرو.
·       عبد الباقی، م. (1393). المعجم المفهرس لالفاظ القرآن الکریم. قم: نوید اسلام.
·       عسکرى، ح. (1400ق). الفروق فی اللغة. بیروت: دار الافاق الجدیدة.
·       فائز، ق.، خوش‌منش، ا.، و ذوعلم، آ. (1398). بهره‌گیری از سیاق متنی در معناشناسی ساختگرای قرآن کریم. پژوهش‌های قرآن و حدیث، 52(1)، 95 ـ  115.
·       فراهیدى، خ. (1409ق).  کتاب العین‏. قم‏: هجرت.
·       فرهنگ‌نامۀ علوم قرآنی، دفتر تبلیغات اسلامی، قم: پژوهشگاه علوم و فرهنگ اسلامی، 1394.
·       کرافورد، و.، و سزومی، ا. (2016). زبان‌شناسی پیکره‌ای در عمل. ترجمة م. نوبخت (1399). تهران: سمت.
·       گلی ملک‌آبادی، ف.، خاقانی اصفهانی، م.، و شکرانی، ر. (1396). نقدی معناشناختی بر ترجمه‌های فارسی واژة «دون» در قرآن کریم. جستارهای زبانی، 36 (1)، 207 ـ  230.
·       گیررتس، د. (2010). نظریه‌های معنی‌شناسی واژگانی. ترجمة ک. صفوی (1398). تهران: نشر علمی.
·       لسانی فشارکی، م.، و مرادی زنجانی، ح. (1395). سوره‌شناسی. چاپ دوم. قم: نصایح.
 
·       Afrashi. A. (2016). Introducing Cognitive Semantics. Institute for Humanities and Cultural Studies [In Persian].
·       Askari. H. (1979). Alfuroogh Fillugha. Dar Al’afagh Aljadida [In Arabic].
·       Azarnoosh, A. (2004). A Dictionary of Modern Written Arabic. Nei [In Persian].
·       Azhari. (2000). Tahzib Allugha. Dar Al’ihya Altorath Al’arabiy [In Arabic].
·       Abu-ubaida. M. (1963). Majaz alQuran. Maktaba Alkhanaji [In Arabic].
·       Boleda, G. (2020). Distributional Semantics and Linguistic Theory. Annual Review of Linguistics, 6, 213–234.
·       Crawford. W., Csomay. E. (2016). Doing Corpus Linguistics, Samt [In Persian].
·       Dictionary of Quranic Sciences. (2015). Islamic Sciences and Culture Academy [In Persian].
·       Dumais, S., Furnas, G., Landauer, T. (1988). Using latent semantic analysis to improve access to textual information, In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, New York (pp. 281–285).
·       Faez. Gh., Khoshmanesh. A., Zouelm. A. (2019). The Use of Textual Context in Structural Semantics of the Holy Qur’an, Quranic Researches and Tradition, 52(1), 95–115 [In Persian].
·       Farahidi. Kh. (1988). Ketab Al’Ain. Hejrat [In Arabic].
·       Geeraerts. D. (2010). Theories of Lexical Semantics, Nashre Elmi [In Persian].
·       Goli. F, Khaqani. M, Shokrani R. (2017). A Semantic Criticism of the Persian Translation of the Arabic Term “Doon” in the Holy Quran, Language Related Research, 8 (1), 207–230 [In Persian].
·       Harris, Z. (1954). Distributional Structure. WORD, 10(2–3), 146–162.
·       Ibn Duraid. M. (1988). Jumhora allugha. Dar Al’ilm Almullain [In Arabic].
·       Ibn Qutaiba. ‘A. (1990). Tafsir Qarib alQuran. Dar va Maktaba Alhilal [In Arabic].
·       Jawhari. I. (1956). Alsehah. Dar Al’ilm Almullain [In Arabic].
·       Izutsu. T. (1966). The structure of meaning in ethico-religious concepts in Quran. Farzan [In Persian].
·       Lenci, A. (2018). Distributional Models of Word Meaning. Annual Review of Linguistics, 4, 151–171.
·       Lesani Fesharaki. M., Moradi. Zanjani. H. (2016). Surah Shenasi. Nasayeh [In Persian].
·       Raghib Isfahani. H. (1991). Mufradat Alfaz Alquran. Dar Alghalam [In Arabic].
·       Safavi. K. (2020). An Introduction to Semantics. Sourey-e-Mehr [In Persian].
·       Sahlgren, M. (2006). The Word-Space Model, Doctoral Dissertation, Stockholm University, Sweden.
·       Schutze, H. (1992). Dimensions of meaning, In ACM/IEEE conference on Supercomputing, (pp. 787–796).
·       Schutze, H. (1993). Word space. In Proceedings of the 1993 Conference on Advances in Neural Information Processing Systems, NIPS’93 (pp. 895–902).
·       Tabrasi. F. (1993). Majma’ Albayan fi Tafsir Alquran. Naserkhosro [In Arabic].
·       Turkian, A. (2019a). Learning Practical text mining, Machine learning for text. Aggarwal. C. (pp. 617–659), Niaze Danesh [In Persian].
·       Turkian, A. (2019b). Text Network. Networks. Newman.M. (pp. 433–450), Niaze Danesh [In Persian].
·       Turney, P. (2010). From Frequency to Meaning: Vector Space Models of Semantics, Journal of Artificial Intelligence Research, 37, 141–188.
·       Zouelm. A. (2018). Semantical Study of Farah in Quran and Tradition. Master thesis, Faculty of Theology and Islamic Studies, Tehran University [In Persian].
·       Zouelm. A., Faez. Gh., Khoshmanesh. A. (2020). Semantical Study of Farah in the holy Qur'an, A Response to the Doubt of Disapproval of Joy in the Qur'an. Journal of Researches of Quran and Hadith Sciences. 17(3), 1–34 [In Persian].