Investigating the Correctness of Attributing the Munajat Khams 'Ashar to Imam Sajjad (PBUH) Based on Stylometry Techniques

Document Type: Scientific Research Article

Authors
1 Professor, Department of Arabic Language and Literature, Tarbiat Modares University, Tehran, Iran
2 PhD student, Arabic Language and Literature, Kharazmi University, Tehran, Iran.
Abstract
Advances in science and technology have made works of doubtful authorship increasingly hard to accept. Stylometry is a method that uses statistical analysis to determine the author of a literary work. Authorship attribution methods rely heavily on writing style, on the assumption that each person has a unique style. Author identification is used in areas such as plagiarism detection, criminology, and the identification of unknown authors. Because many factors are involved in identifying the author of a text, no method with 100% accuracy has been presented so far, and researchers are still looking for ways to minimize computational error. One method that is claimed to have good accuracy is Yule's theory. In this article, Yule's theory and four other theories are combined to compare the vocabulary richness of the Munajat Khams 'Ashar with that of the prayers of Al-Sahifa al-Sajjadiyya. Then, using a descriptive-analytical method and an explanation of the statistical data, the correctness of attributing the Munajat Khams 'Ashar to Imam Sajjad (PBUH) is investigated. The results show the high accuracy of the calculations and the independence of the output of the theories from the length of the text. In addition, because the vocabulary richness of the Munajat Khams 'Ashar differs only slightly from that of the prayers of Al-Sahifa al-Sajjadiyya, its attribution to Imam Sajjad (PBUH) is confirmed.



1. Introduction

The attribution of a text to someone who did not actually write it has long been a focus of researchers. With the advancement of science in the twentieth century, the need to verify the attribution of a text to a particular author has intensified, and with the advancement of information technology, intelligent methods of author recognition have grown in popularity. Today, various methods are used to identify the author of a text, one of the most important of which is the study of writing style.

The study of writing style is a subset of the new rhetoric. The new rhetoric aims at adding to formal logic a field of reasoning, and applies whenever action is linked to rationality (Perelman, 1971). In stylistics, reasoning about the text and analysis of it are used to identify the characteristics of an author's style.

A variety of methods for attribution have been proposed. There are three main approaches: lexical methods, syntactic or grammatical methods, and language-model methods, including methods based on compression (Zhao & Zobel, 2005). In this article, the lexical method is used. One of the most practical lexical methods for capturing an author's style is the "vocabulary richness" method. Unfortunately, the output of many methods depends on the length of the text, so a method should be chosen whose output depends on text length as little as possible. In this paper, we combine five theories for calculating vocabulary richness in order to achieve the most accurate results.



Research Question(s)

1. How accurate and reliable are the results of the five equations used in this research?

2. How much does the output of the theories depend on the length of the text?

3. What is the difference between the vocabulary richness of the Munajat Khams 'Ashar and that of the prayers of Al-Sahifa al-Sajjadiyya?

2. Literature Review

Authorship attribution (AA) is the process of attempting to identify the likely authorship of a given document, given a collection of documents whose authorship is known (Bozkurt et al., 2007). The accepted assumption behind AA is that every author writes in a distinct way; some writing characteristics cannot be manipulated by the writer’s will, and therefore can be identified by an automated process (Howedi & Mohd, 2014).

One of the fundamental sub-problems of AA is the extraction of the most suitable features to represent the writing style of each author. This problem is known as “stylometry” (Howedi et al., 2020, p. 1334). Stylometry is defined as the set of techniques that allow the style of an author to be measured by identifying its features of style (stylemas). These stylemas, also called style markers, are obtained from textual measurements normally calculated by statistical methods (Escobedo et al., 2013; Stamatatos, 2009).

Some researchers have used a combination of some lexical richness functions to achieve better results, namely: K proposed by Yule (1944), R proposed by Honore (1979), W proposed by Brunet (1978), S proposed by Sichel (1975), and D proposed by Simpson (1949) which are defined as follows (Stamatatos et al., 2000):
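
$$K = \frac{10^{4}\left(\sum_{i} i^{2} V_i - N\right)}{N^{2}}$$

$$R = \frac{100 \log N}{1 - V_1 / V}$$

$$W = N^{V^{-\alpha}}$$

$$S = \frac{V_2}{V}$$

$$D = \sum_{i} V_i \, \frac{i(i-1)}{N(N-1)}$$
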
where:

V_i: the number of words used exactly i times

N: the total number of words

V: the number of distinct words (vocabulary size)

α: a parameter usually fixed at 0.17

The final output for calculating vocabulary richness is obtained by combining these five equations.
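
As a rough illustration, the five measures can be computed from a tokenized text as in the sketch below. It assumes the commonly cited forms of the measures (restated in the comments), a natural logarithm in R, and simple whitespace tokenization; the function name `richness_measures` is hypothetical. Because the rule for combining the five values is not specified here, the sketch returns them separately rather than as a single score.

```python
import math
from collections import Counter


def richness_measures(tokens, alpha=0.17):
    """Return Yule's K, Honore's R, Brunet's W, Sichel's S and Simpson's D.

    Assumed forms (not taken verbatim from this article):
      K = 10^4 * (sum_i i^2 * V_i - N) / N^2
      R = 100 * ln(N) / (1 - V_1 / V)
      W = N ** (V ** -alpha)
      S = V_2 / V
      D = sum_i V_i * i * (i - 1) / (N * (N - 1))
    """
    n = len(tokens)                      # N: total number of words
    if n < 2:
        raise ValueError("at least two tokens are required")

    freq = Counter(tokens)               # word -> number of occurrences
    v = len(freq)                        # V: number of distinct words
    spectrum = Counter(freq.values())    # i -> V_i (words used exactly i times)
    v1 = spectrum.get(1, 0)              # words used exactly once
    v2 = spectrum.get(2, 0)              # words used exactly twice

    k = 1e4 * (sum(i * i * vi for i, vi in spectrum.items()) - n) / (n * n)
    # R is undefined when every distinct word occurs once (V_1 == V).
    r = 100 * math.log(n) / (1 - v1 / v) if v1 < v else math.inf
    w = n ** (v ** -alpha)
    s = v2 / v
    d = sum(vi * i * (i - 1) for i, vi in spectrum.items()) / (n * (n - 1))
    return {"K": k, "R": r, "W": w, "S": s, "D": d}


if __name__ == "__main__":
    sample = "a b a c b a d".split()     # toy input, not real prayer text
    for name, value in richness_measures(sample).items():
        print(f"{name} = {value:.4f}")
```

For Arabic text, the tokens would normally be normalized first (for example, by removing diacritics), which this sketch leaves out.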

Since the chain of narrators and the documentation of the Munajat Khams 'Ashar are not given in full in the available sources, its attribution to Imam Sajjad (PBUH) needs to be established; this research therefore examines it using stylometry techniques.



3. Methodology

In the present article, the correctness of attributing the Munajat Khams 'Ashar to Imam Sajjad (PBUH) is examined by sampling the prayers of Al-Sahifa al-Sajjadiyya and comparing their vocabulary richness with that of the Munajat Khams 'Ashar. Since the output of the theories is claimed not to depend on the length of the text, two statistical populations are selected: the first consists of prayers from which 80 words have been selected, and the second consists of prayers with differing numbers of words. Therefore, in addition to comparing the vocabulary richness of the samples, the dependence of the equations on the length of the text is also examined. From the Munajat Khams 'Ashar, we chose the first, fifth, tenth and fifteenth prayers as samples.
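
How the dependence on text length was assessed is not detailed here. One simple possibility, sketched below purely as an assumption, is to compute a measure (Yule's K in this example) for samples of different lengths and check the Pearson correlation between sample length and the measure; a value near zero would be consistent with length independence. The helper names `yule_k`, `pearson` and `length_dependence` are hypothetical.

```python
import math
from collections import Counter


def yule_k(tokens):
    """Yule's K, assuming K = 10^4 * (sum_i i^2 * V_i - N) / N^2."""
    n = len(tokens)
    spectrum = Counter(Counter(tokens).values())   # i -> V_i
    return 1e4 * (sum(i * i * vi for i, vi in spectrum.items()) - n) / (n * n)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def length_dependence(samples, measure=yule_k):
    """Correlate sample length with the chosen richness measure."""
    lengths = [len(tokens) for tokens in samples]
    values = [measure(tokens) for tokens in samples]
    return pearson(lengths, values)


if __name__ == "__main__":
    # Toy samples of different lengths; real input would be tokenized prayers.
    samples = [
        "a b".split(),
        "a a b".split(),
        "a a a b".split(),
    ]
    print(f"correlation with length: {length_dependence(samples):.3f}")
```

With only a handful of samples, such a correlation is a coarse indicator at best and would need to be read alongside the raw values.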



4. Results

The results show that:

1. The accuracy of the calculations is very high and therefore the output of the theories is reliable.

2. The output of the theories was not dependent on the length of the text and did not increase in proportion to the increase in the number of words.

3. There is little difference between the vocabulary richness of the Munajat Khams 'Ashar and that of the prayers of Al-Sahifa al-Sajjadiyya in either statistical population; therefore, from the perspective of stylometry techniques, the correctness of attributing the Munajat Khams 'Ashar to Imam Sajjad (PBUH) is confirmed.




References
• Agha Bozorg Tehrani, M. (1982). Az-Zaree'a. Dar Al-Azvae. [In Arabic]
• Ameli, H. (1982). Good people access to the assets of the news. Investigated by: Abdul Latif Al-Kokhamari. Islamic Relics Complex. [In Arabic]
• Al-Montazeri, H. (1995). The full moon in the Friday prayer and the traveler. Qom: The Office of the Great Ayatollah al-Montazeri. [In Arabic]
• Al-Sadr, H. (1996). Shi'a founding the sciences of Islam. [In Arabic]
• Bagirzade, Z.M. (2019). Rhetoric, Linguistics and Stylistics. Russian Linguistic Bulletin, 3(19). 41-43.
• Biber, D. (1995). Dimensions of Register Variation: A Cross-Linguistic Comparison. Cambridge University Press.
• Bleeth, H. (1989). Rhetoric and style. Translation: Mohammad Al-Omari. Publications on the principles of Fas. [In Arabic]
• Bozkurt, I.N.; Baghioglu, O. & Uyar, E. (2007). Authorship Attribution, performance of various features and classification methods. Conference: Computer and information sciences. 1-5.
• Brunet, E. (1978). Vocabulaire de Jean Giraudoux: Structure et Evolution. Slatkine.
• Cheraghivash, H.; Khosravy, K & Lotfy, E. (2014). Stylistics of Imam Sajjad's Monajat Altaebeen in the Light of Structuralism. Lisan-I Mubin. 4(11). 82-101. [In Persian]
• Escobedo, F; Cruz, C; Sierra, G. & Soto, J. (2013). Analysis of Stylometric Variables in Long and Short Texts. Procedia-Social and Behavioral Sciences 95. 604-611.
• Farahmandpoor, Z.; Nikmehr, H.; Mansoorizade, M. & Tabibzadeh Ghamsary, O. (2013). A Novel Intelligent Persian Authorship System based on Writing Style. Soft Computing Journal. 1(2). 26-35. [In Persian]
• Fowler, R. (1987). A Dictionary of Modern Critical Terms. London and New York: Routledge and Kegan Paul Ltd. second edition.
• Gharechapogh, F.; Motamanfar, M. & Vafadar, M. (2017). A new way to identify the author of texts by combining particle mass optimization algorithms and backup vector machines. Computing Science Journal (CSJ). Num (12). 2-12. [In Persian]
• Honore, A. (1979). Some Simple Measures of Richness of Vocabulary. Association for Literary and Linguistic Computing Bulletin. 7(2). 172-177.
• Howedi, F.; Mohd, M.; Aborawi, Z.A. & Jowan, S.A. (2020). Authorship Attribution of Short Historical Arabic Texts using Stylometric Features and a KNN Classifier with Limited Training Data. Journal of Computer Science. 16(10). 1334-1345.
• Howedi, F. & Mohd, M. (2014). Text Classification for Authorship Attribution Using Naïve Bayes Classifier with Limited Training Data. Computer engineering and intelligent systems. 5(4). 48-56.
• Al-Hur Al-Amili, M. (2000). The second Sahifah Sajjadiah. Investigation: Faris Hassoun Karim. Islamic Knowledge Foundation. [In Arabic]
• Lotfi, A. (2010). Stylistics of the prayers of Khams Ashar Imam Sajjad. Master Thesis. University of Lorestan. [In Persian]
• Majlesi, M. (2019). Behar Al-Anwar. Beirut: Arab Heritage Revival House. [In Arabic]
• Mashhadi nushabadi, M. (2015). The First Explanation of Mystical Ahwāl and Maqāmāt in Khamsa ‘Ashara Monājāt. Mysticism Studies. 1(22). 215-240. [In Persian]
• Masjedy, H.; Adel, M.; Amirian, M. & Zareian, G. (2022). An Overview of Text Mining in Language Studies: The Computational Approach to Text Analytics. Language related research. 12(6). 499-531. [In Persian]
• Maslouh, S. (1992). Style, Statistical linguistic study. Cairo: The world of the books. [In Arabic].
• Mendenhall, T.C. (1887). The characteristic curves of composition. Science. 9(214). 237-246.
• Modiri, S. (2018). Comparison between Nahj al-Balagha and Al-Sahifa al-Sajjadiyya on the basis of statistical stylistics. A thesis prepared to obtain a master's degree. Professor Supervisor: Isa MotaghiZadeh. Tarbiat Modares University. [In Arabic]
• Mohaghegh damad, M. (1985). Rules of jurisprudence. Islamic Sciences Publishing Center. [In Persian]
• Mohammadi, A. (2008). Description of the principles of jurisprudence. Dar Al-fekr. [In Persian]
• Moradi, M. & Bahrani, M. (2013). Automatic gender identification in Persian text. Signal and Data Processing. Num(4). 83-94. [In Persian]
• Motaghizadeh, I; Hajikhani, A & Modiri, S. (2022). Differences between the style of the Nahj al Balagha's letters in terms of “Vocabulary Richness” and “literary style” (Letters 23, 30 and 73 as case study). Language related research. 13(1). 235-262. [In Persian]
• Najafi Al-Jawaheri, M. (1983). The jewels of speech in explaining the laws of Islam. Investigated by: Abbas Qoshani. Beirut: Arab Heritage Revival House. [In Arabic]
• Oliveira Jr., W.; Justino, E. & Oliveira, L.S. (2013). Comparing compression models for authorship attribution. Forensic Science International. 100-104.
• Omidvar, A. & Omidali, A. (2015). Stylistics research on correctness of relation of the poem related to Imam Ali (PBUH) based on Yule’s equation. Language and Literature Arabic. 11(1). 59-81. [In Arabic]
• Perelman, Ch. (1971). The new Rhetoric. Holland: Reidel publishing company.
• Qomi, A. (2000). Heaven Keys. Borhan. [In Persian]
• Rabab’ah, A.; Al-Ayyoub, M.; Jararweh, Y. & Aldwairi, M. (2016). Authorship Attribution of Arabic Tweets. 13th International Conference of Computer Systems and Applications. Agadir, Morocco. 1-6.
• Sadr, H. (1975). End of know-how. Investigated by: Majed Al-Gharbawi. Mash'ar Publishing. [In Arabic]
• Sichel, H. (1975). On a Distribution Law for Word Frequencies. Journal of the American Statistical Association. No(70). 542-547.
• Simpson, E. (1949). Measurement of Diversity. Nature. 163. 688.
• Stamatatos, E. (2009). A survey of modern authorship attribution methods. Journal of the American Society for Information Science and Technology. 60(3). 538-556.
• Tanaka-Ishii, K. & Aihara, Sh. (2015). Computational Constancy Measures of Texts: Yule’s K and Rényi’s Entropy. Computational Linguistics. 41(3). 481-502.
• Torruella, J. & Capsada, R. (2013). Lexical Statistics and Tipological Structures: A Measure of Lexical Richness. 5th International Conference on Corpus Linguistics. 447-454.
• Vatankhah, R.; Khatami, A. & Gholamrezaei, M. (2021). Explaining the Theoretical Approach of Stylistic Features of Malek osh-Sho'arā Bahar. Language related research. 12(2). 421-447. [In Persian]
• Wright, D. (2014). Stylistics versus Statistics: A corpus linguistic approach to combining techniques in forensic authorship analysis using Enron emails. Submitted in accordance with the requirements for the degree of Doctor of Philosophy. The University of Leeds. School of English.
• Yahaghi, M. & Izanlu, A. (2006). Stylometric, critique of Qiosam’s statistical method in assigning a work. Journal of the Faculty of Literature and Humanities. 14(53). 151-190. [In Persian]
• Yule, G. (1944). The Statistical Study of Literary Vocabulary. Cambridge at the University Press, University Printing House, United Kingdom.
• Zangoei, S & Nemati Shamsabad, H. (2014). Identify the Authors of Electronic Messages Through the Analysis of the Type and Style Based on Machine Learning Technique. Information processing and management. 29(2). 453-476. [In Persian]
• Zhao, Y. & Zobel, J. (2005). Effective and Scalable Authorship Attribution Using Function Words. Information Retrieval Technology. 174-189.