Arabic WordNet with plural words

Time: 2018-02-11 21:40:15

Tags: c# arabic wordnet

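I am using Arabic WordNet with C# to get the synonyms of a singular word such as "عرض", and I get the following synonyms (علامة, أمارة, شدة, ضر, شؤم, بلية, etc.).
My question is: is there a way to get the synonyms of a plural word, such as "علامات", from Arabic WordNet? I need this because I have not found a way to get the singular form of a plural Arabic word, for example "علامات" => "علامة".
Thanks for any help.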

2 answers:

Answer 0 (score: 1)

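I solved this by editing the awn.xml file and adding all the plural words I needed. For example, the word "عرض" has the plural "أعراض", whose synonyms are علامات, أمارات, شدائد, بلايا, and أضرار. I first added the plural synsets as follows: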

<wordnet version="20">
<item itemid="&gt;aArad_n1AR" offset="102231120" lexfile="" name="أعراض" type="synset" headword="" POS="n" source="" gloss="" authorshipid="1" />
<authorship author="ali" date="20150215" score="" comment="" covering="1" authorshipid="1" />
<item itemid="&gt;aMrad_n1AR" offset="102231121" lexfile="" name="أمراض" type="synset" headword="" POS="n" source="" gloss="" authorshipid="2" />
<authorship author="ali" date="20150215" score="" comment="" covering="1" authorshipid="1" />
<item itemid="&gt;Isteqsa'at" offset="102231121" lexfile="" name="استقصاءات" type="synset" headword="" POS="n" source="" gloss="" authorshipid="3" />
<authorship author="ali" date="20150215" score="" comment="" covering="1" authorshipid="1" />

Then I added the synonyms as follows:

<authorship author="ali" date="20180215" score="" comment="From suggested word" covering="0" authorshipid="12136" />
<word wordid="&lt;aArad_n1AR" value="أعراض" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;aArad_n1AR" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;$araat" value="إشارات" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;$araat" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;Alamat" value="علامات" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;Alamat" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;$adaed" value="شدائد" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;$adaed" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;adrar" value="أضرار" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;adrar" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;balaya" value="بلايا" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;balaya" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;tawar'a" value="طوارئ" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;tawar'a" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;fawajea" value="فواجع" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;fawajea" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;fawadeh" value="فوادح" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;fawadeh" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;kawareth" value="كوارث" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;kawareth" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;mehan" value="محن" synsetid="&gt;aArad_n1AR"  type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;mehan" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;makrohat" value="مكروهات" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;makrohat" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;masaeb" value="مصائب" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;masaeb" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;masawea" value="مساوئ" synsetid="&gt;aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="&lt;masawea" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;Elal" value="علل" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="&lt;Elal" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;Ellat" value="علات" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="&lt;Ellat" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;Eatilalat" value="اعتلالات" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="&lt;Eatilalat" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;Da'aat" value="داءات" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="&lt;Da'aat" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;waakat" value="وعكات" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="&lt;waakat" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;askaam" value="أسقام" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="&lt;askaam" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;$akawa" value="شكاوى" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="&lt;$akawa" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;aMrad_n1AR" value="أمراض" synsetid="&gt;aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="&lt;aMrad_n1AR" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;Fohosat" value="فحوصات" synsetid="&gt;Isteqsa'at" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="استقصاءات" wordid="&lt;Fohosat" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;Taharieat" value="تحريات" synsetid="&gt;Isteqsa'at" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="استقصاءات" wordid="&lt;Taharieat" type="brokenPlural" authorshipid="12137" />
<word wordid="&lt;Isteqsa'at" value="استقصاءات" synsetid="&gt;Isteqsa'at" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="استقصاءات" wordid="&lt;Isteqsa'at" type="brokenPlural" authorshipid="12137" />
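Writing dozens of near-identical `<word>` and `<form>` lines by hand is error-prone, so the pairs could also be generated. The following is only a sketch, assuming the attribute layout shown in the entries above; `AwnEntryBuilder` and `BuildPluralEntry` are hypothetical names and are not part of any Arabic WordNet API. `System.Xml.Linq` escapes the leading `<` of the word ids when the elements are serialized.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class AwnEntryBuilder
{
    // Hypothetical helper: builds the <word>/<form> pair for one plural member
    // of a synset, mirroring the attributes of the hand-written entries.
    public static IEnumerable<XElement> BuildPluralEntry(
        string wordId, string value, string synsetId, string headPlural, string authorshipId)
    {
        yield return new XElement("word",
            new XAttribute("wordid", wordId),
            new XAttribute("value", value),
            new XAttribute("synsetid", synsetId),
            new XAttribute("type", "brokenPlural"),
            new XAttribute("frequency", ""),
            new XAttribute("corpus", ""),
            new XAttribute("authorshipid", authorshipId));

        // The <form> element repeats the word id and carries the head plural
        // of the synset as its value, as in the entries above.
        yield return new XElement("form",
            new XAttribute("value", headPlural),
            new XAttribute("wordid", wordId),
            new XAttribute("type", "brokenPlural"),
            new XAttribute("authorshipid", authorshipId));
    }
}
```

For example, `AwnEntryBuilder.BuildPluralEntry("<Alamat", "علامات", ">aArad_n1AR", "أعراض", "12137")` yields the two elements for "علامات", which can then be appended to the loaded document before saving it.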

Now we run the following code snippet:

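        // Find every word id whose value matches the plural form.
        List<string> wordIds = _awn.Get_List_Word_Id_From_Value("علامات");
        List<string> synonyms = new List<string>();
        if (wordIds != null)
        {
            foreach (string wordId in wordIds)
            {
                // Resolve the synset of this word, then list every word id in that synset.
                string synsetId = _awn.Get_Synset_ID_From_Word_Id(wordId);
                List<string> synsetWordIds = _awn.Get_List_Word_Id_From_Synset_ID(synsetId);
                if (synsetWordIds != null)
                {
                    foreach (string synsetWordId in synsetWordIds)
                    {
                        // Collect each distinct word value as a synonym.
                        string value = _awn.Get_Word_Value_From_Word_Id(synsetWordId);
                        if (!synonyms.Contains(value))
                            synonyms.Add(value);
                    }
                }
            }
        }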

The synonyms list then contains the following words: "علل", "علات", "اعتلالات", "داءات", "وعكات", "أسقام", "شكاوى". These are the plural forms of the synonyms of the word "عرض".
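The `_awn` helper methods used above are not shown in this answer. As a rough, self-contained sketch of the same lookup, the `<word>` elements can be queried directly with LINQ to XML. This assumes only the `value` and `synsetid` attributes seen in the entries above; `AwnSynonymLookup` and `GetSynonyms` are hypothetical names, and unlike the snippet above this version leaves the query word itself out of the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AwnSynonymLookup
{
    // Hypothetical helper: returns the distinct values of all <word> elements
    // that share a synset with any <word> whose value equals the query.
    public static List<string> GetSynonyms(XContainer awn, string query)
    {
        var words = awn.Descendants("word").ToList();

        // Step 1: collect the synset ids of every word matching the query.
        var synsetIds = new HashSet<string>(
            words.Where(w => (string)w.Attribute("value") == query)
                 .Select(w => (string)w.Attribute("synsetid"))
                 .Where(id => id != null));

        // Step 2: collect the values of all other words in those synsets.
        return words
            .Where(w => synsetIds.Contains((string)w.Attribute("synsetid")))
            .Select(w => (string)w.Attribute("value"))
            .Where(v => v != null && v != query)
            .Distinct()
            .ToList();
    }
}
```

A call might look like `AwnSynonymLookup.GetSynonyms(XDocument.Load("awn.xml"), "علامات")`, provided the file is well-formed XML. Scanning every `<word>` element on each call is simple but slow for a large file; building a dictionary keyed by `synsetid` once would be the obvious refinement.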

Answer 1 (score: 0)

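If you want to get the singular word from a plural one, you can use any available morphological analyzer, for example "ALKhalil", which is an open-source Java project. Note, however, that this is only for getting the plural, not the corresponding singular.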
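Where a full morphological analyzer is not an option, a very rough fallback is a hand-maintained table of broken plurals combined with a suffix rule for the sound feminine plural ("ات" to "ة"), which covers the "علامات" => "علامة" case from the question. The sketch below is an assumption-laden illustration, not a substitute for an analyzer: `NaiveSingularizer` is a hypothetical name, the table holds only two sample entries, and the suffix rule gives wrong results for words such as "حيوانات" (singular "حيوان") or for singular nouns that happen to end in "ات".

```csharp
using System;
using System.Collections.Generic;

public static class NaiveSingularizer
{
    // Broken plurals follow no suffix rule, so they need an explicit table.
    // Only two sample entries are listed here.
    private static readonly Dictionary<string, string> BrokenPlurals =
        new Dictionary<string, string>
        {
            { "أعراض", "عرض" },
            { "أمراض", "مرض" },
        };

    public static string ToSingular(string word)
    {
        if (BrokenPlurals.TryGetValue(word, out var singular))
            return singular;

        // Heuristic: replace the sound feminine plural ending with ta marbuta.
        const string feminineSoundPlural = "ات";
        if (word.Length > 3 && word.EndsWith(feminineSoundPlural, StringComparison.Ordinal))
            return word.Substring(0, word.Length - feminineSoundPlural.Length) + "ة";

        // Unknown form: return the input unchanged.
        return word;
    }
}
```

The resulting singular can then be passed to the usual synonym lookup, which avoids having to add plural entries to awn.xml for every word.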