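I am using Arabic WordNet with C# to get the synonyms of a singular word such as "عرض".
I get the following synonyms: علامة, أمارة, شدة, ضر, شؤم, بلية, and so on.
My question is: is there a way to get the synonyms of a plural word, such as "علامات", from Arabic WordNet?
I need this because I have not found a way to derive the singular from a plural form in Arabic, for example "علامات" => "علامة".
Thanks for any help.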
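Answer 0: (score: 1)
I solved this by editing the awn.xml file and adding all the plural words I needed. For example, the word "عرض" has the plural "أعراض", whose synonyms are علامات, أمارات, شدائد, بلايا, أضرار. I added the plural entries as follows: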
<wordnet version="20">
<item itemid=">aArad_n1AR" offset="102231120" lexfile="" name="أعراض" type="synset" headword="" POS="n" source="" gloss="" authorshipid="1" />
<authorship author="ali" date="20150215" score="" comment="" covering="1" authorshipid="1" />
<item itemid=">aMrad_n1AR" offset="102231121" lexfile="" name="أمراض" type="synset" headword="" POS="n" source="" gloss="" authorshipid="2" />
<authorship author="ali" date="20150215" score="" comment="" covering="1" authorshipid="1" />
<item itemid=">Isteqsa'at" offset="102231121" lexfile="" name="استقصاءات" type="synset" headword="" POS="n" source="" gloss="" authorshipid="3" />
<authorship author="ali" date="20150215" score="" comment="" covering="1" authorshipid="1" />
Then add the synonyms as follows:
<authorship author="ali" date="20180215" score="" comment="From suggested word" covering="0" authorshipid="12136" />
<word wordid="<aArad_n1AR" value="أعراض" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<aArad_n1AR" type="brokenPlural" authorshipid="12137" />
<word wordid="<$araat" value="إشارات" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<$araat" type="brokenPlural" authorshipid="12137" />
<word wordid="<Alamat" value="علامات" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<Alamat" type="brokenPlural" authorshipid="12137" />
<word wordid="<$adaed" value="شدائد" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<$adaed" type="brokenPlural" authorshipid="12137" />
<word wordid="<adrar" value="أضرار" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<adrar" type="brokenPlural" authorshipid="12137" />
<word wordid="<balaya" value="بلايا" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<balaya" type="brokenPlural" authorshipid="12137" />
<word wordid="<tawar'a" value="طوارئ" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<tawar'a" type="brokenPlural" authorshipid="12137" />
<word wordid="<fawajea" value="فواجع" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<fawajea" type="brokenPlural" authorshipid="12137" />
<word wordid="<fawadeh" value="فوادح" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<fawadeh" type="brokenPlural" authorshipid="12137" />
<word wordid="<kawareth" value="كوارث" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<kawareth" type="brokenPlural" authorshipid="12137" />
<word wordid="<mehan" value="محن" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<mehan" type="brokenPlural" authorshipid="12137" />
<word wordid="<makrohat" value="مكروهات" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<makrohat" type="brokenPlural" authorshipid="12137" />
<word wordid="<masaeb" value="مصائب" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<masaeb" type="brokenPlural" authorshipid="12137" />
<word wordid="<masawea" value="مساوئ" synsetid=">aArad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أعراض" wordid="<masawea" type="brokenPlural" authorshipid="12137" />
<word wordid="<Elal" value="علل" synsetid=">aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="<Elal" type="brokenPlural" authorshipid="12137" />
<word wordid="<Ellat" value="علات" synsetid=">aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="<Ellat" type="brokenPlural" authorshipid="12137" />
<word wordid="<Eatilalat" value="اعتلالات" synsetid=">aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="<Eatilalat" type="brokenPlural" authorshipid="12137" />
<word wordid="<Da'aat" value="داءات" synsetid=">aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="<Da'aat" type="brokenPlural" authorshipid="12137" />
<word wordid="<waakat" value="وعكات" synsetid=">aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="<waakat" type="brokenPlural" authorshipid="12137" />
<word wordid="<askaam" value="أسقام" synsetid=">aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="<askaam" type="brokenPlural" authorshipid="12137" />
<word wordid="<$akawa" value="شكاوى" synsetid=">aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="<$akawa" type="brokenPlural" authorshipid="12137" />
<word wordid="<aMrad_n1AR" value="أمراض" synsetid=">aMrad_n1AR" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="أمراض" wordid="<aMrad_n1AR" type="brokenPlural" authorshipid="12137" />
<word wordid="<Fohosat" value="فحوصات" synsetid=">Isteqsa'at" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="استقصاءات" wordid="<Fohosat" type="brokenPlural" authorshipid="12137" />
<word wordid="<Taharieat" value="تحريات" synsetid=">Isteqsa'at" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="استقصاءات" wordid="<Taharieat" type="brokenPlural" authorshipid="12137" />
<word wordid="<Isteqsa'at" value="استقصاءات" synsetid=">Isteqsa'at" type="brokenPlural" frequency="" corpus="" authorshipid="12137" />
<form value="استقصاءات" wordid="<Isteqsa'at" type="brokenPlural" authorshipid="12137" />
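Adding these entries by hand becomes tedious once there are many plurals. As a rough sketch only, the same additions could be scripted with System.Xml.Linq. The helper name `AddPluralWord` and its parameters are hypothetical; the element and attribute names are simply copied from the entries above, and whether the real awn.xml accepts them unchanged is an assumption.

```csharp
using System.Xml.Linq;

public static class AwnEditSketch
{
    // Hypothetical helper: appends one <word> element and its matching <form>
    // element to the given root, mirroring the attributes used in the entries above.
    public static void AddPluralWord(XElement root, string wordId, string value,
                                     string synsetId, string formValue,
                                     string authorshipId)
    {
        root.Add(new XElement("word",
            new XAttribute("wordid", wordId),
            new XAttribute("value", value),
            new XAttribute("synsetid", synsetId),
            new XAttribute("type", "brokenPlural"),
            new XAttribute("frequency", ""),
            new XAttribute("corpus", ""),
            new XAttribute("authorshipid", authorshipId)));

        root.Add(new XElement("form",
            new XAttribute("value", formValue),
            new XAttribute("wordid", wordId),
            new XAttribute("type", "brokenPlural"),
            new XAttribute("authorshipid", authorshipId)));
    }
}
```

For example, `AwnEditSketch.AddPluralWord(root, "<Alamat", "علامات", ">aArad_n1AR", "أعراض", "12137")` would add the "علامات" pair to a root element loaded with `XElement.Load`. One side benefit of going through the XML API is that a `<` inside an id is written out as `&lt;` when the document is serialized, so the file stays well-formed.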
Now we run the following code snippet:
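// _awn is assumed to be an already-loaded Arabic WordNet wrapper exposing the lookup methods used below.
List<string> wordIds = _awn.Get_List_Word_Id_From_Value("علامات");
List<string> synonyms = new List<string>();
if (wordIds != null)
{
    foreach (string wordId in wordIds)
    {
        // Find the synset this word belongs to, then every word id in that synset.
        string synsetId = _awn.Get_Synset_ID_From_Word_Id(wordId);
        List<string> memberIds = _awn.Get_List_Word_Id_From_Synset_ID(synsetId);
        if (memberIds != null && memberIds.Count != 0)
        {
            foreach (string memberId in memberIds)
            {
                // Collect each member's surface form once.
                string value = _awn.Get_Word_Value_From_Word_Id(memberId);
                if (!synonyms.Contains(value))
                    synonyms.Add(value);
            }
        }
    }
}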
The synonyms list then contains the following words: "علل", "علات", "اعتلالات", "داءات", "وعكات", "أسقام", "شكاوى". These are the plural forms of the synonyms of the word "عرض".
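Since the `_awn` wrapper itself is not shown, here is a minimal self-contained sketch of the same lookup chain (value, then word ids, then synset, then the synset's members) over an in-memory table. The class name, the `Words` table, and `GetSynonyms` are all hypothetical; the few sample rows are copied from the XML entries above and do not represent the real Arabic WordNet API.

```csharp
using System.Collections.Generic;

public static class SynonymLookupSketch
{
    // word id -> (surface value, synset id); a few sample rows from the XML above.
    static readonly Dictionary<string, (string Value, string SynsetId)> Words =
        new Dictionary<string, (string Value, string SynsetId)>
        {
            ["<Elal"] = ("علل", ">aMrad_n1AR"),
            ["<Ellat"] = ("علات", ">aMrad_n1AR"),
            ["<aMrad_n1AR"] = ("أمراض", ">aMrad_n1AR"),
            ["<Fohosat"] = ("فحوصات", ">Isteqsa'at"),
        };

    public static List<string> GetSynonyms(string value)
    {
        // Step 1: collect the synsets of every word entry with this surface value.
        var synsetIds = new HashSet<string>();
        foreach (var word in Words.Values)
            if (word.Value == value)
                synsetIds.Add(word.SynsetId);

        // Step 2: collect the distinct values of all words in those synsets.
        var result = new List<string>();
        foreach (var word in Words.Values)
            if (synsetIds.Contains(word.SynsetId) && !result.Contains(word.Value))
                result.Add(word.Value);
        return result;
    }
}
```

As in the snippet above, the queried word itself ends up in the result because it is a member of its own synset; filter it out afterwards if only the other members are wanted.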
Answer 1: (score: 0)
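If you want to get the singular word from a plural, you can use any available morphological analyzer, for example "ALKhalil", which is an open-source Java project. Note that this only covers mapping a plural to its singular, not the reverse direction.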
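A rough sketch of how that approach could be wired into the synonym lookup, with no changes to awn.xml: reduce the word to its singular first, then query WordNet with the result. Both delegates are hypothetical stand-ins; `toSingular` represents whatever call the chosen morphological analyzer provides (ALKhalil's actual API is not shown here), and `lookup` represents the WordNet synonym query.

```csharp
using System;
using System.Collections.Generic;

public static class SingularFirstSketch
{
    // toSingular: stand-in for a morphological analyzer call (hypothetical).
    // lookup:     stand-in for the WordNet synonym query (hypothetical).
    public static List<string> GetSynonyms(
        string word,
        Func<string, string> toSingular,
        Func<string, List<string>> lookup)
    {
        // Fall back to the input when the analyzer returns nothing.
        string lemma = toSingular(word) ?? word;
        // Return an empty list rather than null when nothing is found.
        return lookup(lemma) ?? new List<string>();
    }
}
```

The synonyms returned this way are singular forms, so turning them back into plurals would need a separate generation step.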