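I have a question about extracting a category from a set of words. I have several words in a cluster ("apple", "iMac", "snowleopard"), and I want to retrieve the category they share.
("apple", "iMac", "snowleopard") -> "Mac OS X"
I tried a lexical database such as WordNet, but it did not work. While looking for other approaches, I found that Wikipedia might help. Is there a Java library for Wikipedia? And how would I go about the task described above? Thanks.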
Answer 0 (score: 0)
You could try using Wikipedia to extract some meaning from these terms. For example, the following query against the Wikipedia API:
yields the following result:
{
    "query": {
        "searchinfo": {
            "totalhits": 3,
            "suggestion": "apple iMac snow leopard\"\""
        },
        "pages": {
            "2020710": {
                "pageid": 2020710,
                "ns": 0,
                "title": "Apple's transition to Intel processors",
                "categories": [
                    {
                        "ns": 14,
                        "title": "Category:Apple Inc."
                    },
                    {
                        "ns": 14,
                        "title": "Category:Intel Corporation"
                    },
                    {
                        "ns": 14,
                        "title": "Category:Mac OS X"
                    }
                ]
            },
            "14059031": {
                "pageid": 14059031,
                "ns": 0,
                "title": "Mac OS X Snow Leopard",
                "categories": [
                    {
                        "ns": 14,
                        "title": "Category:2009 software"
                    },
                    {
                        "ns": 14,
                        "title": "Category:Mac OS X"
                    }
                ]
            },
            "20640": {
                "pageid": 20640,
                "ns": 0,
                "title": "OS X",
                "categories": [
                    {
                        "ns": 14,
                        "title": "Category:1999 software"
                    },
                    {
                        "ns": 14,
                        "title": "Category:Apple Inc. operating systems"
                    },
                    {
                        "ns": 14,
                        "title": "Category:Apple Inc. software"
                    },
                    {
                        "ns": 14,
                        "title": "Category:Mac OS X"
                    },
                    {
                        "ns": 14,
                        "title": "Category:Mach"
                    }
                ]
            }
        }
    },
    "query-continue": {
        "categories": {
            "clcontinue": "14059031|X86-64 operating systems"
        }
    }
}
Working out the "right" category from this data may not be easy, but it is a start.
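One simple heuristic, offered only as a sketch, is to count how many of the returned pages carry each category and pick the most frequent one. The class and method names below (`CategoryVote`, `mostCommonCategory`) are hypothetical. The code assumes the category titles have already been pulled out of the JSON into a map from page title to category list, and that each page lists a category at most once. The `query-continue` block in the response suggests the category lists are incomplete, so the counts here are only partial.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CategoryVote {

    // Returns the category that appears on the most pages, or null if
    // no categories are present. On a tie, the category encountered
    // first while iterating the input wins.
    static String mostCommonCategory(Map<String, List<String>> categoriesByPage) {
        // LinkedHashMap keeps first-seen order, which makes ties deterministic
        // when the input map also has a stable iteration order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> categories : categoriesByPage.values()) {
            for (String category : categories) {
                counts.merge(category, 1, Integer::sum);
            }
        }

        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Page titles and categories copied by hand from the response above.
        Map<String, List<String>> pages = new LinkedHashMap<>();
        pages.put("Apple's transition to Intel processors", List.of(
                "Category:Apple Inc.",
                "Category:Intel Corporation",
                "Category:Mac OS X"));
        pages.put("Mac OS X Snow Leopard", List.of(
                "Category:2009 software",
                "Category:Mac OS X"));
        pages.put("OS X", List.of(
                "Category:1999 software",
                "Category:Apple Inc. operating systems",
                "Category:Apple Inc. software",
                "Category:Mac OS X",
                "Category:Mach"));

        System.out.println(mostCommonCategory(pages)); // prints "Category:Mac OS X"
    }
}
```

For this response, "Category:Mac OS X" is the only category present on all three pages, which matches the result the question asks for. With noisier search results a plain vote may well pick an overly broad category, so some filtering or weighting would probably be needed.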