Get a disambiguated list of homonyms from Wikipedia / Wikidata / Linked Data

Date: 2018-09-23 09:07:28

Tags: api wikipedia-api linked-data wikidata-api

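If I manually search for "George Bush" on Wikipedia, I get this page, which lists the homonyms, each with a short description.

I would like to pass the search term to an API and get back the following: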

  • George H. W. Bush
  • George W. Bush
  • George Bush (biblical scholar)
  • George Bush (footballer)
  • George Bush (racing driver)
  • George P. Bush
  • George Washington Bush

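I don't mind getting more than that, as long as I can parse it unambiguously.

My goal is to let users of my website tag public figures, but I want to restrict their choices and avoid ambiguity. The list could therefore differ slightly, and any other good database with an API would do.

I haven't figured out how to do this with Wikipedia or Wikidata. I have only managed to run queries when I already know a specific ID or page, which is not the case here.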

1 Answer:

Answer 0: (score: 2)

There are two ways to do this, depending on the kind of data you need.

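For example, https://en.wikipedia.org/w/api.php?action=query&titles=George%20Bush&prop=links lists the links on the "George Bush" page, which tells you whether the person's name is "ambiguous".

This returns: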

                {
                    "ns": 0,
                    "title": "Bush family"
                },
                {
                    "ns": 0,
                    "title": "George Brush (disambiguation)"
                },
                {
                    "ns": 0,
                    "title": "George Bush (biblical scholar)"
                },
                {
                    "ns": 0,
                    "title": "George Bush (footballer)"
                },
                {
                    "ns": 0,
                    "title": "George Bush (racing driver)"
                },
                {
                    "ns": 0,
                    "title": "George H. W. Bush"
                },
                {
                    "ns": 0,
                    "title": "George P. Bush"
                },
                {
                    "ns": 0,
                    "title": "George W. Bush"
                },
                {
                    "ns": 0,
                    "title": "George Washington Bush"
                }
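
The link list also contains entries that are not people called George Bush ("Bush family", "George Brush (disambiguation)"), so it needs some filtering before it is shown to users. Below is a minimal Python sketch of one way to do that. It is not part of the original answer: the helper names `build_links_url`, `fetch_json` and `candidate_titles` are hypothetical, the `format=json` and `pllimit=max` parameters are additions to the URL above, the User-Agent string is a placeholder, and the word-matching rule is only a heuristic.

```python
import json
import re
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_links_url(title):
    """Build a prop=links query for one page title (JSON output assumed)."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "links",
        "pllimit": "max",   # assumption: ask for as many links as allowed
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Fetch a URL and decode the JSON body (performs a network request)."""
    # Placeholder User-Agent; replace with one that identifies your site.
    request = urllib.request.Request(
        url, headers={"User-Agent": "disambiguation-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def candidate_titles(response, term):
    """Keep article links whose title contains every word of the term.

    Heuristic only: drops non-article namespaces and links that point to
    other disambiguation pages.
    """
    wanted = set(re.findall(r"\w+", term.lower()))
    titles = []
    for page in response.get("query", {}).get("pages", {}).values():
        for link in page.get("links", []):
            title = link["title"]
            if link["ns"] != 0 or title.endswith("(disambiguation)"):
                continue
            words = set(re.findall(r"\w+", title.lower()))
            if wanted <= words:
                titles.append(title)
    return titles
```

A call such as `candidate_titles(fetch_json(build_links_url("George Bush")), "George Bush")` would then yield the titles to offer as choices. Long pages may need the API's continuation mechanism, which this sketch does not handle.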

To get more data in a single request, you can use https://en.wikipedia.org/w/api.php?action=query&utf8=&list=search&srsearch=George%20Bush

That will get you:

    "search": [
        {
            "ns": 0,
            "title": "George W. Bush",
            "pageid": 3414021,
            "size": 299185,
            "wordcount": 27007,
            "snippet": "<span class=\"searchmatch\">George</span> Walker <span class=\"searchmatch\">Bush</span> (born July 6, 1946) is an American politician who served as the 43rd President of the United States from 2001 to 2009. He had previously",
            "timestamp": "2018-09-26T21:48:08Z"
        },
        {
            "ns": 0,
            "title": "George H. W. Bush",
            "pageid": 11955,
            "size": 210189,
            "wordcount": 20867,
            "snippet": "<span class=\"searchmatch\">George</span> Herbert Walker <span class=\"searchmatch\">Bush</span> (born June 12, 1924) is an American politician who served as the 41st President of the United States from 1989 to 1993. Prior",
            "timestamp": "2018-10-01T06:41:50Z"
        },
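
Each search hit carries a `pageid`, which is a stable key for tagging, and a `snippet` that contains HTML highlighting markup. As a rough Python sketch (not from the original answer), the hits could be reduced to display-ready choices as follows. The helper names `build_search_url`, `plain_snippet` and `search_choices` are hypothetical, and `format=json` and `srlimit` are additions to the URL above.

```python
import html
import re
import urllib.parse

API_URL = "https://en.wikipedia.org/w/api.php"

# Matches any HTML tag, e.g. <span class="searchmatch"> and </span>.
_TAG = re.compile(r"<[^>]+>")


def build_search_url(term, limit=10):
    """Build a list=search query for a free-text term (JSON output assumed)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srlimit": limit,   # assumption: cap the number of hits
        "utf8": "",
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def plain_snippet(snippet):
    """Strip highlighting tags and decode HTML entities in a snippet."""
    return html.unescape(_TAG.sub("", snippet))


def search_choices(response):
    """Reduce a decoded search response to pageid, title and plain snippet."""
    return [
        {
            "pageid": hit["pageid"],
            "title": hit["title"],
            "snippet": plain_snippet(hit.get("snippet", "")),
        }
        for hit in response.get("query", {}).get("search", [])
    ]
```

Storing the `pageid` rather than the title keeps a tag unambiguous even when two people share a display name.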