如何解析维基词典?

时间:2013-12-02 20:39:44

标签: parsing wikimedia wiktionary

缺乏在线资源,证明我可以解析维基词典API的响应,看起来像this

{
    "query": {
        "pages": {
            "40915": {
                "pageid": 40915,
                "ns": 0,
                "title": "reluctant",
                "revisions": [
                    {
                        "contentformat": "text/x-wiki",
                        "contentmodel": "wikitext",
                        "*": "==English==\n\n===Etymology===\nFrom {{etyl|la|en}} {{term|lang=la|reluctans}}, present participle of {{term|reluctare}}, {{term|reluctari||to struggle against, oppose, resist}}, from {{term|re-||back}} + {{term|luctari||to struggle}}.\n\n===Pronunciation===\n* {{IPA|/ɹɪˈlʌktənt/}}\n* {{audio|en-us-reluctant.ogg|Audio (US)}}\n\n===Adjective===\n{{en-adj}}\n\n# {{context|now|_|rare|lang=en}} [[opposing|Opposing]]; offering [[resistance]] (to).\n#* '''1819''', Lord Byron, ''Don Juan'', II.108:\n#*: There, breathless, with his digging nails he clung / Fast to the sand, lest the returning wave, / From whose '''reluctant''' roar his life he wrung, / Should suck him back to her insatiate grave [...].\n#* '''2008''', Kern Alexander et al., ''The World Trade Organization and Trade in Services'', p. 222:\n#*: They are '''reluctant''' to the inclusion of a necessity test, especially of a horizontal nature, and emphasize, instead, the importance of procedural disciplines [...].\n# Not [[wanting]] to take some [[action]]; [[unwilling]].\n#: ''She was '''reluctant''' to lend him the money''\n\n====Synonyms====\n* [[unwilling]], [[disinclined]]\n\n====Translations====\n{{trans-top|not wanting to take some action}}\n* Chinese: \n*: Mandarin: {{t|cmn|不情願|sc=Hani}}, {{t+|cmn|不情愿|tr=bùqíngyuàn|sc=Hani}}\n* Czech: {{t|cs|neochotný}}, {{t|cs|zdráhající}} se\n* Dutch: {{t+|nl|aarzelend}}\n* Finnish: {{t+|fi|haluton}}, {{t+|fi|vastahakoinen}}\n* French: {{t+|fr|réservé}},  {{t+|fr|réfractaire}},  {{t+|fr|rétif}}\n* German: {{t|de|zögernd}}\n* Hungarian: {{t|hu|kelletlen}}\n* Indonesian: {{t+|id|enggan}}\n* Interlingua: [[reluctante]]\n* Italian: {{t+|it|riluttante}}\n{{trans-mid}}\n* Latin: {{t|la|invītus}}\n* Manx: {{t|gv|neuarryltagh}}, {{t|gv|neuwooiagh}}\n* Maori: {{t|mi|whakawhēuaua}}, {{t|mi|manauhea}}\n* Polish: [[niechętny]]\n* Romanian: reticent, precaut, {{t|ro|prevăzător}}\n* Russian: {{t+|ru|неохотный|tr=neoxótnyj}}\n* Scots: {{t|sco|sweer}}, {{t|sco|sweirt}}, {{t|sco|laith}}\n* Scottish Gaelic: {{t|gd|aindeònach}}, {{t|gd|leisg}}\n* Spanish: {{t+|es|renuente}}, {{t|es|reacio}}\n* Swedish: {{t|sv|motvillig}}\n{{trans-bottom}}\n\n====Related terms====\n* [[reluctance]]\n* [[reluctantly]]\n\n===External links===\n* {{R:Webster 1913}}\n* {{R:Century 1911}}\n* {{R:OneLook}}\n\n[[ca:reluctant]]\n[[cy:reluctant]]\n[[et:reluctant]]\n[[el:reluctant]]\n[[es:reluctant]]\n[[fr:reluctant]]\n[[ko:reluctant]]\n[[io:reluctant]]\n[[kn:reluctant]]\n[[ku:reluctant]]\n[[hu:reluctant]]\n[[mg:reluctant]]\n[[ml:reluctant]]\n[[my:reluctant]]\n[[nl:reluctant]]\n[[pl:reluctant]]\n[[pt:reluctant]]\n[[simple:reluctant]]\n[[fi:reluctant]]\n[[sv:reluctant]]\n[[ta:reluctant]]\n[[te:reluctant]]\n[[th:reluctant]]\n[[vi:reluctant]]\n[[zh:reluctant]]"
                    }
                ]
            }
        }
    }
}

基本上我想要的只是英文定义,但是响应格式是如此奇怪,以至于关于这个词的所有内容都被混杂成一个不可分割的大blob。

  1. 是否有一种API方式以实际 JSON格式获取响应,其中英文定义只是一个JSON密钥?
  2. 我是否必须使用正则表达式模式来执行此操作,这看起来如何?
  3. 最后,为什么API设计者会返回这样的数据?我想判断并说他们不知道他们在做什么,但肯定有理由。

1 个答案:

答案 0 :(得分:3)

使用extracts属性获取html版本

https://en.wiktionary.org/w/api.php?titles=cloud&action=query&prop=extracts&format=jsonfm