Fetch movie sections from Wikipedia one at a time

Asked: 2019-05-18 07:33:37

Tags: php curl mediawiki wiki mediawiki-api

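I am trying to fetch a movie's plot and other information from its Wikipedia page. I have the movie's name and year, and from those I need to find the exact movie, its plot, and its other details.

I am using the Wikipedia API: https://en.wikipedia.org/w/api.php?action=query&list=search&format=jsonfm&srsearch=matrix%20incategory:English-language_films

I get the following response: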

{
"batchcomplete": "",
"continue": {
    "sroffset": 10,
    "continue": "-||"
},
"query": {
    "searchinfo": {
        "totalhits": 176
    },
    "search": [
        {
            "ns": 0,
            "title": "The Matrix",
            "pageid": 30007,
            "size": 123422,
            "wordcount": 12668,
            "snippet": "The <span class=\"searchmatch\">Matrix</span> is a 1999 science fiction action film written and directed by the Wachowskis that stars Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss,",
            "timestamp": "2019-05-17T20:53:05Z"
        },

I need to search all movies, not only English-language ones, and I need to get the Plot section text directly from the search.
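
As a rough sketch of what such requests could look like: the search below drops the `incategory:` filter and adds the year to the search terms, and a second URL asks for the page's plain text, because `list=search` returns only snippets and not section text. The helper names `build_search_url` and `build_extract_url` are hypothetical, the `"<title> <year> film"` query shape is an assumption rather than a guaranteed way to find the right page, and `prop=extracts` relies on the TextExtracts extension that English Wikipedia runs.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_search_url(title, year, limit=10):
    """Build a search URL for a film title and year, with no category filter."""
    params = {
        "action": "query",
        "list": "search",
        "format": "json",  # plain JSON; "jsonfm" is the HTML debugging view
        "srsearch": f"{title} {year} film",  # assumed query shape
        "srlimit": limit,
    }
    return f"{API_URL}?{urlencode(params)}"


def build_extract_url(page_title):
    """Build a URL that requests the plain-text content of one page."""
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,           # plain text instead of HTML
        "exsectionformat": "wiki",  # keep "== Plot ==" style headings
        "format": "json",
        "titles": page_title,
    }
    return f"{API_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("The Matrix", 1999))
    print(build_extract_url("The Matrix"))
```

The first URL yields candidate page titles; the second would be requested with the chosen title, after which the Plot section still has to be cut out of the returned text.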

1 answer:

Answer 0: (score: 1)

TL;DR

First, install:

$ pip3 install imdbpy wikipedia

Then:

>>> import wikipedia
>>> from imdb import IMDb
>>> imdb = IMDb()

>>> imdb.search_movie('avengers')
[<Movie id:0848228[http] title:_The Avengers (2012)_>, <Movie id:0203247[http] title:_"Avengers: United They Stand" (1999)_>, <Movie id:2164490[http] title:_Avengers (1987) (VG)_>, <Movie id:4154796[http] title:_Avengers: Endgame (2019)_>, <Movie id:4154756[http] title:_Avengers: Infinity War (2018)_>, <Movie id:2395427[http] title:_Avengers: Age of Ultron (2015)_>, <Movie id:2455546[http] title:_"Avengers Assemble" (2013)_>, <Movie id:1626038[http] title:_"The Avengers: Earth's Mightiest Heroes" (2010)_>, <Movie id:0458339[http] title:_Captain America: The First Avenger (2011)_>, <Movie id:0118661[http] title:_The Avengers (1998)_>, <Movie id:0054518[http] title:_"The Avengers" (1961)_>, <Movie id:1355644[http] title:_Passengers (I) (2016)_>, <Movie id:8836988[http] title:_Avengement (I) (2019)_>, <Movie id:0473445[http] title:_Avenger (2006) (TV)_>, <Movie id:9426186[http] title:_Revenger (2018)_>, <Movie id:2378453[http] title:_Avenged (2013)_>, <Movie id:4296026[http] title:_Avengers Grimm (2015) (V)_>, <Movie id:0491703[http] title:_Ultimate Avengers (2006) (V)_>, <Movie id:0090190[http] title:_The Toxic Avenger (1984)_>, <Movie id:0056174[http] title:_The Avenger (1962)_>]

>>> title = imdb.search_movie('avengers')[0].data['title']
>>> title
'The Avengers'

>>> wiki_page = wikipedia.page(title)

>>> wiki_page.url
'https://en.wikipedia.org/wiki/Avengers_(comics)'

>>> print(wiki_page.content)
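
The session above prints the whole article and, for 'The Avengers', resolves to the comics page and not the film. A minimal sketch of two helpers that might narrow this down, assuming the page content is plain text with `== Heading ==` lines (which is how the `wikipedia` package usually returns it) and assuming Wikipedia's usual film naming convention of `Title (YEAR film)` or `Title (film)`. Both `candidate_titles` and `extract_section` are hypothetical names, not part of either library.

```python
import re


def candidate_titles(title, year):
    """Page titles to try, most specific first (assumed naming convention)."""
    return [f"{title} ({year} film)", f"{title} (film)", title]


def extract_section(content, heading):
    """Return the text under a top-level '== heading ==' line, or None."""
    # Find the heading line itself, ignoring case and extra spaces.
    pattern = re.compile(
        r"^==\s*" + re.escape(heading) + r"\s*==\s*$",
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(content)
    if match is None:
        return None
    rest = content[match.end():]
    # Stop at the next top-level heading; '=== sub ===' headings stay inside.
    next_heading = re.search(r"^==[^=].*?==\s*$", rest, re.MULTILINE)
    body = rest[:next_heading.start()] if next_heading else rest
    return body.strip()


if __name__ == "__main__":
    sample = "Intro.\n\n== Plot ==\nNeo learns the truth.\n\n== Cast ==\nKeanu Reeves\n"
    print(candidate_titles("The Avengers", 2012))
    print(extract_section(sample, "Plot"))
```

With the session above, this could be used as `extract_section(wiki_page.content, 'Plot')`, after loading the page with one of the titles from `candidate_titles(title, year)`. The `wikipedia` package also offers `wiki_page.section('Plot')`, which may be enough when the heading matches exactly.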

See also: