How to scrape an unsorted list of links from Wikipedia (in page order)

Date: 2018-12-26 15:22:55

Tags: python web-scraping wikipedia-api

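I'm trying to get all the links from this Wikipedia page, https://en.wikipedia.org/wiki/Ivan_Krypiakevych, in the order they appear. As you can see on the page, the first link is Ukrainian (the language), the second is Lviv University, the third is Ukraine, and so on.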

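I tried the Python wikipediaapi library. It does return all the links, but they come back in alphabetical order (A to Z) rather than in page order.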
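One way to get the links in the order they appear on the page is to read them from the rendered article HTML instead of from the link list. The sketch below assumes the MediaWiki `action=parse` API (with `prop=text` and `formatversion=2`) to fetch the article body, and uses the standard library's `html.parser` instead of BeautifulSoup so it needs no extra packages. `links_in_order` and `fetch_article_html` are hypothetical helper names, and the colon filter is a crude heuristic for skipping `File:`, `Category:` and similar pages. "Document order" here also includes links in infoboxes, references and navigation boxes.

```python
import json
from html.parser import HTMLParser
from urllib.parse import quote, unquote
from urllib.request import Request, urlopen


class _WikiLinkCollector(HTMLParser):
    """Collects article titles from <a href="/wiki/..."> tags in document order."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if not href.startswith("/wiki/"):
            return  # external links, /w/index.php?... edit links, in-page anchors
        # Drop any "#Section" fragment and decode percent-escapes (e.g. %C5%82 -> ł).
        target = unquote(href[len("/wiki/"):].split("#", 1)[0])
        # Crude heuristic: skip namespaced pages (File:, Category:, Help:, ...).
        # This also drops the rare article whose title itself contains a colon.
        if not target or ":" in target:
            return
        title = target.replace("_", " ")
        if title not in self._seen:  # keep only the first occurrence
            self._seen.add(title)
            self.titles.append(title)


def links_in_order(html):
    """Return linked article titles in the order they first appear in html."""
    collector = _WikiLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.titles


def fetch_article_html(title, lang="en"):
    """Fetch the rendered article body via the MediaWiki action=parse API (assumed)."""
    url = (
        f"https://{lang}.wikipedia.org/w/api.php?action=parse&prop=text"
        f"&format=json&formatversion=2&page={quote(title)}"
    )
    # Wikimedia asks clients to send a descriptive User-Agent; this one is a placeholder.
    req = Request(url, headers={"User-Agent": "link-order-example/0.1 (placeholder contact)"})
    with urlopen(req) as resp:
        return json.load(resp)["parse"]["text"]
```

Usage, assuming network access: `for title in links_in_order(fetch_article_html("Ivan Krypiakevych")): print(title)`. Keeping only the first occurrence of each title means a link repeated later in the article does not change its position.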

My code:

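```python
from bs4 import BeautifulSoup as bs  # imported but not used below
import requests                      # imported but not used below
from pprint import pprint            # imported but not used below
import wikipediaapi


def print_links(page):
    links = page.links  # dict: link title -> WikipediaPage
    # sorted() prints the titles in alphabetical (code-point) order
    for title in sorted(links.keys()):
        print("%s: %s" % (title, links[title]))


# recent wikipediaapi releases also require a user_agent argument here
wiki_wiki = wikipediaapi.Wikipedia(
    language='en',
    extract_format=wikipediaapi.ExtractFormat.WIKI
)

page_py = wiki_wiki.page('Ivan Krypiakevych')
print_links(page_py)
```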

The output was:

```
Austrian Galicia: Austrian Galicia (id: ??, ns: 0)
Biblioteca Nacional de España: Biblioteca Nacional de España (id: ??, ns: 0)
Bohdan Khmelnytsky: Bohdan Khmelnytsky (id: ??, ns: 0)
Bourgeois nationalism: Bourgeois nationalism (id: ??, ns: 0)
Báthory: Báthory (id: ??, ns: 0)
Category:Wikipedia articles with BNE identifiers: Category:Wikipedia articles with BNE identifiers (id: ??, ns: 14)
Category:Wikipedia articles with GND identifiers: Category:Wikipedia articles with GND identifiers (id: ??, ns: 14)
Category:Wikipedia articles with ISNI identifiers: Category:Wikipedia articles with ISNI identifiers (id: ??, ns: 14)
Category:Wikipedia articles with LCCN identifiers: Category:Wikipedia articles with LCCN identifiers (id: ??, ns: 14)
Category:Wikipedia articles with LNB identifiers: Category:Wikipedia articles with LNB identifiers (id: ??, ns: 14)
Category:Wikipedia articles with SUDOC identifiers: Category:Wikipedia articles with SUDOC identifiers (id: ??, ns: 14)
Category:Wikipedia articles with VIAF identifiers: Category:Wikipedia articles with VIAF identifiers (id: ??, ns: 14)
Chełm Land: Chełm Land (id: ??, ns: 0)
...
Ukrainian language: Ukrainian language (id: ??, ns: 0)
Ukrainian nationalism: Ukrainian nationalism (id: ??, ns: 0)
Virtual International Authority File: Virtual International Authority File (id: ??, ns: 0)
Western Ukraine: Western Ukraine (id: ??, ns: 0)
WorldCat Identities: WorldCat Identities (id: ??, ns: 0)
Zhovkva: Zhovkva (id: ??, ns: 0)
```
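The output above is alphabetical partly because `print_links` calls `sorted()`, which is also why the `Category:` entries (ns 14) are mixed in with ordinary articles and why Python's code-point ordering puts "Báthory" after "Bourgeois nationalism". Dropping `sorted()` would not restore page order, though, because the MediaWiki link list behind `page.links` is itself ordered by namespace and title. Another option is to read the links from the article's wikitext in source order. A minimal sketch, again assuming the `action=parse` API (here with `prop=wikitext`); `links_from_wikitext` and `fetch_wikitext` are hypothetical helper names. Links generated by templates (such as the authority-control identifiers) do not appear as `[[...]]` in the source, so this approach misses them.

```python
import json
import re
from urllib.parse import quote
from urllib.request import Request, urlopen

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]], capturing "Target".
_LINK_RE = re.compile(r"\[\[\s*([^\[\]|#]+)")


def links_from_wikitext(wikitext):
    """Return wikilink targets in the order they first appear in the wikitext."""
    titles, seen = [], set()
    for match in _LINK_RE.finditer(wikitext):
        target = match.group(1).strip().replace("_", " ")
        # Crude heuristic: skip File:, Category:, interlanguage links and similar.
        if not target or ":" in target:
            continue
        # English Wikipedia treats the first letter of a title as uppercase.
        title = target[0].upper() + target[1:]
        if title not in seen:  # keep only the first occurrence
            seen.add(title)
            titles.append(title)
    return titles


def fetch_wikitext(title, lang="en"):
    """Fetch the article's wikitext via the MediaWiki action=parse API (assumed)."""
    url = (
        f"https://{lang}.wikipedia.org/w/api.php?action=parse&prop=wikitext"
        f"&format=json&formatversion=2&page={quote(title)}"
    )
    # Placeholder User-Agent; Wikimedia asks for a descriptive one with contact info.
    req = Request(url, headers={"User-Agent": "link-order-example/0.1 (placeholder contact)"})
    with urlopen(req) as resp:
        return json.load(resp)["parse"]["wikitext"]
```

Usage, assuming network access: `print(links_from_wikitext(fetch_wikitext("Ivan Krypiakevych")))`. The regex only looks at the opening `[[`, so a link nested inside an image caption is still picked up after the enclosing `File:` link is skipped.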
