Web scraping error with BeautifulSoup: [Errno 10061]

Date: 2016-12-29 10:23:29

Tags: python web-scraping beautifulsoup

I am trying to get this code to work (a web scraping example using BeautifulSoup):

import urllib2
from bs4 import BeautifulSoup

wiki = "https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India"
page = urllib2.urlopen(wiki)
soup = BeautifulSoup(page)

I get this error:

URLError: <urlopen error [Errno 10061] No connection could be made because the target machine actively refused it>

I suspect this is related to a firewall or security setting. Can anyone suggest what to do about it?

1 answer:

Answer 0 (score: 1)

You can try something like this with requests:

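import requests
from bs4 import BeautifulSoup

wiki = "https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India"
# Download the raw HTML bytes of the page
page = requests.get(wiki).content
# Name the parser explicitly so bs4 does not have to guess (and warn)
soup = BeautifulSoup(page, "html.parser")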
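
Since the error in the question is a refused connection (Errno 10061 is the Windows "connection refused" socket error), switching libraries may not help if a firewall or proxy sits in between. Here is a minimal diagnostic sketch, not part of the original answer, that checks whether a plain TCP connection succeeds at all. It assumes Python 3 and uses only the standard library; the helper name `can_connect` is made up for illustration.

```python
import socket
import urllib.request


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Returns False when the connection is refused, times out, or the
    host name cannot be resolved (all of these raise OSError subclasses).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def system_proxies():
    """Return the proxy settings Python picks up from the environment/OS."""
    return urllib.request.getproxies()
```

For example, `can_connect("en.wikipedia.org", 443)` returning False while the page opens fine in a browser would suggest that the browser goes through a proxy that Python is not using. In that case `system_proxies()` shows what Python sees, and requests accepts a `proxies=` argument (or the `HTTP_PROXY` / `HTTPS_PROXY` environment variables) to route the call through the proxy.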

If you want to get the table, you can use pandas like this:

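import pandas as pd

wiki = "https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India"
# read_html returns a list of DataFrames, one per <table>; take the second one
df = pd.read_html(wiki)[1]
df2 = df.copy()
# The header arrives as the first data row: promote it to column names
df2.columns = df.iloc[0]
df2.drop(0, inplace=True)
# Drop the running-number column
df2.drop('No.', axis=1, inplace=True)
df2.head()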

Output: (image of the resulting table; not reproduced here)
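
The header clean-up in the pandas snippet above can be tried without network access. The sketch below is only an illustration: the `raw` frame is a made-up stand-in for what `pd.read_html(wiki)[1]` returned in the answer (header row stored as data row 0), the state and capital values are placeholders, and `promote_header` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: the real header is in row 0
# and the columns are just numbered 0..n-1. Values are placeholders.
raw = pd.DataFrame([
    ["No.", "State", "Capital"],
    ["1", "State A", "Capital A"],
    ["2", "State B", "Capital B"],
])


def promote_header(df, drop_cols=()):
    """Use the first row as column names, then drop that row and any unwanted columns."""
    out = df.iloc[1:].copy()            # everything except the header row
    out.columns = df.iloc[0].tolist()   # first row becomes the column names
    out = out.drop(columns=list(drop_cols))
    return out.reset_index(drop=True)   # renumber rows from 0


table = promote_header(raw, drop_cols=["No."])
print(table)
```

Unlike the snippet in the answer, this version avoids `inplace=True` and returns a new frame, which is purely a style choice. Note also that the table index (`[1]`) and the `'No.'` column name depend on the page layout at the time the answer was written, so they may need adjusting.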