My goal is to scrape a list of URLs stored in a CSV file. The URLs look like this:
http://mashable.com/2013/01/07/amazon-instant-video-browser/
When I try to fetch the URLs from the list and parse them with BeautifulSoup, I get the following error:
URLError: <urlopen error unknown url type: http>
Does anyone know how to fix this? I suspect it is an easy fix, but I can't figure it out. Here is the code I am currently using:
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

contents = []

with open('url.csv', 'r') as csvf:  # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url)        # Add each url to list contents

for url in contents:                # Parse through each url in the list.
    page = urlopen(url[0]).read()
    soup = BeautifulSoup(page, "html.parser")
    print(soup)
Answer 0 (score: 0)
Use try and except inside your for loops so that a URL that raises an HTTP error is skipped instead of stopping the script. For example:
for url in urls:
    try:
        contents.append(url)        # Add each url to list contents
    except:
        pass

for url in contents:                # Parse through each url in the list.
    try:
        page = urlopen(url[0]).read()
        soup = BeautifulSoup(page, "html.parser")
    except:
        pass