Scraping a list of URLs from a CSV gives an unknown URL type error

Time: 2019-04-26 21:06:40

Tags: python csv web-scraping beautifulsoup

My goal is to scrape a list of URLs stored in a CSV file. A sample URL has the following format:

http://mashable.com/2013/01/07/amazon-instant-video-browser/

When I try to fetch the URLs in the list and parse them with BeautifulSoup, I currently get the following error:

URLError: <urlopen error unknown url type: http>

Does anyone know how to fix this? I suspect it is easy to solve, but I have not been able to work it out. This is the code I am currently using:

import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

contents = []
with open('url.csv','r') as csvf: # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url) # Add each url to list contents

for url in contents:  # Parse through each url in the list.
    page = urlopen(url[0]).read()
    soup = BeautifulSoup(page, "html.parser")
print(soup)

1 answer:

Answer 0 (score: 0)

Use try and except inside your for loops so that an error on one URL does not stop the whole script. For example:

for url in urls:
    try:
        contents.append(url)  # Add each url to list contents
    except Exception:
        pass

for url in contents:  # Parse through each url in the list.
    try:
        page = urlopen(url[0]).read()
        soup = BeautifulSoup(page, "html.parser")
    except Exception:
        pass  # Skip any URL that cannot be fetched or parsed
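
Note that try/except only skips the failing rows; it does not explain the error. One possible cause, which the question does not confirm, is an invisible character in front of the first URL, such as a UTF-8 byte order mark written by a spreadsheet program. In that case the scheme is no longer plain "http", and urlopen reports an unknown URL type even though the message looks correct on screen. The following is a minimal sketch under that assumption. The helper names read_urls and fetch are hypothetical and not part of the original code.

```python
import csv
from urllib.error import URLError
from urllib.request import urlopen


def read_urls(path):
    """Return the first column of a CSV file as a list of cleaned URLs."""
    urls = []
    # The 'utf-8-sig' codec drops a leading byte order mark if one is present.
    with open(path, "r", encoding="utf-8-sig", newline="") as csvf:
        for row in csv.reader(csvf):
            if not row:  # Skip blank lines
                continue
            url = row[0].strip()  # Remove stray spaces around the URL
            if url:
                urls.append(url)
    return urls


def fetch(url):
    """Return the raw page bytes, or None if the request fails."""
    try:
        with urlopen(url) as response:
            return response.read()
    except (URLError, ValueError) as exc:
        # Report the failure so that bad rows can be inspected later.
        print(f"Skipping {url!r}: {exc}")
        return None
```

With these helpers, the loop from the question could become `for url in read_urls('url.csv'):`, followed by `page = fetch(url)` and a call to `BeautifulSoup(page, "html.parser")` when page is not None. Printing `repr(url[0])` for the first row of the original code would show whether a hidden character is actually there.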