No connection adapters were found (Python 3.5, requests)

Date: 2016-11-04 22:42:44

Tags: python python-requests

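I am trying to extract articles from multiple URLs collected in a CSV file. However, when I print the output, I get this error: InvalidSchema: No connection adapters were found for '['http://www.nytimes.com/2016/10/06/world/europe/police-brussels-knife-terrorism.html']'
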
import csv
import requests
from bs4 import BeautifulSoup

with open('Training_news.csv', newline='') as file:
    reader= csv.reader (file, delimiter=' ')
    for row in reader:
        r=requests.get(row)
        r.encoding = "ISO-8859-1"
        soup = BeautifulSoup(r.content, 'lxml')
        text = soup.find_all(("p",{"class": "story-body-text story-content"}))

I think the problem is in `row`. When I print it, I do not get a single list containing all the URLs from the CSV file; instead I get a separate list for each value in the file: ['http://www.nytimes.com/2016/10/06/world/europe/police-brussels-knife-terrorism.html'] ['http://www.nytimes.com/2016/06/29/world/europe/turkey-istanbul-airport-explosions.html']

1 answer:

Answer 0 (score: 0)

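`row` is a list, but `requests.get` expects a string. That is why the error message shows the URL wrapped in brackets: the whole list was passed where a URL string was expected. You can iterate over the items in each row instead: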

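import csv
import requests
from bs4 import BeautifulSoup

with open('Training_news.csv', newline='') as file:
    reader = csv.reader(file, delimiter=' ')
    for row in reader:
        # Each row is a list of strings, so request each URL in it separately.
        for url in row:
            r = requests.get(url)
            # Parse the raw bytes; BeautifulSoup detects the encoding itself.
            soup = BeautifulSoup(r.content, 'lxml')
            # Tag name and attribute filter are separate arguments.
            text = soup.find_all("p", {"class": "story-body-text story-content"})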
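
To see why the nested loop is needed, here is a minimal sketch that runs without network access. It assumes the file holds one URL per line, as the printed rows in the question suggest. The in-memory `sample` stands in for Training_news.csv, and `iter_urls` is a hypothetical helper name, not part of `csv` or `requests`.

```python
import csv
import io


def iter_urls(rows):
    """Yield each non-empty cell from an iterable of CSV rows as a stripped string."""
    for row in rows:  # every row produced by csv.reader is a list of strings
        for cell in row:
            url = cell.strip()
            if url:  # skip blank cells
                yield url


# Hypothetical in-memory stand-in for Training_news.csv (assumed: one URL per line).
sample = io.StringIO(
    "http://www.nytimes.com/2016/10/06/world/europe/police-brussels-knife-terrorism.html\n"
    "\n"
    "http://www.nytimes.com/2016/06/29/world/europe/turkey-istanbul-airport-explosions.html\n"
)

rows = list(csv.reader(sample, delimiter=' '))
urls = list(iter_urls(rows))

print(type(rows[0]).__name__)  # the row itself is a list, not a string
print(urls)                    # plain strings, one per URL
```

Each string in `urls` can then be passed to `requests.get(url)`. If every row is known to hold exactly one URL, `row[0]` would also work, but the flattening loop additionally tolerates empty lines, which `csv.reader` returns as empty lists.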