Parsing links stored in a CSV file

Date: 2019-06-07 09:11:14

Tags: python csv screen-scraping

I am trying to parse links stored in a CSV file and then print the title of each page. I run into trouble at the bottom of the code, where I read the links back and try to parse each one for its title.

import csv
from bs4 import BeautifulSoup
from urllib.request import urlopen

contents = []

filename = 'scrap.csv'

with open(filename,'rt') as f:
    data = csv.reader(f)

    for row in data:
        links = row[0]
        contents.append(links) #add each url to list of contents

for links in contents: #parse through each url in the list contents
    url = urlopen(links[0].read())
    soup = BeautifulSoup(url,"html.parser")

for title in soup.find_all('title'):
    print(title)

I expected the output to be the title printed for each row, but instead I get the following error:

line 17: url = urlopen(links[0].read())
AttributeError: 'str' object has no attribute 'read'

3 Answers:

Answer 0 (score: 0)

Change url = urlopen(links[0].read()) to url = urlopen(links).read(). Each entry in contents is already a URL string, so links[0] is just its first character, and a string has no .read() method; .read() belongs on the response object that urlopen returns.
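
A minimal sketch of the corrected loop, assuming contents holds one URL string per row as in the question:

from urllib.request import urlopen
from bs4 import BeautifulSoup

for link in contents:  # each entry is already a URL string
    html = urlopen(link).read()  # fetch the raw page bytes
    soup = BeautifulSoup(html, "html.parser")
    print(soup.title)  # first <title> tag of the page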

Answer 1 (score: 0)

Try this code. It should work, with less overhead.

import pandas as pd
from bs4 import BeautifulSoup
from urllib.request import urlopen

# header=None keeps the first link from being treated as a column name
for link in pd.read_csv('scrap.csv', header=None)[0].values:
    url = urlopen(link)
    soup = BeautifulSoup(url, "html.parser")
    print(soup.title)
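
Note that header=None is an assumption here; if scrap.csv actually starts with a header row, read it normally and select the URL column by its name instead.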

Answer 2 (score: 0)

import csv
from bs4 import BeautifulSoup
import requests

contents = []

def soup_title(soup):
    # return the first <title> tag found in the parsed page
    for title in soup.find_all('title'):
        return title

filename = 'scrap.csv'

with open(filename, 'rt') as f:
    data = csv.reader(f)

    for row in data:
        links = row[0]
        contents.append(links)  # add each url to the list of contents

for links in contents:  # fetch and parse each url in the list
    url = requests.get(links)
    soup = BeautifulSoup(url.text, "html.parser")
    brand_info = soup_title(soup)
    print(brand_info)
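
As a side note, soup_title returns the first <title> element it encounters, which is what soup.title gives you directly; the helper is mainly useful if a page has no title at all, in which case the loop body never runs and the function returns None instead of raising.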