I'm trying to parse links stored in a CSV file and then print the title of each page. I'm running into trouble at the bottom of the code, where I try to read each link and parse it to get its title.
import csv
from bs4 import BeautifulSoup
from urllib.request import urlopen

contents = []

filename = 'scrap.csv'
with open(filename, 'rt') as f:
    data = csv.reader(f)
    for row in data:
        links = row[0]
        contents.append(links)  # add each url to list of contents

for links in contents:  # parse through each url in the list contents
    url = urlopen(links[0].read())
    soup = BeautifulSoup(url, "html.parser")
    for title in soup.find_all('title'):
        print(title)
I expect the output to be the title printed for each row, but instead I get the following error:

    line 17, in <module>
        url = urlopen(links[0].read())
    AttributeError: 'str' object has no attribute 'read'
Answer 0 (score: 0)
Change url = urlopen(links[0].read()) to url = urlopen(links).read(). At that point links is already a single URL string, so indexing it with [0] just takes its first character, and a string has no read() method; read() belongs on the response object that urlopen returns.
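To see why the original line fails, note that links is a plain string, so links[0] is just its first character. A minimal illustration, using a placeholder URL:

```python
# `links` holds one URL as a string, exactly as read from a CSV row
links = "https://example.com"

# Indexing a string yields a single character, not a list element
first = links[0]
print(first)         # "h" - the first character of the URL
print(type(first))   # still a str, which has no .read() method
```

That is why the traceback complains that a str object has no attribute read.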
Answer 1 (score: 0)
Try this code. It should work and reduce your overhead.
import pandas as pd
from bs4 import BeautifulSoup
from urllib.request import urlopen

# header=None treats the first row as data, so the URL column gets index 0
for link in pd.read_csv('scrap.csv', header=None)[0].values:
    url = urlopen(link)
    soup = BeautifulSoup(url, "html.parser")
    print(soup.title)  # print the <title> tag of each page
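The pandas part of this answer can be checked without scrap.csv on disk by reading an in-memory CSV; the two URLs below are placeholders for illustration:

```python
import io

import pandas as pd

# Two placeholder URLs standing in for the contents of scrap.csv
csv_text = "https://example.com/a\nhttps://example.com/b\n"

# header=None: the first row is data, so the single column gets index 0
df = pd.read_csv(io.StringIO(csv_text), header=None)
urls = df[0].tolist()
print(urls)  # ['https://example.com/a', 'https://example.com/b']
```

Without header=None, pandas would consume the first URL as a column name and the [0] lookup would fail with a KeyError.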
Answer 2 (score: 0)
import csv

import requests
from bs4 import BeautifulSoup

def soup_title(soup):
    # return the first <title> tag found in the parsed page
    for title in soup.find_all('title'):
        title_name = title
        return title_name

contents = []
filename = 'scrap.csv'
with open(filename, 'rt') as f:
    data = csv.reader(f)
    for row in data:
        links = row[0]
        contents.append(links)  # add each url to list of contents

for links in contents:  # parse through each url in the list contents
    url = requests.get(links)
    soup = BeautifulSoup(url.text, "html.parser")
    brand_info = soup_title(soup)
    print(brand_info)
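The title-extraction step in these answers can be verified without any network access by feeding BeautifulSoup a fixed HTML string (a made-up page for illustration):

```python
from bs4 import BeautifulSoup

# A fixed page standing in for a fetched response body
html = "<html><head><title>Example Page</title></head><body></body></html>"

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("title")   # the full <title> tag
print(tag)                 # <title>Example Page</title>
print(tag.get_text())      # Example Page
```

soup.find('title') returns the tag object itself; call get_text() on it if you want only the title text rather than the surrounding tag.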