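I'm just trying to read links from a .csv file, scrape information from each link, and then write it to other columns in the .csv. I've been scratching my head over this for days. Can anyone see what's wrong here? The error occurs where the soup object is created.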
def scrape_data(csv_file):
    writer = csv.writer(csv_file)
    reader = csv.reader(csv_file)
    for row in reader:
        if row:
            # THE ERROR HAPPENS AT THE SOUP OBJECT BELOW
            soup = BeautifulSoup(urllib.request.urlopen(row[0], 'lxml'))
            post_time = soup.find('time', {'class' : 'date timeago'})
            sqfeet = (sqft.text for sqft in soup.find('span', {'class' : 'shared-line-bubble'}))
            availability = (soup.find('span', {'class' : 'data-date'}))
            attribute_group = (ag.text for ag in soup.find('p', {'class' : 'attrgroup'}))
            address = (add.text for add in soup.find('div', {'class' : 'mapaddress'}))
            for data in zip(post_time, sqfeet, availability, attribute_group, address):
                writer.writerow(row[3])
Answer 0 (score: 3)
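The 'lxml' part must be an argument to BeautifulSoup(), but you are passing it as an argument to urllib.request.urlopen(). Move the closing parenthesis so the call reads BeautifulSoup(urllib.request.urlopen(row[0]), 'lxml').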
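To make the fix concrete, here is a minimal sketch of the corrected call wrapped in a small helper. The name fetch_soup and its parser and opener parameters are my own, not from the question; they are only there so the parser choice is explicit and the download step can be swapped out. The second positional parameter of urlopen() is data (a request body), which is why passing 'lxml' there fails.

```python
import urllib.request

from bs4 import BeautifulSoup


def fetch_soup(url, parser="lxml", opener=urllib.request.urlopen):
    """Download `url` and parse the response with BeautifulSoup.

    `fetch_soup`, `parser`, and `opener` are hypothetical names used for
    illustration only; they do not appear in the original question.
    """
    # The URL is the only argument given to the opener ...
    with opener(url) as response:
        # ... and the parser name is the second argument to BeautifulSoup().
        return BeautifulSoup(response, parser)
```

Inside the loop from the question, this would presumably be used as soup = fetch_soup(row[0]). The default parser 'lxml' requires the lxml package to be installed; passing parser='html.parser' falls back to the parser in the standard library.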