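I have written a Python script that scrapes the title, description, and images from a web page. The script fetches them correctly. title and desc are strings, but images is a list. I am now trying to write them to a CSV file, but the problem is that all the images end up stacked in a single column.
How can I write the existing fields and every image into separate columns?
This is what I have tried so far: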
import csv
import requests
from bs4 import BeautifulSoup

url = "https://www.amazon.com/Sealect-Designs-Universal-Anchor-Trolly/dp/B01LYUYI8A?ref_=ast_bbp_dp"

def get_content(link):
    res = requests.get(link,headers={'User-Agent':'Mozilla/5.0'})
    soup = BeautifulSoup(res.text,"lxml")
    title = soup.select_one("span#productTitle").get_text(strip=True)
    desc = soup.select_one("#productDescription > p").get_text(strip=True)
    images = [item.get("src") for item in soup.select("span.a-button-text > img[src$='jpg']")]
    writer.writerow([title,desc,images])
    print(title,desc,images)

if __name__ == '__main__':
    with open("outputfile.csv","w",newline="") as infile:
        writer = csv.writer(infile)
        get_content(url)
Current output:
column1: title
column2: description
column3: [images]
Expected output:
column1: title
column2: description
column3: image1
column4: image2
column5: image3
and so on
Answer 0 (score: 2)
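You can use an asterisk to unpack the elements of the images list. If you write this instead:
    writer.writerow([title,desc,*images])
you should get the desired output, with each image in its own column.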
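
To see the difference without hitting the network, here is a minimal sketch that writes both variants to an in-memory buffer and reads them back. The sample values, the example.com URLs, and the build_row helper are made up for illustration and are not part of the original script. The first row passes the list as a single item, which csv.writer turns into text with str(); the second unpacks it.

```python
import csv
import io

def build_row(title, desc, images):
    # Hypothetical helper: unpack the list so each image URL is its own cell.
    return [title, desc, *images]

# Made-up sample data standing in for the scraped values.
title = "Sample product"
desc = "A short description"
images = ["https://example.com/a.jpg", "https://example.com/b.jpg"]

buffer = io.StringIO()  # in-memory stand-in for outputfile.csv
writer = csv.writer(buffer)
writer.writerow([title, desc, images])           # whole list ends up in one cell
writer.writerow(build_row(title, desc, images))  # one cell per image

# Read the CSV text back to count the cells in each row.
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
for row in rows:
    print(len(row), row)
```

Because the number of images can differ from page to page, rows written this way may have different lengths. If a header row is needed, it probably has to be sized to the longest image list.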