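I have built a scraper. It seems to pull the correct HTML, but when I loop through the container tags it only seems to pull one record. I'm a newbie, so I hope I'm missing something simple, but after hours of research I'm stuck.
I've searched for a solution and have made sure the scraper is actually extracting all the HTML I need. However, when I run this code at the end, I get only one result instead of all of them. The same happens when I export to .csv: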
print("product_name: " + product_name)
print("product_number: " + product_number)
print("category: " + category)
The relevant code is as follows:
containers = page_soup.findAll("tr", {"class": "Product"})

for container in containers:
    product_name = container.a.text
    product_number = container.div.text
    category_container = container.select_one('td:nth-of-type(4)').text.strip()
    category = category_container
I expect output for more than 1000 products, but I get only one. What am I missing? Any help would be greatly appreciated.
Answer:
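The variables product_name, product_number, and category can each hold only one value at a time. Every loop iteration overwrites them, so after the loop they contain only the values from the last iteration.
So you can use print() inside the loop to see every value (and write each row to the CSV there as well):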
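A minimal illustration of the overwrite, with a short list of strings standing in for the scraped containers (the `rows` list is hypothetical sample data, not from the question):

```python
# Hypothetical sample data standing in for the scraped <tr class="Product"> rows.
rows = ["Widget", "Gadget", "Gizmo"]

for row in rows:
    product_name = row  # rebinds the same name on every iteration

# After the loop, only the last assignment survives.
print("product_name:", product_name)

names = []
for row in rows:
    names.append(row)  # each value is kept in the list

print("names:", names)
```

The first print shows only `Gizmo`, while the list keeps all three values.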
import csv

# newline='' is recommended by the csv module docs (avoids blank lines on Windows)
with open(filename, 'w', newline='') as f:
    csv_writer = csv.writer(f)

    # header
    csv_writer.writerow(["Product Name", "Product number", "Category"])

    for container in containers:
        product_name = container.a.text
        product_number = container.div.text
        category = container.select_one('td:nth-of-type(4)').text.strip()

        # single row
        csv_writer.writerow([product_name, product_number, category])

        print("product_name:", product_name)
        print("product_number:", product_number)
        print("category:", category)
Alternatively, create lists, use append() inside the loop to add each value, and use the lists later:
import csv

product_name = []
product_number = []
category = []

for container in containers:
    product_name.append(container.a.text)
    product_number.append(container.div.text)
    category.append(container.select_one('td:nth-of-type(4)').text.strip())

#--- later ---

print("product_name:", product_name)
print("product_number:", product_number)
print("category:", category)

with open(filename, 'w', newline='') as f:
    csv_writer = csv.writer(f)

    # header
    csv_writer.writerow(["Product Name", "Product number", "Category"])

    # zip() pairs the values from the three lists row by row
    for a, b, c in zip(product_name, product_number, category):
        # single row
        csv_writer.writerow([a, b, c])
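A more compact sketch of the same idea: `csv.writer.writerows()` accepts the `zip()` object directly, so the explicit row loop is optional. The values are hypothetical sample data, and `io.StringIO` stands in for the output file so the result can be inspected.

```python
import csv
import io

# Hypothetical scraped values; in the real scraper these come from the containers.
product_name = ["Widget", "Gadget"]
product_number = ["A-1", "B-2"]
category = ["Tools", "Toys"]

buffer = io.StringIO()  # stands in for open(filename, 'w', newline='')
csv_writer = csv.writer(buffer)
csv_writer.writerow(["Product Name", "Product number", "Category"])

# zip() stops at the shortest list, so the three lists must have equal length
# or trailing rows are dropped silently.
csv_writer.writerows(zip(product_name, product_number, category))

lines = buffer.getvalue().splitlines()
print(lines)
```

On Python 3.10 and later, `zip(..., strict=True)` raises a `ValueError` instead of silently truncating when the lists differ in length.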
Edit: you can also keep the data as a list of dictionaries:
import csv

all_items = []

for container in containers:
    item = {
        'product_name': container.a.text,
        'product_number': container.div.text,
        'category': container.select_one('td:nth-of-type(4)').text.strip(),
    }
    all_items.append(item)

# --- later ---

with open(filename, 'w', newline='') as f:
    csv_writer = csv.writer(f)

    # header
    csv_writer.writerow(["Product Name", "Product number", "Category"])

    for item in all_items:
        print("product_name:", item['product_name'])
        print("product_number:", item['product_number'])
        print("category:", item['category'])

        # single row
        csv_writer.writerow([item['product_name'], item['product_number'], item['category']])
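With a list of dictionaries, the standard library's `csv.DictWriter` can write the rows without picking each key out by hand. This is a sketch with hypothetical sample items; `io.StringIO` again stands in for the output file, and the header row here is the dictionary keys rather than the custom titles used above.

```python
import csv
import io

# Hypothetical items shaped like the dictionaries built in the loop.
all_items = [
    {'product_name': 'Widget', 'product_number': 'A-1', 'category': 'Tools'},
    {'product_name': 'Gadget', 'product_number': 'B-2', 'category': 'Toys'},
]

buffer = io.StringIO()  # stands in for open(filename, 'w', newline='')
fieldnames = ['product_name', 'product_number', 'category']
csv_writer = csv.DictWriter(buffer, fieldnames=fieldnames)

csv_writer.writeheader()         # header row: the field names
csv_writer.writerows(all_items)  # one row per dictionary, in fieldnames order

lines = buffer.getvalue().splitlines()
print(lines)
```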