How to fix a scraper that iterates over multiple container tags but outputs only one

Time: 2019-05-04 13:41:13

Tags: python web-scraping beautifulsoup containers

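I have built a scraper, and although it seems to pull the correct HTML, it only returns a single record when I iterate over the container tags. I am a beginner, so I hope I am missing something simple, but after several hours of research I am stuck.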

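I have been looking for a solution and have confirmed that the scraper really does fetch all the HTML I need. However, when I run the following code at the end, I get only one result instead of all of them. The same happens when I export to .csv.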

print("product_name: " + product_name)
print("product_number: " + product_number)
print("category: " + category)

The relevant code is as follows:

containers = page_soup.findAll("tr",{"class":"Product"})

for container in containers:

    product_name = container.a.text

    product_number = container.div.text

    category_container = container.select_one('td:nth-of-type(4)').text.strip()
    category = category_container

I expect output for more than 1000 products, but I get only one. What am I missing? Any help would be appreciated.

1 answer:

Answer 0: (score: 1)

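The variables product_name, product_number and category can each hold only one value at a time. Every pass through the loop overwrites the previous value, so after the loop they contain only the values from the last container.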
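The overwriting effect can be seen in a minimal sketch. The sample values below are hypothetical stand-ins for the scraped rows, not data from the asker's page:

```python
# Hypothetical sample values standing in for the scraped rows.
rows = ["Widget A", "Widget B", "Widget C"]

for row in rows:
    product_name = row  # each pass overwrites the previous value

# After the loop only the final assignment is still visible.
last_seen = product_name

# Collecting into a list keeps every value instead.
all_names = []
for row in rows:
    all_names.append(row)
```

Here last_seen holds only the final row, while all_names holds all three.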

So you can call print() and write each CSV row inside the loop to see every value

import csv

f = open(filename, 'w', newline='')  # newline='' avoids extra blank rows on Windows
csv_writer = csv.writer(f)

# header
csv_writer.writerow( ["Product Name", "Product number", "Category"] ) 

for container in containers:
    product_name = container.a.text
    product_number = container.div.text
    category = container.select_one('td:nth-of-type(4)').text.strip()

    # single row 
    csv_writer.writerow( [product_name, product_number, category] ) 

    print("product_name:", product_name)
    print("product_number:", product_number)
    print("category: ", category)

f.close()

Or you have to create lists and use append() to add the values to them

product_name = []
product_number = []
category = []

for container in containers:
    product_name.append( container.a.text )
    product_number.append( container.div.text )
    category.append( container.select_one('td:nth-of-type(4)').text.strip() )

#--- later ---

print("product_name:", product_name)
print("product_number:", product_number)
print("category: ", category)    


f = open(filename, 'w', newline='')  # newline='' avoids extra blank rows on Windows
csv_writer = csv.writer(f)

# header
csv_writer.writerow( ["Product Name", "Product number", "Category"] ) 

for a, b, c in zip(product_name, product_number, category):
    # single row 
    csv_writer.writerow( [a, b, c] ) 

f.close()
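As a possible shortening of the loop above, the result of zip() can be passed straight to writerows(). This is only a sketch: the sample lists are hypothetical, and an in-memory io.StringIO buffer stands in for the real file so the snippet runs on its own.

```python
import csv
import io

# Hypothetical sample data in the same three parallel lists.
product_name = ["Widget A", "Widget B"]
product_number = ["100", "200"]
category = ["Tools", "Toys"]

# An in-memory buffer stands in for the real output file here.
buffer = io.StringIO()
csv_writer = csv.writer(buffer)

# header
csv_writer.writerow(["Product Name", "Product number", "Category"])

# zip() yields one (name, number, category) tuple per product;
# writerows() writes each tuple as one CSV row.
csv_writer.writerows(zip(product_name, product_number, category))

csv_text = buffer.getvalue()
```

Note that zip() stops at the shortest list, so the three lists must stay the same length for no product to be dropped.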

EDIT: you can also keep the values as a list of dictionaries

all_items = []    

for container in containers:
    item = {
        'product_name': container.a.text,
        'product_number': container.div.text,
        'category': container.select_one('td:nth-of-type(4)').text.strip(),
    }
    all_items.append(item)

# --- later ---

f = open(filename, 'w', newline='')  # newline='' avoids extra blank rows on Windows
csv_writer = csv.writer(f)

# header
csv_writer.writerow( ["Product Name", "Product number", "Category"] ) 

for item in all_items:
    print("product_name:", item['product_name'])
    print("product_number:", item['product_number'])
    print("category: ", item['category'])    

    # single row 
    csv_writer.writerow( [item['product_name'], item['product_number'], item['category']] ) 

f.close()
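Because each item is already a dictionary, csv.DictWriter from the standard library could write the list directly. The following is a sketch under assumptions: the sample items are hypothetical, and an in-memory io.StringIO buffer replaces the real file.

```python
import csv
import io

# Hypothetical sample items shaped like the dictionaries built in the loop.
all_items = [
    {"product_name": "Widget A", "product_number": "100", "category": "Tools"},
    {"product_name": "Widget B", "product_number": "200", "category": "Toys"},
]

# An in-memory buffer stands in for the real output file here.
buffer = io.StringIO()

# fieldnames fixes the column order and must match the dictionary keys.
writer = csv.DictWriter(
    buffer, fieldnames=["product_name", "product_number", "category"]
)
writer.writeheader()         # header row uses the key names
writer.writerows(all_items)  # one CSV row per dictionary

dict_csv_text = buffer.getvalue()
```

With this approach the header row contains the dictionary keys (product_name, product_number, category) rather than the display names used in the csv.writer versions above.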