Iterate over a list that changes length and append to other lists

Posted: 2021-02-12 03:31:25

Tags: python list

I want to iterate over a BeautifulSoup result whose length varies with the number of elements that match an HTML tag.

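```python
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()  # assumed: the question does not show how `driver` was created
driver.get('https://www.inspection.gc.ca/food-recall-warnings-and-allergy-alerts/2021-02-10/eng/1613010591343/1613010596418')
page_source = driver.page_source

soup = BeautifulSoup(page_source, 'html.parser')
recall_details = soup.find('table', class_='table table-bordered table-condensed')

recalled_products = recall_details.find_all('td')
recalled_products  # in a notebook, this displays the list of <td> tags
```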

Output:

```
[<td>One Ocean</td>,
 <td>Sliced Smoked  Wild Sockeye Salmon</td>,
 <td>300 g</td>,
 <td>6 25984 00005 3</td>,
 <td>11253</td>]
```

I want to loop over each td element and append its text to lists like this:

```python
brands = []
products = []
sizes = []
upcs = []
codes = []

brand = recalled_products[0].text
product = recalled_products[1].text
size = recalled_products[2].text
upc = recalled_products[3].text
code = recalled_products[4].text
brands.append(brand)
products.append(product)
sizes.append(size)
upcs.append(upc)
codes.append(code)

print(brands)
print(products)
print(sizes)
print(upcs)
print(codes)
```

Output:

```
['One Ocean']
['Sliced Smoked  Wild Sockeye Salmon']
['300\xa0g']
['6\xa025984\xa000005\xa03']
['11253']
```

I tried the following code, but it didn't give the result I expected. I think I need some kind of counter.

```python
for i in range(len(recalled_products)):
    brand = recalled_products[i].text
    product = recalled_products[i].text
    size = recalled_products[i].text
    upc = recalled_products[i].text
    code = recalled_products[i].text
    brands.append(brand)
    products.append(product)
    sizes.append(size)
    upcs.append(upc)
    codes.append(code)

print(brands)
print(products)
print(sizes)
print(upcs)
print(codes)
```

Output:

```
['One Ocean', 'Sliced Smoked  Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
['One Ocean', 'Sliced Smoked  Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
['One Ocean', 'Sliced Smoked  Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
['One Ocean', 'Sliced Smoked  Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
['One Ocean', 'Sliced Smoked  Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
```

Here is sample HTML from the website: [image]

Thanks in advance for any help.

3 answers:

Answer 0 (score: 2)

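It depends on the shape of the data returned by

```python
recalled_products = recall_details.find_all('td')
```

It could be a list of rows (A) or one flat list of cells (b):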

```
A = [[<td>beef</td>,
      <td>250g</td>,
      <td>6 25984 00005 3</td>,
      <td>11253</td>],
     [<td>Salmon</td>,
      <td>300 g</td>,
      <td>6 25984 00005 3</td>,
      <td>11253</td>]]

b = [<td>beef</td>,
     <td>250g</td>,
     <td>6 25984 00005 3</td>,
     <td>11253</td>,
     <td>Salmon</td>,
     <td>300 g</td>,
     <td>6 25984 00005 3</td>,
     <td>11253</td>]
```

For A, index it as a two-dimensional list (row first, then column):

```python
# A: each element is one row, so pick the row with i, then the column
for i in range(len(recalled_products)):
    brand = recalled_products[i][0].text
    product = recalled_products[i][1].text
```

For b, step through the flat list one row at a time:

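```python
# b: the step is the number of cells per row
# (4 in example b; the question's table has 5)
for i in range(0, len(recalled_products), 4):
    brand = recalled_products[i].text
    product = recalled_products[i + 1].text
```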
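Since `find_all('td')` returns one flat list of cells, the question's data is the b case. The same stride can also be written with slice steps. A minimal sketch, assuming the cell texts have already been pulled into a flat list of strings (for example `[td.text for td in recalled_products]`) and that every row has exactly five cells, as in the question's table. The second row below is hypothetical placeholder data so the stride has something to skip over:

```python
# Flat list of cell texts in document order. The first five come from the
# question's output; the second row is hypothetical placeholder data.
cells = [
    "One Ocean", "Sliced Smoked  Wild Sockeye Salmon", "300\xa0g",
    "6\xa025984\xa000005\xa03", "11253",
    "Example Brand", "Example Product", "500\xa0g",
    "0\xa000000\xa000000\xa00", "99999",
]

NUM_COLUMNS = 5  # brand, product, size, UPC, code

# A partial row would leave the column lists with different lengths,
# so fail loudly instead.
if len(cells) % NUM_COLUMNS != 0:
    raise ValueError(f"{len(cells)} cells is not a multiple of {NUM_COLUMNS}")

# cells[k::NUM_COLUMNS] starts at column k and then jumps one full row at a time.
brands = cells[0::NUM_COLUMNS]
products = cells[1::NUM_COLUMNS]
sizes = cells[2::NUM_COLUMNS]
upcs = cells[3::NUM_COLUMNS]
codes = cells[4::NUM_COLUMNS]

print(brands)  # ['One Ocean', 'Example Brand']
print(codes)   # ['11253', '99999']
```

Slicing avoids the manual `i + 1`, `i + 2`, ... offsets, but it still relies on the column count being right; if the site adds a column, `NUM_COLUMNS` has to change too.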

Answer 1 (score: 1)

This is how I would get the markup:

```python
from bs4 import BeautifulSoup
import requests

URL = "https://www.inspection.gc.ca/food-recall-warnings-and-allergy-alerts/2021-02-10/eng/1613010591343/1613010596418"

brands = []
products = []
sizes = []
upcs = []
codes = []

page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

recall_details = soup.find("table", class_="table table-bordered table-condensed")

# Search only the table body, where the data rows are.
body = recall_details.find("tbody")

# Each <tr> is one recalled product.
rows = body.find_all("tr")

for row in rows:
    # Cells are in column order: brand, product, size, UPC, code.
    data = row.find_all("td")
    brands.append(data[0].text)
    products.append(data[1].text)
    sizes.append(data[2].text)
    upcs.append(data[3].text)
    codes.append(data[4].text)
```
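Assuming each product is its own `<tr>`, as this answer's code does, the row loop also copes with recalls that list several products. If you first collect each row's cell texts, `zip(*...)` can regroup them into the five column lists in one step. A minimal sketch; the second row is hypothetical placeholder data, and in the real script `row_texts` would come from something like `[[td.text for td in row.find_all("td")] for row in rows]`:

```python
# One inner list per <tr>; the first row is from the question, the second is
# hypothetical placeholder data.
row_texts = [
    ["One Ocean", "Sliced Smoked  Wild Sockeye Salmon", "300\xa0g",
     "6\xa025984\xa000005\xa03", "11253"],
    ["Example Brand", "Example Product", "500\xa0g",
     "0\xa000000\xa000000\xa00", "99999"],
]

# zip(*row_texts) yields tuples column by column: all brands, all products, ...
# Note: the unpacking raises ValueError if row_texts is empty.
brands, products, sizes, upcs, codes = (list(column) for column in zip(*row_texts))

print(brands)  # ['One Ocean', 'Example Brand']
print(sizes)   # ['300\xa0g', '500\xa0g']
```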

Prints:

```
['One Ocean']
['Sliced Smoked  Wild Sockeye Salmon']
['300\xa0g']
['6\xa025984\xa000005\xa03']
['11253']
```
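The `'\xa0'` in these values is a non-breaking space (U+00A0); it presumably comes from `&nbsp;` entities in the page's HTML, though that is an assumption about the markup. If plain spaces are wanted, a minimal standard-library sketch:

```python
import unicodedata

raw_values = ["300\xa0g", "6\xa025984\xa000005\xa03"]

# NFKC normalization maps NO-BREAK SPACE (U+00A0) to an ordinary space (U+0020).
cleaned = [unicodedata.normalize("NFKC", value) for value in raw_values]
print(cleaned)  # ['300 g', '6 25984 00005 3']

# str.split() with no arguments also treats U+00A0 as whitespace,
# which makes it easy to drop the separators from a UPC entirely.
upc_digits = "".join(raw_values[1].split())
print(upc_digits)  # 625984000053
```

NFKC also folds other compatibility characters (full-width forms, ligatures and so on); if that is too broad, `value.replace("\xa0", " ")` touches only the non-breaking space.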

I do think a list of dicts is a better data structure than several parallel lists, but of course that depends on your use case.

If you want to go that way, you can change the code like this:

```python
recalled = []

...

for row in rows:
    data = row.find_all("td")
    item = {
        "brand": data[0].text,
        "products": data[1].text,
        "sizes": data[2].text,
        "upcs": data[3].text,
        "codes": data[4].text,
    }
    recalled.append(item)
```

Prints:

```
[{'brand': 'One Ocean', 'products': 'Sliced Smoked  Wild Sockeye Salmon', 'sizes': '300\xa0g', 'upcs': '6\xa025984\xa000005\xa03', 'codes': '11253'}]
```
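The dict can also be built by pairing a tuple of column names with the row's cells, which keeps the key order in one place. A minimal sketch that reuses this answer's key names; `row_to_dict` is a hypothetical helper, and in the real loop its argument would be something like `[td.text for td in row.find_all("td")]`:

```python
FIELDS = ("brand", "products", "sizes", "upcs", "codes")  # keys used in this answer


def row_to_dict(cells):
    """Pair each field name with the cell in the same position."""
    # zip() would silently drop extra cells or leave keys out, so check first.
    if len(cells) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} cells, got {len(cells)}")
    return dict(zip(FIELDS, cells))


cells = ["One Ocean", "Sliced Smoked  Wild Sockeye Salmon", "300\xa0g",
         "6\xa025984\xa000005\xa03", "11253"]
recalled = [row_to_dict(cells)]
print(recalled[0]["brand"])  # One Ocean
```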

Answer 2 (score: 0)

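It sounds to me like you need to build a spreadsheet to hold the data you want to store. You can do that with a library called openpyxl: create columns for brand, product, size, UPC and code, then write the results from the BeautifulSoup object into the spreadsheet.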
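This answer suggests openpyxl for an .xlsx workbook. As a swapped-in alternative that needs no third-party package, the standard-library `csv` module writes a file that spreadsheet programs such as Excel can open. A minimal sketch, using a hypothetical record shaped like the dicts in Answer 1 (non-breaking spaces already replaced); the file name in the comment is also hypothetical:

```python
import csv
import io

# Column order for the sheet; matches the dict keys used in Answer 1.
FIELDS = ["brand", "products", "sizes", "upcs", "codes"]

# Hypothetical record shaped like Answer 1's dicts.
recalled = [
    {"brand": "One Ocean", "products": "Sliced Smoked  Wild Sockeye Salmon",
     "sizes": "300 g", "upcs": "6 25984 00005 3", "codes": "11253"},
]

# io.StringIO stands in for a file here. To write to disk, use
# open("recalls.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()        # header row: one column per field
writer.writerows(recalled)  # one row per recalled product

print(buffer.getvalue())
```

With openpyxl itself the same idea is `Workbook()`, then `ws.append(...)` once for the header and once per row on `wb.active`, then `wb.save(...)` with an .xlsx file name.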