Question

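I'm new to Python. I'm trying to scrape some data from this page. I want to capture the item names and item types in two lists; I can figure out later how to join them into a table. Any help would be great!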

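The individual lines work on their own, but the loop isn't working for me. This successfully prints the name and type of the first item: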

import urllib.request
import bs4 as bs

sauce = urllib.request.urlopen('https://us.diablo3.com/en/item/helm/').read()
soup = bs.BeautifulSoup(sauce, 'lxml')

item_details = soup.find('tbody')
print(item_details)

item_name = item_details.find('div', class_='item-details').h3.a.text
print(item_name)

item_type = item_details.find('ul', class_='item-type').span.text
print(item_type)

This repeats the value of the first item_name over and over:

for div in soup.find_all('div', class_='item-details'):
    item_name = item_details.find('div', class_='item-details').h3.a.text
    print(item_name)
    item_type = item_details.find('ul', class_='item-type').span.text
    print(item_type)

This is the output:

Veil of Steel
Magic Helm
Veil of Steel
Magic Helm
Veil of Steel
Magic Helm
Veil of Steel
Magic Helm
Veil of Steel
Magic Helm
Veil of Steel
Magic Helm
Veil of Steel
Magic Helm
...

Answer 1

You need to use find_all (which returns a list) rather than find (which returns a single element):

for i, j in zip(item_details.find_all('div', class_='item-details'), item_details.find_all('ul', class_='item-type')):
    print(i.h3.a.text, " - ", j.span.text)

The output is:

Veil of Steel  -  Magic Helm
Leoric's Crown  -  Legendary Helm
Harlequin Crest  -  Magic Helm
The Undead Crown  -  Magic Helm
...

Or, in a more readable form:

names = item_details.find_all('div', class_='item-details')
types = item_details.find_all('ul', class_='item-type')

# item_type is used instead of type to avoid shadowing the built-in
for name, item_type in zip(names, types):
    print(name.h3.a.text, " - ", item_type.span.text)
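The difference between find and find_all can be seen without hitting the network. The following is a minimal sketch: SAMPLE_HTML is a made-up fragment modeled on the selectors used in this thread, not the real page markup, and it uses the built-in html.parser rather than lxml so that it runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's table; the real markup may differ.
SAMPLE_HTML = """
<table><tbody>
  <tr><td><div class="item-details">
    <h3 class="subheader-3"><a href="#">Veil of Steel</a></h3>
    <ul class="item-type"><li><span>Magic Helm</span></li></ul>
  </div></td></tr>
  <tr><td><div class="item-details">
    <h3 class="subheader-3"><a href="#">Leoric's Crown</a></h3>
    <ul class="item-type"><li><span>Legendary Helm</span></li></ul>
  </div></td></tr>
</tbody></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
item_details = soup.find("tbody")

# find() stops at the first match and returns a single Tag (or None).
first = item_details.find("div", class_="item-details")
first_name = first.h3.a.text

# find_all() returns every match as a list-like ResultSet.
all_divs = item_details.find_all("div", class_="item-details")
all_names = [div.h3.a.text for div in all_divs]

print(first_name)
print(all_names)
```

Calling find repeatedly on the same parent always yields the same first element, which is why the loop in the question keeps printing one item.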

Answer 2

You can do this in a single loop over the details sections, instead of storing them in separate lists and matching them up:

item_details = []
for sections in soup.select('.item-details'):
    item_name = sections.select_one('h3[class*="subheader-"]').text.strip()  # partial match subheader-1, subheader-2, ....
    item_type = sections.select_one('ul[class="item-type"]').text.strip()
    item_details.append([item_name, item_type])

print(item_details)

Output:

[['Veil of Steel', 'Magic Helm'], ["Leoric's Crown", 'Legendary Helm'], ....
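Since the question mentions joining the results into a table later, here is one possible way to do that with the standard csv module. This is only a sketch: the two rows are copied from the output shown in this thread, and the header names "name" and "type" are illustrative choices, not something taken from the page.

```python
import csv
import io

# Pairs in the shape the loop above builds: [item_name, item_type].
item_details = [
    ["Veil of Steel", "Magic Helm"],
    ["Leoric's Crown", "Legendary Helm"],
]

# An in-memory buffer keeps the sketch self-contained; to write a real file,
# use open("helms.csv", "w", newline="") instead (the filename is hypothetical).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "type"])  # illustrative header names
writer.writerows(item_details)

csv_text = buffer.getvalue()
print(csv_text)
```

Keeping each name and type together in one row from the start avoids having to realign two separate lists afterwards.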

Answer 3

This works:

import urllib.request
import bs4 as bs

sauce = urllib.request.urlopen('https://us.diablo3.com/en/item/helm/').read()
soup = bs.BeautifulSoup(sauce, 'lxml')

item_names = soup.find_all('div', class_='item-details')
for ele in item_names:
    print(ele.h3.a.text)

item_type = soup.find_all('ul', class_='item-type')
for ele in item_type:
    print(ele.span.text)

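Why your code doesn't work:

Your loop never uses its loop variable. Each iteration calls find on item_details again, and find always returns the first matching element, so you keep getting the same item. Use find_all to get all the elements, then read the name and type from each one in turn.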
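The explanation above can be reproduced side by side on a small inline document. This is a sketch under assumptions: SAMPLE_HTML is an invented fragment shaped like the selectors in this thread (with the item-type list nested inside each item-details div), and html.parser is used in place of lxml so that no extra install is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's table; the real markup may differ.
SAMPLE_HTML = """
<table><tbody>
  <tr><td><div class="item-details">
    <h3 class="subheader-3"><a href="#">Veil of Steel</a></h3>
    <ul class="item-type"><li><span>Magic Helm</span></li></ul>
  </div></td></tr>
  <tr><td><div class="item-details">
    <h3 class="subheader-3"><a href="#">Leoric's Crown</a></h3>
    <ul class="item-type"><li><span>Legendary Helm</span></li></ul>
  </div></td></tr>
</tbody></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
item_details = soup.find("tbody")

# Pattern from the question: the loop variable `div` is never used, so each
# pass repeats the same search from `item_details` and gets the first item.
buggy = []
for div in soup.find_all("div", class_="item-details"):
    name = item_details.find("div", class_="item-details").h3.a.text
    buggy.append(name)

# Fix: search inside the element the loop is currently visiting.
fixed = []
for div in soup.find_all("div", class_="item-details"):
    name = div.h3.a.text
    item_type = div.find("ul", class_="item-type").span.text
    fixed.append((name, item_type))

print(buggy)
print(fixed)
```

The only change between the two loops is which object the lookups start from: item_details (always the whole table) versus div (the current row's details).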

Python BeautifulSoup: iterating over table data