Question

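I am new to Beautiful Soup and am trying to extract the information shown on a page. The information is contained in div class="_50f3" elements and, depending on the user, there can be several entries (studies, studied, works, worked, lives, etc.). I have managed to select those divs with the following code, but I do not know how to extract the information I want from them.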

table = soup.findAll('div', {'class': '_50f3'})

[<div class="_50f3">Lives in <a class="profileLink" data-hovercard="/ajax/hovercard/page.php?id=114148045261892" href="/Fort-Worth-Texas/114148045261892?ref=br_rs">Fort Worth, Texas</a></div>,
 <div class="_50f3">From <a class="profileLink" data-hovercard="/ajax/hovercard/page.php?id=111762725508574" href="/Dallas-Texas/111762725508574?ref=br_rs">Dallas, Texas</a></div>]

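For example, from the output above I would like to store "Lives in": "Fort Worth, Texas" and "From": "Dallas, Texas". In the most general case, though, I would like to store whatever information is in there.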

Any help is greatly appreciated!

Answer 1

In the general case, get_text() is all you need: it recursively collects an element's text into a single string:

table = soup.find_all('div', {'class': '_50f3'})
print([item.get_text(strip=True) for item in table])
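One detail worth checking: with strip=True, each text fragment is stripped before the fragments are joined, so the label and the link text can end up run together. The following is a minimal sketch, assuming the HTML from the question (trimmed to the relevant attributes) and the built-in html.parser. Passing a separator as the first argument of get_text() keeps the fragments apart:

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question, with the data-hovercard attribute omitted
html = """
<div class="_50f3">Lives in <a class="profileLink" href="/Fort-Worth-Texas/114148045261892?ref=br_rs">Fort Worth, Texas</a></div>
<div class="_50f3">From <a class="profileLink" href="/Dallas-Texas/111762725508574?ref=br_rs">Dallas, Texas</a></div>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find_all("div", {"class": "_50f3"})

# No separator: stripped fragments are concatenated directly
joined = [item.get_text(strip=True) for item in table]

# With a separator: stripped fragments are joined with a single space
spaced = [item.get_text(" ", strip=True) for item in table]

print(joined)
print(spaced)
```

If the exact spacing of the page matters, calling get_text() without strip=True and stripping the whole result afterwards is another option.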

However, you can also extract the labels and values separately:

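d = {}
for item in table:
    # First text node inside the div, e.g. "Lives in "
    label = item.find(string=True)  # string= is the current name for the older text= argument
    # The node right after that text, here the <a> link
    value = label.next_sibling

    d[label.strip()] = value.get_text()

print(d)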

Prints:

{'From': 'Dallas, Texas', 'Lives in': 'Fort Worth, Texas'}
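For the "store whatever is in there" case from the question, the label/next_sibling approach assumes every div has a text label followed by a tag. The following is a hedged sketch of a more tolerant variant built on stripped_strings. The helper name extract_info, the "Self-employed" div, and the empty div are hypothetical and only there to show the edge cases:

```python
from bs4 import BeautifulSoup

def extract_info(html):
    """Map the first text fragment of each div._50f3 to the rest of its text."""
    soup = BeautifulSoup(html, "html.parser")
    info = {}
    for item in soup.find_all("div", class_="_50f3"):
        # All non-empty text fragments inside the div, already stripped
        parts = list(item.stripped_strings)
        if not parts:
            continue  # skip divs with no text at all
        label, rest = parts[0], parts[1:]
        # A div with only one fragment has a label but no value
        info[label] = " ".join(rest) if rest else None
    return info

# Two divs shaped like the question's markup plus two hypothetical edge cases
sample = """
<div class="_50f3">Lives in <a class="profileLink" href="#">Fort Worth, Texas</a></div>
<div class="_50f3">From <a class="profileLink" href="#">Dallas, Texas</a></div>
<div class="_50f3">Self-employed</div>
<div class="_50f3"></div>
"""

result = extract_info(sample)
print(result)
```

Treating the first fragment as the label is an assumption about how the page is laid out; if a div starts with a link instead of plain text, the mapping would need a different rule.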

Answer 2

for item in table:
    print(item.text)

should work

Extracting page intro information with Beautiful Soup
