Question

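What is the ideal way to convert XML to text when parsing HTML in Python with Beautiful Soup?

When I parse HTML with the BeautifulSoup library in Python 2.7, I can get as far as the "soup" step, but I don't know how to extract the data I need, so I tried converting everything to a string.

In the example below, I want to extract all the numbers in the span tags and add them up. Is there a better way?

XML data: http://python-data.dr-chuck.net/comments_324255.html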


Answer 1

No regular expressions are needed:

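from bs4 import BeautifulSoup
from requests import get

url = 'http://python-data.dr-chuck.net/comments_324255.html'
html = get(url).text  # download the page as text
soup = BeautifulSoup(html, 'lxml')  # the 'lxml' parser requires the lxml package

# Sum the integer inside every <span> tag (find_all is the current name for findAll)
count = sum(int(n.text) for n in soup.find_all('span'))
print(count)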
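
To try the same idea without a network request, here is a minimal sketch that runs on an inline HTML string. The sample markup and the helper name `sum_spans` are made up for illustration and are not taken from the real page. It uses Python's built-in `html.parser` backend, so `lxml` does not need to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for the downloaded page
SAMPLE_HTML = """
<table>
  <tr><td>Alice</td><td><span class="comments">97</span></td></tr>
  <tr><td>Bob</td><td><span class="comments">3</span></td></tr>
  <tr><td>Carol</td><td><span class="comments">10</span></td></tr>
</table>
"""


def sum_spans(html):
    """Return the sum of the integers found inside all <span> tags."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml needed
    # get_text(strip=True) drops surrounding whitespace before int() is applied
    return sum(int(span.get_text(strip=True)) for span in soup.find_all("span"))


print(sum_spans(SAMPLE_HTML))
```

Working on the parsed tree this way avoids converting the whole soup to a string and searching it by hand.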

Answer 2

import requests
import bs4

r = requests.get("http://python-data.dr-chuck.net/comments_324255.html")
soup = bs4.BeautifulSoup(r.text, 'lxml')

# Sum only the elements whose class is "comments"
print(sum(int(span.text) for span in soup.find_all(class_="comments")))
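
The difference from Answer 1 is the selector: `class_="comments"` matches only elements carrying that class, whereas `find_all('span')` matches every span. A small sketch on made-up markup (the sample HTML is an assumption, not the real page) shows how the two totals can differ when a page contains other spans:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first span is not a comment count
SAMPLE_HTML = """
<span class="label">2</span>
<span class="comments">40</span>
<span class="comments">5</span>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Every <span>, regardless of class
all_spans = sum(int(s.text) for s in soup.find_all("span"))
# Only elements whose class attribute includes "comments"
comments_only = sum(int(s.text) for s in soup.find_all(class_="comments"))

print(all_spans, comments_only)
```

If every span on the target page has the `comments` class, both selectors should give the same result; filtering by class is simply the more defensive choice.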

