I used Beautiful Soup to select elements by class:
soup.select('.pr-xs')
which returns:
[<span class="instructor-block__students-subscribed pl-xs pr-xs">
1,184,500 students
</span>, <span class="instructor-block__students-subscribed pl-xs pr-xs">
697,000 students
</span>, <span class="instructor-block__students-subscribed pl-xs pr-xs">
167,500 students
</span>, <span class="instructor-block__students-subscribed pl-xs pr-xs">
145,500 students
</span>, <span class="instructor-block__students-subscribed pl-xs pr-xs">
81,000 students
</span>, <span class="instructor-block__students-subscribed pl-xs pr-xs">
172,000 students
</span>]
Now I want a new list that contains only the numbers, like this:
['1184500', '697000', '167500', '145500', '81000', '172000']
Answer 0 (score: 1)
Try this; it produces the result shown below:
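from bs4 import BeautifulSoup
# content is assumed to hold the page's HTML source as a string
soup = BeautifulSoup(content, "lxml")
# Take the text of each matching span and keep only the part before "students"
data = [item.text.split("students")[0].strip() for item in soup.select('.pr-xs')]
print(data)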
Output:
['1,184,500', '697,000', '167,500', '145,500', '81,000', '172,000']
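The output above still contains thousands separators, whereas the question asks for digit-only strings. A minimal sketch of one possible follow-up step, assuming `data` holds exactly the list printed above (the names `cleaned` and `counts` are introduced here for illustration only):

```python
# Output of the list comprehension above, copied here so the snippet runs on its own.
data = ['1,184,500', '697,000', '167,500', '145,500', '81,000', '172,000']

# Drop the thousands separators to get the digit-only strings the question asks for.
cleaned = [value.replace(",", "") for value in data]
print(cleaned)

# Convert to int if the values are needed for arithmetic or sorting.
counts = [int(value) for value in cleaned]
print(counts)
```

Keeping the strings and the integers as separate lists is only a convenience; if just one form is needed, the other step can be dropped.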
Answer 1 (score: 0)
You can use re to find the numbers:
import re
numbers = [re.sub(',', '', re.findall(r'[\d,]+', str(i))[0]) for i in soup.select('.pr-xs')]
Answer 2 (score: 0)
With the help of the regex functions in re:
import re
students = [re.sub(',', '', re.findall(r'[\d,]+', str(i))[0]) for i in soup.select('.pr-xs')]
This gives you the result:
['1184500','697000','167500','145500','81000','172000']
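Indexing `re.findall(...)[0]` raises an `IndexError` when an element contains no number. A hedged sketch of a more defensive variant, assuming the input is the text of each span (for example what `tag.get_text()` would return). The helper `extract_count` and the sample strings are hypothetical and not part of the answers above:

```python
import re

# Matches a run of digits and thousands separators, e.g. "1,184,500".
NUMBER_RE = re.compile(r"\d[\d,]*")

def extract_count(text):
    """Return the first number in text as an int, or None if there is none."""
    match = NUMBER_RE.search(text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))

# Sample strings shaped like the span text in the question; the last one
# has no number, to show the fallback.
texts = ["\n1,184,500 students\n", "\n697,000 students\n", "\nno figure here\n"]
results = [extract_count(t) for t in texts]
print(results)
```

Searching the element text rather than `str(tag)` also avoids accidentally matching digits inside attribute values such as class names.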