Get a list of numbers

Time: 2018-02-02 22:14:45

Tags: python beautifulsoup

I used BeautifulSoup to select elements by class:

soup.select('.pr-xs')

which gives:

    [<span class="instructor-block__students-subscribed pl-xs pr-xs">
     1,184,500 students
   </span>, <span class="instructor-block__students-subscribed pl-xs pr-xs">
     697,000 students
   </span>, <span class="instructor-block__students-subscribed pl-xs pr-xs">
     167,500 students
   </span>, <span class="instructor-block__students-subscribed pl-xs pr-xs">
     145,500 students
   </span>, <span class="instructor-block__students-subscribed pl-xs pr-xs">
     81,000 students
   </span>, <span class="instructor-block__students-subscribed pl-xs pr-xs">
     172,000 students
   </span>]

Now I want a new list that contains only the numbers, like

[1184500, 697000, 167500, 145500, 81000, 172000]

3 answers:

Answer 0 (score: 1)

Try this; it produces the result shown below:

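from bs4 import BeautifulSoup

# `content` is assumed to hold the page's HTML source (not shown in the question)
soup = BeautifulSoup(content, "lxml")
# Keep the text before the word "students" and trim surrounding whitespace
data = [item.text.split("students")[0].strip() for item in soup.select('.pr-xs')]
print(data)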

Output:

['1,184,500', '697,000', '167,500', '145,500', '81,000', '172,000']
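
The output above is still a list of strings with thousands separators. If plain integers are wanted, as the question seems to ask, one possible follow-up step is sketched here. It assumes `data` holds exactly the strings printed above and that every entry is a well-formed number using commas as separators.

```python
# Assumption: `data` is the list printed by the answer above.
data = ['1,184,500', '697,000', '167,500', '145,500', '81,000', '172,000']

# Drop the thousands separators, then convert each string to an int.
numbers = [int(value.replace(',', '')) for value in data]
print(numbers)
```

If strings without commas are preferred, leave out the `int(...)` call.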

Answer 1 (score: 0)

You can use re to find the numbers:

import re
numbers = [re.sub(',', '', re.findall(r'[\d,]+', str(i))[0]) for i in soup.select('.pr-xs')]

Answer 2 (score: 0)

Using a regular expression:

import re
students = [re.sub(',', '', re.findall(r'[\d,]+', str(i))[0]) for i in soup.select('.pr-xs')]

This gives you the result:

['1184500','697000','167500','145500','81000','172000']
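
The regex answers match against `str(i)`, which is the tag's full HTML including its attributes, so a digit or comma inside an attribute would be picked up first. A somewhat more defensive sketch matches against the element's text only. The helper name `extract_count` is hypothetical and not part of any answer above, and the sample strings only imitate the text of the spans in the question.

```python
import re

# Hypothetical helper (not from the answers above): pull the first
# comma-grouped number out of a piece of text and return it as an int.
NUMBER_RE = re.compile(r'\d[\d,]*')

def extract_count(text):
    match = NUMBER_RE.search(text)
    if match is None:
        return None  # no digits found in this text
    return int(match.group().replace(',', ''))

# Sample strings shaped like the text of the spans in the question.
texts = ['\n  1,184,500 students\n', '\n  81,000 students\n', 'no count here']
counts = [extract_count(t) for t in texts]
print(counts)
```

With a parsed page, this could presumably be applied as `[extract_count(tag.get_text()) for tag in soup.select('.pr-xs')]`.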