How do I extract the number from this BeautifulSoup?

Time: 2018-07-23 14:50:54

Tags: python html beautifulsoup

<p class="topVenue-details-info-details-subtitle">
Outram Park
<span class="topVenue-details-info-details-subtitle distance" data-latitude="1.2783991" data-longitude="103.8408724"></span>
· ~$25/pax
</p>

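I am trying to extract "$25/pax". The real HTML is longer and contains different prices. Is there a way to extract the price without also pulling in the headings and tags? So far I have only tried:

places = soup.find_all('p', class_="topVenue-details-info-details-subtitle")

Any help would be much appreciated. Thanks.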

3 answers:

Answer 0: (score: 0)

Iterate over the places, take .text of each one, call split() on it, and keep the last element:

[place.text.split()[-1] for place in places]

If you want to strip the leading ~:

place.text.split()[-1].lstrip('~')

Edit:

Based on your comment, to drop the entries whose last word is not a price:

[place.text.split()[-1].lstrip('~') for place in places if \
    place.text.split()[-1].startswith('~')]

In this case, a simple for loop avoids performing the same split twice:

output = []
for place in places:
    value = place.text.split()[-1]
    if value.startswith('~'):
        output.append(value.lstrip('~'))
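
For reference, the loop above can be run end to end against a small document. This is only a minimal sketch: the second and third paragraphs (Tanjong Pagar, Chinatown) and the $40 price are made-up test data, not part of the question, and the extra `tokens` check is an added guard for paragraphs that contain no text at all.

```python
from bs4 import BeautifulSoup

# Sample markup: the first <p> is from the question, the other two are invented.
html = """
<p class="topVenue-details-info-details-subtitle">
Outram Park
<span class="topVenue-details-info-details-subtitle distance" data-latitude="1.2783991" data-longitude="103.8408724"></span>
· ~$25/pax
</p>
<p class="topVenue-details-info-details-subtitle">
Tanjong Pagar
<span class="topVenue-details-info-details-subtitle distance"></span>
</p>
<p class="topVenue-details-info-details-subtitle">
Chinatown
<span class="topVenue-details-info-details-subtitle distance"></span>
· ~$40/pax
</p>
"""

soup = BeautifulSoup(html, "html.parser")
places = soup.find_all("p", class_="topVenue-details-info-details-subtitle")

output = []
for place in places:
    tokens = place.text.split()          # whitespace-separated words of the paragraph
    # Keep only paragraphs whose last word looks like a price ("~$25/pax").
    if tokens and tokens[-1].startswith("~"):
        output.append(tokens[-1].lstrip("~"))

print(output)
```

The paragraph without a price is skipped because its last word does not start with ~.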

Answer 1: (score: 0)

You can use select_one with a CSS selector and then read the next_sibling of the matched span:

s = """<p class="topVenue-details-info-details-subtitle">
Outram Park
<span class="topVenue-details-info-details-subtitle distance" data-latitude="1.2783991" data-longitude="103.8408724"></span>
· ~$25/pax
</p>"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(s, 'html.parser')

span = soup.select_one("p.topVenue-details-info-details-subtitle span")
print(span.next_sibling.strip())  # · ~$25/pax
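
When the page contains several such paragraphs, the same idea could be extended with select, which returns every matching span rather than only the first. The following is only a sketch: the second and third venues are invented sample data, and removing the leading "· ~" with lstrip is an assumption about how the price text is laid out.

```python
from bs4 import BeautifulSoup

# First <p> is from the question; the other two are hypothetical.
html = """
<p class="topVenue-details-info-details-subtitle">
Outram Park
<span class="topVenue-details-info-details-subtitle distance" data-latitude="1.2783991" data-longitude="103.8408724"></span>
· ~$25/pax
</p>
<p class="topVenue-details-info-details-subtitle">
Tanjong Pagar
<span class="topVenue-details-info-details-subtitle distance"></span></p>
<p class="topVenue-details-info-details-subtitle">
Chinatown
<span class="topVenue-details-info-details-subtitle distance"></span>
· ~$40/pax
</p>
"""

soup = BeautifulSoup(html, "html.parser")

prices = []
for span in soup.select("p.topVenue-details-info-details-subtitle span.distance"):
    sibling = span.next_sibling
    # next_sibling can be None (span is the last child) or another tag;
    # only plain text nodes, which are str subclasses, are useful here.
    if not isinstance(sibling, str):
        continue
    # Drop surrounding whitespace, then the leading dot, space and tilde.
    text = sibling.strip().lstrip("·~ ")
    if text:
        prices.append(text)

print(prices)
```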

Answer 2: (score: 0)

If your HTML has multiple paragraph tags (as you mentioned), use find_all. First parse the document:

from bs4 import BeautifulSoup

html_doc = """ 
<p class="topVenue-details-info-details-subtitle">
Outram Park
<span class="topVenue-details-info-details-subtitle distance" data-latitude="1.2783991" data-longitude="103.8408724"></span>
· ~$25/pax
</p>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

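If your HTML has only one paragraph tag, find is enough; for several, collect them with find_all and take the text after the ~ in each one:

places = soup.find_all('p', class_="topVenue-details-info-details-subtitle")

[place.get_text().split('~')[1].strip() for place in places]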
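
The answers here return strings such as "$25/pax". If the numeric value itself is wanted, as the question title suggests, one option not taken from the answers is a regular expression over the paragraph text. This is a hedged sketch: the helper name `extract_price`, the pattern, and the second and third sample strings are all assumptions made for illustration, and the pattern assumes prices always look like "$<number>/pax".

```python
import re

# Assumed price layout: "$<number>/pax", optionally with spaces around "$" and "/".
PRICE_RE = re.compile(r"\$\s*(\d+(?:\.\d+)?)\s*/\s*pax")


def extract_price(text):
    """Return the per-person price as a float, or None if no price is found."""
    match = PRICE_RE.search(text)
    return float(match.group(1)) if match else None


samples = [
    "\nOutram Park\n\n· ~$25/pax\n",  # text of the paragraph in the question
    "\nTanjong Pagar\n",              # hypothetical venue without a price
    "· ~$ 12.50 / pax",               # hypothetical price with spaces and cents
]

prices = [extract_price(s) for s in samples]
print(prices)
```

In practice the input would be place.get_text() for each paragraph returned by find_all, and entries that come back as None can be filtered out.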