<p class="topVenue-details-info-details-subtitle">
Outram Park
<span class="topVenue-details-info-details-subtitle distance" data-latitude="1.2783991" data-longitude="103.8408724"></span>
· ~$25/pax
</p>
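I am trying to extract "$25/pax". The real HTML is longer and contains different prices. Is there a way to extract just the price, without the heading text and the tags? This is what I have tried so far:
places = soup.find_all('p', class_="topVenue-details-info-details-subtitle")
Any help would be appreciated. Thanks.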
Answer 0: (score: 0)
Iterate over the places, take each one's .text, split() it on whitespace, and keep the last element:
[place.text.split()[-1] for place in places]
If you also want to strip the leading ~:
place.text.split()[-1].lstrip('~')
Edit:
Based on your comment, to drop the entries whose last word is not a price:
[place.text.split()[-1].lstrip('~') for place in places if \
place.text.split()[-1].startswith('~')]
In this case, a plain for loop avoids performing the same split() twice for each place:
output = []
for place in places:
    value = place.text.split()[-1]
    if value.startswith('~'):
        output.append(value.lstrip('~'))
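To check the loop end to end, here is a minimal self-contained sketch. The sample markup is an assumption made for illustration: the second and third venues (Tiong Bahru and Chinatown) and the $40 price are invented, and the third paragraph deliberately has no price. The empty-list guard on tokens is also an addition, so that an empty paragraph does not raise an IndexError.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: two venues with a price and one without.
html = """
<p class="topVenue-details-info-details-subtitle">
Outram Park
<span class="topVenue-details-info-details-subtitle distance"></span>
· ~$25/pax
</p>
<p class="topVenue-details-info-details-subtitle">
Tiong Bahru
<span class="topVenue-details-info-details-subtitle distance"></span>
· ~$40/pax
</p>
<p class="topVenue-details-info-details-subtitle">
Chinatown
<span class="topVenue-details-info-details-subtitle distance"></span>
</p>
"""

soup = BeautifulSoup(html, "html.parser")
places = soup.find_all("p", class_="topVenue-details-info-details-subtitle")

prices = []
for place in places:
    tokens = place.text.split()
    # Keep only paragraphs whose last token looks like "~$25/pax".
    if tokens and tokens[-1].startswith("~"):
        prices.append(tokens[-1].lstrip("~"))

print(prices)  # ['$25/pax', '$40/pax']
```

Note that this relies on the price being the last whitespace-separated token of the paragraph; if the price could contain spaces, splitting on "~" instead would be safer.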
Answer 1: (score: 0)
You can use select_one with a CSS selector to find the span, and then read the text node that follows it through next_sibling:
s = """<p class="topVenue-details-info-details-subtitle">
Outram Park
<span class="topVenue-details-info-details-subtitle distance" data-latitude="1.2783991" data-longitude="103.8408724"></span>
· ~$25/pax
</p>"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(s, 'html.parser')
# Select the span inside the paragraph; the price is the text node right after it.
span = soup.select_one("p.topVenue-details-info-details-subtitle span")
print(span.next_sibling.strip())  # · ~$25/pax
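The same idea can be extended to several paragraphs and to return only the price itself. The following is a rough sketch, not part of the original answer: the sample markup (Tiong Bahru, Chinatown, the $40 price) is invented, and the regular expression assumes every price has the form $<number>/pax, which may not hold for the real page.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup: two venues with a price and one without.
html = """
<p class="topVenue-details-info-details-subtitle">
Outram Park
<span class="topVenue-details-info-details-subtitle distance"></span>
· ~$25/pax
</p>
<p class="topVenue-details-info-details-subtitle">
Tiong Bahru
<span class="topVenue-details-info-details-subtitle distance"></span>
· ~$40/pax
</p>
<p class="topVenue-details-info-details-subtitle">
Chinatown
<span class="topVenue-details-info-details-subtitle distance"></span>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Assumed price format: a dollar sign, a number, then "/pax".
price_pattern = re.compile(r"\$\d+(?:\.\d+)?/pax")

prices = []
for span in soup.select("p.topVenue-details-info-details-subtitle span.distance"):
    sibling = span.next_sibling
    # next_sibling is None when nothing follows the span inside the paragraph.
    text = str(sibling) if sibling is not None else ""
    match = price_pattern.search(text)
    if match:
        prices.append(match.group())

print(prices)  # ['$25/pax', '$40/pax']
```

Matching with a pattern drops both the "·" separator and the "~" in one step, at the cost of silently skipping any price that does not fit the assumed format.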
Answer 2: (score: 0)
If your HTML has multiple paragraph tags like the one you mentioned, use find_all:
from bs4 import BeautifulSoup
html_doc = """
<p class="topVenue-details-info-details-subtitle">
Outram Park
<span class="topVenue-details-info-details-subtitle distance" data-latitude="1.2783991" data-longitude="103.8408724"></span>
· ~$25/pax
</p>
"""
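soup = BeautifulSoup(html_doc, 'html.parser')
If your HTML has only one such paragraph tag, find is enough. For several, collect them with find_all and split each paragraph's own text on '~':
places = soup.find_all('p', class_="topVenue-details-info-details-subtitle")
[place.get_text().split('~')[1].strip() for place in places]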
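For the single-paragraph case, a minimal sketch with find might look like the following. The None check and the use of str.partition are additions for illustration, not part of the original answer; they guard against a missing paragraph and against text that contains no "~".

```python
from bs4 import BeautifulSoup

html_doc = """<p class="topVenue-details-info-details-subtitle">
Outram Park
<span class="topVenue-details-info-details-subtitle distance" data-latitude="1.2783991" data-longitude="103.8408724"></span>
· ~$25/pax
</p>"""

soup = BeautifulSoup(html_doc, "html.parser")

# find returns the first matching tag, or None when there is no match.
place = soup.find("p", class_="topVenue-details-info-details-subtitle")

price = None
if place is not None:
    # partition returns (before, separator, after); separator is "" if "~" is absent.
    _, sep, after = place.get_text().partition("~")
    if sep:
        price = after.strip()

print(price)  # $25/pax
```

Unlike split('~')[1], partition does not raise an IndexError when a paragraph has no price, so price simply stays None in that case.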