I have the following HTML:
<div class="date_on_by">
<a sasource="qp_focused" href="/author/bill-maurer/articles">Bill Maurer</a>
<span class="bullet">•</span> Yesterday, 9:33 AM
<span class="bullet">•</span>
<span class="comments">98 Comments</span>
</div>
If I call getText() on the div returned by text.find_all('div', class_="date_on_by"), it returns:
Bill Maurer • Yesterday, 9:33 AM • 98 Comments
But what I actually want is just:
Yesterday, 9:33 AM
that is, the text that is not inside any child element. How can I do this?
Answer 0 (score: 0)
I figured it out!
import re
dates = []  # collected date strings, one per matching div
for date in text.find_all('div', class_="date_on_by"):
    dates.append(re.split(text.find_all('span', class_="bullet")[0].getText(), date.getText())[1].strip())
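The split idea above can be sketched as a self-contained script. This is only an illustration under a few assumptions: the helper name dates_by_split is made up, the markup is parsed with the built-in "html.parser", and the bullet is looked up inside each div rather than in the whole document. Since the separator is a plain character, str.split is used in place of re.split.

```python
from bs4 import BeautifulSoup

html = """<div class="date_on_by">
<a sasource="qp_focused" href="/author/bill-maurer/articles">Bill Maurer</a>
<span class="bullet">•</span> Yesterday, 9:33 AM
<span class="bullet">•</span>
<span class="comments">98 Comments</span>
</div>"""


def dates_by_split(markup):
    """Hypothetical helper: split each div's text on its bullet character."""
    soup = BeautifulSoup(markup, "html.parser")
    dates = []
    for div in soup.find_all("div", class_="date_on_by"):
        bullet = div.find("span", class_="bullet")
        if bullet is None:
            continue  # no separator in this div, nothing to split on
        parts = div.get_text().split(bullet.get_text())
        if len(parts) > 1:
            # parts[0] is the author, parts[1] is the date, parts[2] the comments
            dates.append(parts[1].strip())
    return dates


print(dates_by_split(html))
```

Note that this relies on the date always being the second bullet-separated piece, so it would break if the layout of the div changed.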
Answer 1 (score: 0)
You can select the span by its class name and use next_sibling:
In [9]: h = """<div class="date_on_by">
...: <a sasource="qp_focused" href="/author/bill-maurer/articles">Bill Maurer</a>
...: <span class="bullet">•</span> Yesterday, 9:33 AM
...: <span class="bullet">•</span>
...: <span class="comments">98 Comments</span>
...: </div>"""
In [10]: from bs4 import BeautifulSoup
In [11]: soup = BeautifulSoup(h, "html.parser")
In [12]: print(soup.select_one("div.date_on_by span.bullet").next_sibling.strip())
Yesterday, 9:33 AM
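If the page contains several such divs, the same next_sibling idea can be applied to each one. The following is a rough sketch, not a definitive implementation: the helper name dates_by_sibling and the second div in the sample markup are invented for illustration, and "html.parser" is assumed as the parser.

```python
from bs4 import BeautifulSoup

sample = """<div class="date_on_by">
<a href="/author/bill-maurer/articles">Bill Maurer</a>
<span class="bullet">•</span> Yesterday, 9:33 AM
<span class="bullet">•</span>
<span class="comments">98 Comments</span>
</div>
<div class="date_on_by">
<a href="/author/example/articles">Example Author</a>
<span class="bullet">•</span> Today, 8:00 AM
<span class="bullet">•</span>
<span class="comments">3 Comments</span>
</div>"""  # the second div is made-up test data


def dates_by_sibling(markup):
    """Hypothetical helper: read the text node right after the first bullet."""
    soup = BeautifulSoup(markup, "html.parser")
    dates = []
    for div in soup.select("div.date_on_by"):
        bullet = div.select_one("span.bullet")
        sibling = bullet.next_sibling if bullet is not None else None
        # NavigableString subclasses str; a Tag sibling is skipped here
        if isinstance(sibling, str) and sibling.strip():
            dates.append(sibling.strip())
    return dates


print(dates_by_sibling(sample))
```

The isinstance check guards against the case where the node after the bullet is another tag instead of text, in which case .strip() would not be available.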
Also, if you only want the first element, you should use .find instead of find_all(..)[0].
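Another way to read "text that is not inside any child element" is to keep only the text nodes that are direct children of the div. The sketch below is one possible approach, not something taken from the answers above: the helper name own_text is hypothetical, and it uses .find for the single div, as suggested.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

html = """<div class="date_on_by">
<a sasource="qp_focused" href="/author/bill-maurer/articles">Bill Maurer</a>
<span class="bullet">•</span> Yesterday, 9:33 AM
<span class="bullet">•</span>
<span class="comments">98 Comments</span>
</div>"""


def own_text(tag):
    """Hypothetical helper: non-empty text nodes directly under `tag`."""
    pieces = []
    for child in tag.children:
        # Text nodes are NavigableString objects; nested elements are Tag objects
        if isinstance(child, NavigableString):
            stripped = child.strip()
            if stripped:
                pieces.append(stripped)
    return pieces


soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="date_on_by")  # .find returns the first match
print(own_text(div))
```

This does not depend on the bullet spans at all, so it should keep working if the separators change, as long as the date stays a bare text node of the div.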