Question

I need help getting the syntax right when using Beautiful Soup together with a regular expression.

I am using the code below to scrape a time. The time sits inside a DIV that contains a paragraph. The DIV and its contents look like this:

<div class="details"> 
    <p> $25 
    <br>
     8/23<br>
     7:00 pm 
     </p>                             
</div>

Code:

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('https://place_holder/')
bs = BeautifulSoup(html.read(), 'html.parser')
for time_date in bs.find_all("div", {"class": "details"}):
    print(time_date.text)

When I run the code above, I get the following output.

$25 
8/23
7:00 pm

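Since I only want to extract the time (7:00 pm), I would like to use a regular expression to do it. I have not been able to come up with syntax that works. I hope someone can help.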

Answer 1

You don't need a regular expression here. BeautifulSoup can give you the data you want. Just use .contents[-1] to access the last element of the <p> tag.

for time_date in bs.find_all("div", {"class": "details"}):
    print(time_date.p.contents[-1].strip())
# 7:00 pm

The .contents of the <p> tag looks like this:

[' $25 \n    ', <br/>, '\n     8/23', <br/>, '\n     7:00 pm \n     ']
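
To try this without fetching a page, the snippet from the question can be parsed from a string. The following is only a minimal sketch, assuming the time is always the last text node in the paragraph; the names SAMPLE_HTML and extract_last_text are made up for illustration. It uses .stripped_strings, which yields just the text nodes with surrounding whitespace removed, so the <br> tags need no special handling.

```python
from bs4 import BeautifulSoup

# Markup copied from the question; SAMPLE_HTML is a hypothetical name.
SAMPLE_HTML = """
<div class="details">
    <p> $25
    <br>
     8/23<br>
     7:00 pm
     </p>
</div>
"""


def extract_last_text(html):
    """Return the last non-empty text node of each div.details paragraph."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for div in soup.find_all("div", class_="details"):
        if div.p is None:
            continue  # skip divs that have no paragraph
        strings = list(div.p.stripped_strings)
        if strings:
            results.append(strings[-1])
    return results


print(extract_last_text(SAMPLE_HTML))  # ['7:00 pm']
```

If the page ever lists the time somewhere other than last, this positional approach will return the wrong field, so it is worth checking against a few real pages.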

If you do need to use a regular expression, you can do it like this:

import re

for time_date in bs.find_all("div", {"class": "details"}):
    print(re.findall(r'\d+:\d+ [ap]m', time_date.text)[0])
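
Note that indexing the result of re.findall with [0] raises IndexError when a div contains no time. One possible variation, sketched here under the assumption that times look like "7:00 pm" or "10:30AM", uses re.search and returns None when nothing matches. The names TIME_RE and find_time are hypothetical, not part of any library.

```python
import re

# Hypothetical pattern: 1-2 digit hour, colon, 2-digit minutes,
# optional whitespace, then am or pm in any letter case.
TIME_RE = re.compile(r"\b\d{1,2}:\d{2}\s*[ap]m\b", re.IGNORECASE)


def find_time(text):
    """Return the first time-like substring in text, or None if absent."""
    match = TIME_RE.search(text)
    return match.group(0) if match else None


print(find_time(" $25 \n 8/23\n 7:00 pm \n"))  # 7:00 pm
print(find_time("no time listed"))             # None
```

In the loop above, find_time(time_date.text) could stand in for the re.findall call, with the caller deciding what to do when the result is None.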
