Question

I'm new to Python and Beautiful Soup, so the answer may be obvious.

I'm using Beautiful Soup to parse the following HTML and extract the dates.

html='''
<p><strong>Event:</strong>Meeting</p>
<p><strong>Date:</strong> Mon, Apr 25, 2016, 11 am</p>
<p><strong>Price:</strong>$20.00</p>

<p><strong>Event:</strong>Convention</p>
<p><strong>Date:</strong> Mon, May 2, 2016, 11 am</p>
<p><strong>Price:</strong>$25.00</p>

<p><strong>Event:</strong>Dance</p>
<p><strong>Date:</strong> Mon, May 9, 2016, 11 am</p>
<p><strong>Price:</strong>Free</p>
'''

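With the following code I can extract the date when there is only one, but I run into trouble when there are several (I only get one date back).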

date_raw = html.find_all('strong',string='Date:')
date = str(date_raw.p.nextSibling).strip()
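A likely source of the trouble: `find_all` returns a `ResultSet`, which is a list subclass with one entry per matching `<strong>` tag, so attribute access such as `.p` is not defined on it and the values have to be read element by element. The minimal sketch below assumes `bs4` is installed and uses the built-in `"html.parser"` backend; the exact symptom in the question may come from a slightly different version of the code.

```python
from bs4 import BeautifulSoup

html = '''
<p><strong>Date:</strong> Mon, Apr 25, 2016, 11 am</p>
<p><strong>Date:</strong> Mon, May 2, 2016, 11 am</p>
'''

soup = BeautifulSoup(html, "html.parser")
date_raw = soup.find_all("strong", string="Date:")

# find_all returns a list-like ResultSet with one Tag per match
print(type(date_raw).__name__, len(date_raw))

# Attribute access on the whole result set fails, as in the question's code
try:
    date_raw.p
except AttributeError as err:
    print("AttributeError:", err)

# <strong> has no <p> child, so .p on a single tag is None
print(date_raw[0].p)

# The date text is the node immediately after <strong>, i.e. its next sibling
print(date_raw[0].next_sibling.strip())
```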

Is there a way to do this in bs4, or should I use a regular expression? Any other suggestions?
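For comparison with the regular-expression option, here is a sketch using only the standard `re` module. It assumes the markup looks exactly like the sample (no attributes on `<strong>`, date text followed directly by `</p>`), so it is more brittle than the bs4 approaches in the answers and would break on small markup changes.

```python
import re

html = '''
<p><strong>Event:</strong>Meeting</p>
<p><strong>Date:</strong> Mon, Apr 25, 2016, 11 am</p>
<p><strong>Price:</strong>$20.00</p>

<p><strong>Event:</strong>Convention</p>
<p><strong>Date:</strong> Mon, May 2, 2016, 11 am</p>
<p><strong>Price:</strong>$25.00</p>

<p><strong>Event:</strong>Dance</p>
<p><strong>Date:</strong> Mon, May 9, 2016, 11 am</p>
<p><strong>Price:</strong>Free</p>
'''

# Capture the text between "<strong>Date:</strong>" and the closing </p>,
# trimming surrounding whitespace; assumes the date contains no "<"
DATE_RE = re.compile(r"<strong>Date:</strong>\s*([^<]+?)\s*</p>")

# With one capture group, findall returns a list of the captured strings
dates = DATE_RE.findall(html)
print(dates)
```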

Desired list output:

['Mon, Apr 25, 2016, 11 am', 'Mon, May 2, 2016, 11 am', 'Mon, May 9, 2016, 11 am']

Answer 1

I'd probably iterate over each element found and append it to a list. Something like this, perhaps (untested):

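from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
date_list = []
date_raw = soup.find_all('strong', string='Date:')

for d in date_raw:
    # the date is the text node right after <strong>, not inside a <p> child
    date = str(d.next_sibling).strip()
    date_list.append(date)

print(date_list)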
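A fully self-contained version of the same loop, written as a list comprehension, can be run as-is; it assumes `bs4` is installed and uses the built-in `"html.parser"` backend.

```python
from bs4 import BeautifulSoup

html = '''
<p><strong>Event:</strong>Meeting</p>
<p><strong>Date:</strong> Mon, Apr 25, 2016, 11 am</p>
<p><strong>Price:</strong>$20.00</p>

<p><strong>Event:</strong>Convention</p>
<p><strong>Date:</strong> Mon, May 2, 2016, 11 am</p>
<p><strong>Price:</strong>$25.00</p>

<p><strong>Event:</strong>Dance</p>
<p><strong>Date:</strong> Mon, May 9, 2016, 11 am</p>
<p><strong>Price:</strong>Free</p>
'''

soup = BeautifulSoup(html, "html.parser")

# One <strong>Date:</strong> per event; the date is the node right after it
date_list = [
    str(tag.next_sibling).strip()
    for tag in soup.find_all("strong", string="Date:")
]
print(date_list)
```

This prints the three dates in document order, matching the desired output in the question.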

Answer 2

Rookie mistake... fixed it:

date_list = []
for tag in date_raw:
    # next_sibling is the text node right after <strong>Date:</strong>
    date_add = tag.next_sibling.strip()
    date_list.append(date_add)
    print(date_add)
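One caveat with `next_sibling`: it returns only the single node right after `<strong>`, so if the date were wrapped in another tag it would return a `Tag` rather than the full text. A sketch that instead reads the whole text of the parent `<p>` and drops the label is shown below; the `<span>` markup is a hypothetical variation not present in the question, and the slicing assumes `<strong>` is the first child of its `<p>`.

```python
from bs4 import BeautifulSoup

LABEL = "Date:"

# Hypothetical markup: the second date is partly wrapped in a <span>
html = '''
<p><strong>Date:</strong> Mon, Apr 25, 2016, 11 am</p>
<p><strong>Date:</strong> <span>Mon, May 2, 2016</span>, 11 am</p>
'''

soup = BeautifulSoup(html, "html.parser")
date_list = []
for tag in soup.find_all("strong", string=LABEL):
    # All text inside the <p>, e.g. "Date: Mon, May 2, 2016, 11 am"
    text = tag.parent.get_text()
    # Drop the leading label and surrounding whitespace
    date_list.append(text[len(LABEL):].strip())

print(date_list)
```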

Extracting multiple elements that match the same HTML criteria in Python
