I am trying to access the lists of restaurants between the p tags.
<p class="openclosemonth" id="May2014">May, 2014</p>
<p>
<strong>CLOSED:</strong><br />
-- Haveli, Cambridge (Inman Square), MA<br />
-- Ma Soba, Boston (Beacon Hill), MA<br />
-- Milestone, Wellesley, MA<br />
-- Scosso, Peabody, MA<br />
-- Sonny Noto's, East Boston, MA<br />
-- Viva Mexican Grill, Wayland, MA<br />
</p>
<p>
<strong>OPEN:</strong><br />
-- The Abbey, Cambridge, MA<br />
-- The Bancroft, Burlington, MA<br />
-- Beantown Pho and Grill, Boston (Back Bay), MA<br />
-- The Briar Rose, Hyde Park, MA<br />
-- Caffe Nero, Boston, MA<br />
-- Cheeburger Cheeburger, Swampscott, MA<br />
</p>
Any suggestions on how to extract the data I need?
Thanks!
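Answer 0 (score: 1)
Get all the text nodes from each <p> tag, strip the whitespace, drop the empty strings, then skip the first one (the heading):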
# Python 2 syntax: print is a statement and filter() returns a list
for para in soup.find_all('p'):
    if para.strong is not None:
        print para.strong.get_text()
        lines = filter(None, (t.strip() for t in para.find_all(text=True)))[1:]
        print '\n'.join(lines)
        print
I added a test for a <strong> child tag so that only those specific paragraphs are selected.
For your input, this gives:
>>> for para in soup.find_all('p'):
...     if para.strong is not None:
...         print para.strong.get_text()
...         lines = filter(None, (t.strip() for t in para.find_all(text=True)))[1:]
...         print '\n'.join(lines)
...         print
...
CLOSED:
-- Haveli, Cambridge (Inman Square), MA
-- Ma Soba, Boston (Beacon Hill), MA
-- Milestone, Wellesley, MA
-- Scosso, Peabody, MA
-- Sonny Noto's, East Boston, MA
-- Viva Mexican Grill, Wayland, MA
OPEN:
-- The Abbey, Cambridge, MA
-- The Bancroft, Burlington, MA
-- Beantown Pho and Grill, Boston (Back Bay), MA
-- The Briar Rose, Hyde Park, MA
-- Caffe Nero, Boston, MA
-- Cheeburger Cheeburger, Swampscott, MA
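The same steps can also be sketched for Python 3, where `filter()` returns an iterator that cannot be sliced, so a list comprehension is used instead. This is only a sketch: the function name `restaurants_by_status` and the shortened sample markup are assumptions made for illustration, and `string=True` is the newer spelling (Beautiful Soup 4.4.0 and later) of the `text=True` argument used above.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question, shortened to two entries per list.
html = """
<p class="openclosemonth" id="May2014">May, 2014</p>
<p>
<strong>CLOSED:</strong><br />
-- Haveli, Cambridge (Inman Square), MA<br />
-- Ma Soba, Boston (Beacon Hill), MA<br />
</p>
<p>
<strong>OPEN:</strong><br />
-- The Abbey, Cambridge, MA<br />
-- The Bancroft, Burlington, MA<br />
</p>
"""

soup = BeautifulSoup(html, "html.parser")


def restaurants_by_status(soup):
    """Map each <strong> heading (e.g. 'CLOSED:') to the lines that follow it."""
    result = {}
    for para in soup.find_all("p"):
        if para.strong is None:
            continue  # no <strong> heading, e.g. the month paragraph
        # Strip every text node and keep only the non-empty ones.
        texts = [t.strip() for t in para.find_all(string=True) if t.strip()]
        # texts[0] is the heading text itself, so skip it.
        result[para.strong.get_text()] = texts[1:]
    return result


listing = restaurants_by_status(soup)
for status, names in listing.items():
    print(status)
    print("\n".join(names))
    print()
```

Returning a dictionary instead of printing directly keeps the CLOSED and OPEN lists separate, which may be more convenient if the data is processed further.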
Answer 1 (score: 0)
A Python 3 compatible version that uses a simpler approach.
for para in soup.find_all('p'):
    if para.strong is not None:
        for t in para.find_all(text=True):
            print(t.strip())
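Note that this loop also prints an empty line for every whitespace-only text node, because `t.strip()` returns an empty string for them. One possible variant, assuming Beautiful Soup 4, is to iterate over the tag's `stripped_strings` generator, which strips each string and skips the whitespace-only ones. The one-paragraph sample below is shortened from the question's markup.

```python
from bs4 import BeautifulSoup

# One paragraph in the same shape as the question's markup (shortened).
html = (
    "<p>\n<strong>OPEN:</strong><br />\n"
    "-- The Abbey, Cambridge, MA<br />\n"
    "-- Caffe Nero, Boston, MA<br />\n</p>"
)
soup = BeautifulSoup(html, "html.parser")

for para in soup.find_all("p"):
    if para.strong is not None:
        # stripped_strings yields each text node with surrounding whitespace
        # removed and omits nodes that contain only whitespace.
        lines = list(para.stripped_strings)
        print("\n".join(lines))
```

Here the heading is still the first element of `lines`, so `lines[1:]` would give only the restaurant entries.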