HTML
<div class="productDescriptionWrapper">
<p>A worm worth getting your hands dirty over. With over six feet of crawl space, Playhut’s Wiggly Worm is a brightly colored and friendly play structure.
</p>
<ul>
<li>6ft of crawl through fun</li>
<li>18” diameter for easy crawl through</li>
<li>Bright colorful design</li>
<li>Product Measures: 18""Diam x 60""L</li>
<li>Recommended Ages: 3 years & up<br /> </li>
</ul>
<p><strong>Intended for Indoor Use</strong></p>
Code
def GetBullets(self, Soup):
    bulletList = []
    bullets = str(Soup.findAll('div', {'class': 'productDescriptionWrapper'}))
    bullets_re = re.compile('<li>(.*)</li>')
    bullets_pat = str(re.findall(bullets_re, bullets))
    index = bullets_pat.findall('</li>')
    print index
How can I extract the p tags and the li tags? Thanks!
Answer 0 (score: 3)
Note the following:
>>> from BeautifulSoup import BeautifulSoup
>>> html = """ <what you have above> """
>>> Soup = BeautifulSoup(html)
>>> bullets = Soup.findAll('div', {'class': 'productDescriptionWrapper'})
>>> ptags = bullets[0].findAll('p')
>>> print ptags
[<p>A worm worth getting your hands dirty over. With over six feet of crawl space, Playhut’s Wiggly Worm is a brightly colored and friendly play structure.
</p>, <p><strong>Intended for Indoor Use</strong></p>]
>>> print ptags[0].text
A worm worth getting your hands dirty over. With over six feet of crawl space, Playhut’s Wiggly Worm is a brightly colored and friendly play structure.
You can get the contents of the li tags in a similar way.
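As a rough sketch of that last step, the same lookup can be applied to the li tags. This version assumes the newer bs4 package (Beautiful Soup 4) on Python 3 rather than the BeautifulSoup 3 import used above, and the HTML is an abbreviated copy of the markup from the question. The function name get_bullets is only illustrative.

```python
# Assumption: Beautiful Soup 4 is installed (pip install beautifulsoup4).
from bs4 import BeautifulSoup

# Abbreviated copy of the markup from the question.
html = """
<div class="productDescriptionWrapper">
<p>A worm worth getting your hands dirty over.</p>
<ul>
<li>6ft of crawl through fun</li>
<li>Bright colorful design</li>
<li>Recommended Ages: 3 years &amp; up<br /> </li>
</ul>
<p><strong>Intended for Indoor Use</strong></p>
</div>
"""


def get_bullets(soup):
    """Return the text of every <li> inside the product description div."""
    wrapper = soup.find('div', {'class': 'productDescriptionWrapper'})
    if wrapper is None:
        # The page has no description block, so there are no bullets.
        return []
    # get_text(strip=True) drops the tags and the surrounding whitespace.
    return [li.get_text(strip=True) for li in wrapper.find_all('li')]


soup = BeautifulSoup(html, 'html.parser')
bullets = get_bullets(soup)
print(bullets)
```

Letting the parser find the li elements avoids running a regular expression over the stringified result list, which is where the code in the question goes wrong: str objects have no findall method.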
Answer 1 (score: 0)
We use Beautiful Soup.