How can I use Beautiful Soup in Python to extract the agency fees, bedrooms, and bathrooms information? [Here][1] is the web page I am scraping.
<ul class="important-fields">
<li class="">
<span> Agency Fees: </span>
<strong> AED 5000 </strong>
</li>
<li class="">
<span> Bedrooms: </span>
<strong> Studio </strong>
</li>
<li class="">
<span> Bathrooms: </span>
<strong> 1 </strong>
</li>
<li>
</ul>
Answer 0 (score: 2)
>>> from bs4 import BeautifulSoup
>>>
>>> html = '''
... <ul class="important-fields">
... <li class="">
... <span> Agency Fees: </span>
... <strong> AED 5000 </strong>
... </li>
... <li class="">
... <span> Bedrooms: </span>
... <strong> Studio </strong>
... </li>
... <li class="">
... <span> Bathrooms: </span>
... <strong> 1 </strong>
... </li>
... </ul>
... '''
>>>
>>> soup = BeautifulSoup(html, 'html.parser')
>>> spans = [x.text.strip() for x in soup.select('ul.important-fields li span')]
>>> strongs = [x.text.strip() for x in soup.select('ul.important-fields li strong')]
>>> spans
[u'Agency Fees:', u'Bedrooms:', u'Bathrooms:']
>>> strongs
[u'AED 5000', u'Studio', u'1']
>>> for name, value in zip(spans, strongs):
...     print('{} {}'.format(name, value))
...
Agency Fees: AED 5000
Bedrooms: Studio
Bathrooms: 1
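Zipping two separate lists works as long as every <li> contains both a <span> and a <strong>. A slightly more defensive variant, sketched below, pairs the label and value inside each <li> so that an incomplete item cannot shift the results. The function name extract_fields and the choice to strip the trailing colon are illustrative assumptions, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Same markup as in the question, condensed for readability.
HTML = '''
<ul class="important-fields">
  <li class=""><span> Agency Fees: </span><strong> AED 5000 </strong></li>
  <li class=""><span> Bedrooms: </span><strong> Studio </strong></li>
  <li class=""><span> Bathrooms: </span><strong> 1 </strong></li>
</ul>
'''


def extract_fields(html):
    """Return a {label: value} dict built one <li> at a time (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    fields = {}
    for li in soup.select('ul.important-fields li'):
        label = li.find('span')
        value = li.find('strong')
        # Skip items that lack either part instead of misaligning the pairs.
        if label is None or value is None:
            continue
        key = label.get_text(strip=True).rstrip(':')
        fields[key] = value.get_text(strip=True)
    return fields


fields = extract_fields(HTML)
for name, value in fields.items():
    print('{}: {}'.format(name, value))
```

With a dict, a single field can then be looked up directly, for example fields['Bedrooms'].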
Answer 1 (score: 0)
You can use XPath (http://www.w3schools.com/xpath/) with the lxml library in Python to get data out of the HTML. Examples are available in the lxml tutorial (http://lxml.de/tutorial.html).
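As a rough illustration of the XPath approach, the sketch below runs against the markup from the question. It assumes lxml is installed; the variable names and the exact XPath expressions are my own choices rather than anything given in this answer.

```python
from lxml import html as lxml_html

# Same markup as in the question, condensed for readability.
HTML = '''
<ul class="important-fields">
  <li class=""><span> Agency Fees: </span><strong> AED 5000 </strong></li>
  <li class=""><span> Bedrooms: </span><strong> Studio </strong></li>
  <li class=""><span> Bathrooms: </span><strong> 1 </strong></li>
</ul>
'''

tree = lxml_html.fromstring(HTML)

rows = []
# Select each <li> under the list, then read its label and value relative to it.
for li in tree.xpath('//ul[@class="important-fields"]/li'):
    label = li.xpath('string(span)').strip()
    value = li.xpath('string(strong)').strip()
    # string() yields '' when the child is missing, so empty items are skipped.
    if label and value:
        rows.append((label, value))

for label, value in rows:
    print(label, value)
```

Evaluating the relative expressions on each <li> keeps every label tied to its own value, in the same spirit as the Beautiful Soup answer above.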