Print all elements inside an <li> - BeautifulSoup

Time: 2014-11-09 09:51:03

Tags: python python-3.x beautifulsoup

I am using Python and BeautifulSoup for web scraping.

I need to grab this:

<li class="review-rating">
   <h5 class="review-rating__title">Location:</h5>
   <span class="review-rating__score">5</span>
   <h5 class="review-rating__title">Value:</h5>
   <span class="review-rating__score">3</span>
   <h5 class="review-rating__title">Facilities:</h5>
   <span class="review-rating__score">4</span>
   <h5 class="review-rating__title">Service:</h5>
   <span class="review-rating__score">4</span>
   <h5 class="review-rating__title">Cleanliness:</h5>
   <span class="review-rating__score">5</span>
</li>

I am currently scraping this markup with the following code:
for scores_of_this_customer in tt.select('li.review-rating'):
   print(scores_of_this_customer.select('h5.review-rating__title')[0].text +" "+scores_of_this_customer.select('span.review-rating__score')[0].text)

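But this only prints Location: 5

I want a way to print all of these scores using a loop.

I know I can print the other scores by indexing with [1], [2], and so on, but I don't want to write 5 print statements.

PS:

This code worked for me.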

# select() returns a list of matching tags, so loop over each review block
for soup in tt.select('li.review-rating'):
    keys = soup.find_all("h5", {"class": "review-rating__title"})
    values = soup.find_all("span", {"class": "review-rating__score"})
    # Pair each title with its score; the title text already ends with ":"
    for key, value in zip(keys, values):
        print(key.text + " " + value.text)

1 answer:

Answer 0 (score: 0):

I believe they can be accessed directly. Try this:

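from urllib.request import urlopen  # Python 3 location of urlopen
import bs4
url = "http://yourURL.com"
html = urlopen(url).read()
# Name the parser explicitly so bs4 does not have to guess one
soup = bs4.BeautifulSoup(html, "html.parser")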

If you really only need the results from this <li class="review-rating"> element, you can uncomment the following line:

# soup = soup.find("li", {"class": "review-rating"})

The next part should then handle all of the key/value combinations:

keys = soup.find_all("h5", {"class": "review-rating__title"})
values = soup.find_all("span", {"class": "review-rating__score"})
# zip pairs the n-th title with the n-th score; the title text already ends with ":"
for key, value in zip(keys, values):
    print(key.text + " " + value.text)
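
For reference, here is a minimal self-contained sketch of the same zip pairing, run against the markup from the question instead of a live URL. The names SAMPLE_HTML and extract_ratings are made up for illustration, and converting the scores to int assumes every score is a whole number, as in the sample.

```python
from bs4 import BeautifulSoup

# Markup copied from the question; held in a string so no network access is needed.
SAMPLE_HTML = """
<li class="review-rating">
   <h5 class="review-rating__title">Location:</h5>
   <span class="review-rating__score">5</span>
   <h5 class="review-rating__title">Value:</h5>
   <span class="review-rating__score">3</span>
   <h5 class="review-rating__title">Facilities:</h5>
   <span class="review-rating__score">4</span>
   <h5 class="review-rating__title">Service:</h5>
   <span class="review-rating__score">4</span>
   <h5 class="review-rating__title">Cleanliness:</h5>
   <span class="review-rating__score">5</span>
</li>
"""


def extract_ratings(html):
    """Return a dict mapping each rating title to its integer score (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    ratings = {}
    for li in soup.select("li.review-rating"):
        titles = li.select("h5.review-rating__title")
        scores = li.select("span.review-rating__score")
        # zip pairs the n-th title with the n-th score inside this <li>
        for title, score in zip(titles, scores):
            name = title.get_text(strip=True).rstrip(":")  # drop the trailing colon
            ratings[name] = int(score.get_text(strip=True))  # assumes integer scores
    return ratings


if __name__ == "__main__":
    for name, score in extract_ratings(SAMPLE_HTML).items():
        print(name + ": " + str(score))
```

Collecting the pairs into a dict rather than printing them directly makes it easier to reuse the scores later, for example to compute an average per review.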

This code worked for the OP:

# select() returns a list of matching tags, so loop over each review block
for soup in tt.select('li.review-rating'):
    keys = soup.find_all("h5", {"class": "review-rating__title"})
    values = soup.find_all("span", {"class": "review-rating__score"})
    # Pair each title with its score; the title text already ends with ":"
    for key, value in zip(keys, values):
        print(key.text + " " + value.text)
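
One caveat with zip is that it pairs purely by position, so a title without a score would shift every later pair. The sketch below is one possible alternative, not something from the original answer: it checks the element right after each title and keeps the pair only if that element is a score. The function name ratings_per_review and the sample string BROKEN_HTML are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample where the "Value:" title has no score after it.
BROKEN_HTML = """
<li class="review-rating">
   <h5 class="review-rating__title">Location:</h5>
   <span class="review-rating__score">5</span>
   <h5 class="review-rating__title">Value:</h5>
   <h5 class="review-rating__title">Service:</h5>
   <span class="review-rating__score">4</span>
</li>
"""


def ratings_per_review(html):
    """Return one dict of title -> score text per <li class="review-rating"> block."""
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    for li in soup.select("li.review-rating"):
        scores = {}
        for title in li.find_all("h5", class_="review-rating__title"):
            # True matches any tag, so this is the next element sibling (text nodes are skipped)
            sibling = title.find_next_sibling(True)
            is_score = (
                sibling is not None
                and sibling.name == "span"
                and "review-rating__score" in sibling.get("class", [])
            )
            if is_score:
                scores[title.get_text(strip=True)] = sibling.get_text(strip=True)
        reviews.append(scores)
    return reviews


if __name__ == "__main__":
    for review in ratings_per_review(BROKEN_HTML):
        for title, score in review.items():
            print(title + " " + score)
```

For markup as regular as the sample in the question, the zip version is shorter and gives the same pairs; the sibling check only matters if some reviews omit a score.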