I am web scraping with Python and BeautifulSoup.
I need to scrape this:
<li class="review-rating">
<h5 class="review-rating__title">Location:</h5>
<span class="review-rating__score">5</span>
<h5 class="review-rating__title">Value:</h5>
<span class="review-rating__score">3</span>
<h5 class="review-rating__title">Facilities:</h5>
<span class="review-rating__score">4</span>
<h5 class="review-rating__title">Service:</h5>
<span class="review-rating__score">4</span>
<h5 class="review-rating__title">Cleanliness:</h5>
<span class="review-rating__score">5</span>
</li>
I actually scraped this tag with this code:
for scores_of_this_customer in tt.select('li.review-rating'):
    # [0] picks only the first title and the first score of each block
    print(scores_of_this_customer.select('h5.review-rating__title')[0].text + " " + scores_of_this_customer.select('span.review-rating__score')[0].text)
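But this only prints Location: 5.
I want a way to print all of these scores with a loop.
I know I can print the other scores by indexing with [1], [2], and so on, but I don't want to write 5 print statements.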
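A minimal sketch of the loop being asked for, assuming BeautifulSoup 4 is installed and the page has exactly the structure of the snippet above (the `html` string is just that snippet pasted in for illustration). `select()` already returns every match, so instead of indexing `[0]`, `[1]`, ... the titles and scores can be paired with `zip()`:

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """
<li class="review-rating">
<h5 class="review-rating__title">Location:</h5>
<span class="review-rating__score">5</span>
<h5 class="review-rating__title">Value:</h5>
<span class="review-rating__score">3</span>
<h5 class="review-rating__title">Facilities:</h5>
<span class="review-rating__score">4</span>
<h5 class="review-rating__title">Service:</h5>
<span class="review-rating__score">4</span>
<h5 class="review-rating__title">Cleanliness:</h5>
<span class="review-rating__score">5</span>
</li>
"""

soup = BeautifulSoup(html, "html.parser")

lines = []
for rating in soup.select("li.review-rating"):
    # select() returns all matching elements, not only the first one.
    titles = rating.select("h5.review-rating__title")
    scores = rating.select("span.review-rating__score")
    # zip() pairs the n-th title with the n-th score.
    for title, score in zip(titles, scores):
        lines.append(title.get_text(strip=True) + " " + score.get_text(strip=True))

print("\n".join(lines))
```

This prints one line per rating, from `Location: 5` through `Cleanliness: 5`, with a single `print` inside the loop.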
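PS:
This code worked for me:
# select_one() returns the first matching element (or None); select() returns a list, which has no .find()
rating = tt.select_one('li.review-rating')
if rating is not None:
    keys = rating.find_all("h5", {"class": "review-rating__title"})
    values = rating.find_all("span", {"class": "review-rating__score"})
    for key, value in zip(keys, values):
        # The titles already end in ":", so only a space is needed between title and score
        print(key.text + " " + value.text)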
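If a page holds several `li.review-rating` blocks (the loop variable `scores_of_this_customer` in the question suggests one per customer), a sketch that collects each block into a dictionary instead of printing it. The helper `parse_ratings` and the sample markup are hypothetical, and the scores are assumed to always be plain integers:

```python
from bs4 import BeautifulSoup


def parse_ratings(markup):
    """Return one {title: score} dict per li.review-rating element (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for rating in soup.select("li.review-rating"):
        # Drop the trailing ":" so the titles make cleaner dictionary keys.
        titles = [h5.get_text(strip=True).rstrip(":") for h5 in rating.select("h5.review-rating__title")]
        # Assumes every score is an integer such as "5".
        scores = [int(span.get_text(strip=True)) for span in rating.select("span.review-rating__score")]
        results.append(dict(zip(titles, scores)))
    return results


# Hypothetical page with two customers' ratings.
sample = (
    '<li class="review-rating">'
    '<h5 class="review-rating__title">Location:</h5><span class="review-rating__score">5</span>'
    '<h5 class="review-rating__title">Value:</h5><span class="review-rating__score">3</span>'
    '</li>'
    '<li class="review-rating">'
    '<h5 class="review-rating__title">Location:</h5><span class="review-rating__score">2</span>'
    '</li>'
)

print(parse_ratings(sample))
```

Returning data rather than printing makes it easy to average scores or write them to a CSV later.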
Answer 0 (score: 0)
I believe you can access them directly. Try this:
from urllib.request import urlopen  # Python 3; urllib.urlopen only exists in Python 2
import bs4
url = "http://yourURL.com"
html = urlopen(url).read()
soup = bs4.BeautifulSoup(html, "html.parser")
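If you really only need the results inside this <li class="review-rating">, you can uncomment the following line:
# soup = soup.find("li", {"class": "review-rating"})
Then the next part should loop over all the key/value pairs nicely: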
keys = soup.find_all("h5", {"class": "review-rating__title"})
values = soup.find_all("span", {"class": "review-rating__score"})
for key, value in zip(keys, values):
    # The titles already end in ":", so only a space is needed
    print(key.text + " " + value.text)
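`zip()` pairs purely by position and stops at the shorter list, so if one title ever lacks its score, every later title gets the wrong score. A sketch of an alternative (not what the answer proposes): walk each `h5` title and look at its nearest following `h5` or `span` sibling with `find_next_sibling`. It assumes each score `span` sits directly after its title, as in the sample markup; the helper name and the markup with a missing score are hypothetical:

```python
from bs4 import BeautifulSoup


def pair_titles_with_scores(rating):
    """Pair each h5 title with the score span that directly follows it (hypothetical helper).

    If the next h5/span sibling is another h5, the title has no score and is
    paired with None instead of borrowing the following title's score.
    """
    pairs = []
    for title in rating.find_all("h5", class_="review-rating__title"):
        # Nearest following sibling tag that is an h5 or a span.
        nxt = title.find_next_sibling(["h5", "span"])
        if nxt is not None and nxt.name == "span" and "review-rating__score" in nxt.get("class", []):
            score = nxt.get_text(strip=True)
        else:
            score = None
        pairs.append((title.get_text(strip=True), score))
    return pairs


# Hypothetical markup in which "Value:" has no score.
html = (
    '<li class="review-rating">'
    '<h5 class="review-rating__title">Location:</h5><span class="review-rating__score">5</span>'
    '<h5 class="review-rating__title">Value:</h5>'
    '<h5 class="review-rating__title">Facilities:</h5><span class="review-rating__score">4</span>'
    '</li>'
)
rating = BeautifulSoup(html, "html.parser").select_one("li.review-rating")
pairs = pair_titles_with_scores(rating)
for title, score in pairs:
    print(title, score if score is not None else "n/a")
```

With the `zip()` approach, this markup would print `Value: 4` and drop Facilities entirely; here Value is reported as missing and Facilities keeps its own score.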