Printing every occurrence of certain document elements from a web page

Date: 2014-10-01 12:06:41

Tags: python web-scraping beautifulsoup

So I am scraping this particular page, https://www.zomato.com/srijata, for all the "restaurant reviews" posted by "Sri" (not the comments she left on her own reviews).

import urllib2
from bs4 import BeautifulSoup

zomato_ind = urllib2.urlopen('https://www.zomato.com/srijata')
zomato_info = zomato_ind.read()
open('zomato_info.html', 'w').write(zomato_info)   # save the page to disk
soup = BeautifulSoup(open('zomato_info.html'))     # re-read the saved copy into a soup
soup.find('div', 'mtop0 rev-text').text            # first review block on the page

This prints her first restaurant review, i.e. "Sri reviewed Big Straw - Chew On This", as:

 u'Rated  This is situated right in the heart of the city. The items on the menu are alright and I really had to compromise for bubble tea. The tapioca was not fresh. But the latte and the soda pop my friends tried was good. Another issue which I faced was mosquitos... They almost had me.. Lol..'

I also tried another selector:

The question I have is:

How can I print the next restaurant review? I tried findNextSiblings and similar methods, but none of them seemed to work.

2 Answers:

Answer 0 (score: 1)

First of all, you don't need to write the output to a file; just pass the result of the urlopen() call to the BeautifulSoup constructor.

To get the reviews, iterate over all div tags with class rev-text and take the .next_sibling of the inner div element:

import urllib2
from bs4 import BeautifulSoup

# Parse the response directly; no intermediate file needed.
soup = BeautifulSoup(urllib2.urlopen('https://www.zomato.com/srijata'))

# Each review sits in a div with class "rev-text"; the review text is the
# node that follows the inner (rating) div.
for div in soup.find_all('div', class_='rev-text'):
    print div.div.next_sibling

This prints:

This is situated right in the heart of the city. The items on the menu are alright and I really had to compromise for bubble tea. The tapioca was not fresh. But the latte and the soda pop my friends tried was good. Another issue which I faced was mosquitos... They almost had me.. Lol..

The ambience is good. The food quality is good. I Didn't find anything to complain. I wanted to visit the place fir a very long time and had dinner today. The meals are very good and if u want the better quality compared to other Andhra restaurants then this is the place. It's far better than nandhana. The staffs are very polite too. 

...
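
As a side note, here is a minimal offline sketch of why div.div.next_sibling yields the review text. The markup below is made up to mirror the assumed structure of one rev-text block: an inner div holding the rating, followed by the review itself as a bare text node.

from bs4 import BeautifulSoup

# Hypothetical markup mirroring one "rev-text" review block.
html = '''
<div class="rev-text">
  <div class="rating">Rated 4.0</div>
  This is situated right in the heart of the city. ...
</div>
'''

block = BeautifulSoup(html).find('div', class_='rev-text')
print block.div.next_sibling.strip()   # the text node right after the rating div

If the review text turned out to be wrapped in its own tag instead of a bare text node, you would read .text from that sibling rather than stripping a string.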

Answer 1 (score: 0)

You should use a for loop and find_all instead of find:

import urllib2
from bs4 import BeautifulSoup

zomato_ind = urllib2.urlopen('https://www.zomato.com/srijata')
zomato_info = zomato_ind.read()
open('zomato_info.html', 'w').write(zomato_info)   # save the page to disk
soup = BeautifulSoup(open('zomato_info.html'))     # parse the saved copy
for div in soup.find_all('div', 'rev-text'):       # every review block, not just the first
    print div.text

One more question: why do you save the HTML to a file and then read that file back into the soup object?
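
If keeping a local copy is actually the goal (say, to avoid re-downloading the page while experimenting), a minimal sketch using with blocks so the file handles get closed properly could look like this; otherwise the response can go straight into BeautifulSoup as in the first answer.

import urllib2
from bs4 import BeautifulSoup

# Hypothetical caching variant: download once, then parse the saved copy.
with open('zomato_info.html', 'w') as f:
    f.write(urllib2.urlopen('https://www.zomato.com/srijata').read())

with open('zomato_info.html') as f:
    soup = BeautifulSoup(f)

for div in soup.find_all('div', 'rev-text'):
    print div.text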