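I'm new to Selenium and Python. I'm trying to paginate through the pages of different hotels to scrape the reviewers' names and the review ratings. I wrote the following script, which works on a single page, but it breaks when I add the pagination code, and I'm not sure what the problem might be. Thanks in advance.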
driver = webdriver.Chrome(chromedriver)
driver.get("https://www.tripadvisor.com/Hotels-g60763-New_York_City_New_York-Hotels.html")
driver.maximize_window()
driver.implicitly_wait(10)
soup = BeautifulSoup(driver.page_source, 'html.parser')
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
list_rating = []
list_users = []
domain = "https://www.tripadvisor.com/"
list_urls = [domain + i.attrs['href'] for i in soup.findAll('a',class_="review_count")]
for i in list_urls:
    find_numbers = re.findall(r'[0-9]+', i)
    find_name_hotel = re.findall('Reviews-.*', i)
    for u in range(0,10,5):
        url = i[:50] + find_numbers[1] + '-Reviews-' + 'or' + str(u) + find_name_hotel[0][7:]
        driver.get(url)
        time.sleep(5)
        element_list = driver.find_elements_by_xpath("//span[@class='taLnk ulBlueLinks']")
        for e in element_list:
            try:
                e.click()
            except:
                pass
        # The code above works, but when I add the code below it breaks
        html = driver.page_source
        response = requests.get(url, headers=headers, verify=False).text
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        for r in soup.find_all('div', 'reviewSelector'):
            rating = int(r.find('span','ui_bubble_rating')['class'][1].split('_')[1])/10
            list_rating.append(rating)
        users = driver.find_elements_by_xpath("//a[@class='ui_header_link social-member-event-MemberEventOnObjectBlock__member--35-jC']")
        for i in users:
            list_users.append(i.text)
print(list_rating)
print(list_users)
This is the error I get.
<ipython-input-5-39d08f981b2e> in <module>
9 find_name_hotel = re.findall('Reviews-.*', i)
10 for u in range(0,10,5):
---> 11 url = i[:50] + find_numbers[1] + '-Reviews-' + 'or' + str(u) + find_name_hotel[0][7:]
12
13
TypeError: 'WebElement' object is not subscriptable
Answer 0 (score: 4)
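You are overwriting the variable i in this code block. The inner loop reuses the name of the outer loop variable, so once it has run, i refers to a WebElement instead of a URL string, and i[:50] fails on the next pass of the for u loop: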
for i in users:
    list_users.append(i.text)
You can avoid this kind of error by giving variables descriptive names instead of i:
for user in users:
    list_users.append(user.text)
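To see the effect without a browser, here is a minimal sketch that reproduces the error and the fix. FakeElement, find_users, scrape_shadowed, scrape_fixed and the example.com URL are all hypothetical stand-ins for illustration and are not part of the original script or of Selenium.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement: it only has .text."""

    def __init__(self, text):
        self.text = text


def find_users(page_url):
    """Hypothetical replacement for the driver lookup in the question."""
    return [FakeElement("reviewer on " + page_url)]


def scrape_shadowed(urls):
    """Reuses the name i for both loops, as the question's script does."""
    names = []
    for i in urls:
        for u in range(0, 10, 5):
            page_url = i[:20] + "-or" + str(u)  # fails once i is an element
            for i in find_users(page_url):      # rebinds the outer i
                names.append(i.text)
    return names


def scrape_fixed(urls):
    """Same loops, but every loop variable has its own descriptive name."""
    names = []
    for hotel_url in urls:
        for offset in range(0, 10, 5):
            page_url = hotel_url[:20] + "-or" + str(offset)
            for user in find_users(page_url):
                names.append(user.text)
    return names


urls = ["https://example.com/hotel-1"]  # made-up URL for the demo

try:
    scrape_shadowed(urls)
    shadow_error = None
except TypeError as exc:
    shadow_error = str(exc)

fixed_names = scrape_fixed(urls)
print(shadow_error)
print(fixed_names)
```

The shadowed version gets through the first page and then raises the same kind of TypeError on the second value of u, which matches the traceback pointing at the url = ... line rather than at the inner loop.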