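I am trying to change the sort order from the default (most relevant) to "Newest", but something goes wrong and I cannot save the page. Any suggestions?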
import time
from bs4 import BeautifulSoup
from selenium import webdriver

# start webdriver to open the given product page in the Chrome browser
driver = webdriver.Chrome()
driver.get('http://www.homedepot.com/p/Husky-41-in-16-Drawer-Tool-Chest-and-Cabinet-Set-HOTC4016B1QES/205080371')
time.sleep(2)
#find the drop down list, then select the newest option and click
m=driver.find_element_by_id("BVRRDisplayContentSelectBVFrameID")
m.find_element_by_xpath("//option[@value='http://homedepot.ugc.bazaarvoice.com/1999m/205080371/reviews.djs?format=embeddedhtml&sort=submissionTime']").click()
time.sleep(2)
# save the resulting page source to a text file
html = driver.page_source
file_object = open("samplereview.txt", "a")
file_object.write(str(html))
file_object.close( )
time.sleep(2)
soup = BeautifulSoup(html, "html.parser")
# quit the driver (quit is a method and must be called)
driver.quit()
Answer 0 (score: 1)
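You are missing two key Selenium-specific things:
- use Waits instead of time.sleep()
- use the Select class when working with select/option; it provides a very nice abstraction

Here is the modified code (note how much shorter and more readable it is):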
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.select import Select
driver = webdriver.Chrome()
driver.get('http://www.homedepot.com/p/Husky-41-in-16-Drawer-Tool-Chest-and-Cabinet-Set-HOTC4016B1QES/205080371')
# waiting until reviews are loaded
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, 'BVRRDisplayContentSelectBVFrameID'))
)
select = Select(element)
select.select_by_visible_text('Newest')
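To illustrate why an explicit wait is preferable to a fixed time.sleep(), here is a minimal pure-Python sketch of the polling idea behind it. This is not Selenium's implementation: wait_until, its parameters, and ready_on_third_try are hypothetical names made up for this illustration, and the sketch raises the built-in TimeoutError rather than Selenium's own exception.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value as soon as it appears, so the caller never waits
    longer than needed. Raises TimeoutError once `timeout` seconds pass.
    (Hypothetical helper, only sketching the idea of an explicit wait.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Stand-in for "the element is present": becomes truthy on the third check.
attempts = []


def ready_on_third_try():
    attempts.append(1)
    return "ready" if len(attempts) >= 3 else None


outcome = wait_until(ready_on_third_try, timeout=2.0, poll_interval=0.01)
print(outcome, "after", len(attempts), "checks")
```

Unlike time.sleep(2), which always costs two seconds and still fails if the page takes longer, a polling wait returns as soon as the condition holds and fails loudly when it never does.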
With the modified code, the reviews are now listed from newest to oldest.
To parse the reviews, you do not have to pass the page source to BeautifulSoup for further processing; selenium itself is quite capable at locating elements:
reviews = []
# find_element(s)(By.XPATH, ...) replaces the find_element(s)_by_xpath helpers,
# which were removed in Selenium 4; By is imported in the snippet above
for review in driver.find_elements(By.XPATH, '//span[@itemprop="review"]'):
    name = review.find_element(By.XPATH, './/span[@itemprop="name"]').text.strip()
    stars = review.find_element(By.XPATH, './/span[@itemprop="ratingValue"]').text.strip()
    description = review.find_element(By.XPATH, './/div[@itemprop="description"]').text.strip()
    reviews.append({
        'name': name,
        'stars': stars,
        'description': description
    })
print(reviews)
Prints:
[
{'description': u'Very durable product. Worth the money. My husband loves it',
'name': u'Excellent product',
'stars': u'5.0'},
{'description': u'I now have all my tools in one well organized box instead of several boxes and have a handy charging station for cordless tools on the top . Money well spent. Solid box!',
'name': u'Great!',
'stars': u'5.0'},
...
]
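Since the original goal was to save the result, one option is to write the parsed list to disk instead of dumping the raw page source. The following is a minimal sketch, assuming reviews is a list of dicts shaped like the output above; save_reviews, load_reviews, and the file name reviews_sample.json are hypothetical names chosen for this example.

```python
import json
import os
import tempfile


def save_reviews(reviews, path):
    """Write the list of review dicts to `path` as UTF-8 encoded JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps non-ASCII review text readable in the file
        json.dump(reviews, handle, ensure_ascii=False, indent=2)


def load_reviews(path):
    """Read the list of review dicts back from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Sample record copied from the printed output; real data would come from
# the selenium loop that builds `reviews`.
sample = [
    {
        "name": "Excellent product",
        "stars": "5.0",
        "description": "Very durable product. Worth the money. My husband loves it",
    }
]

path = os.path.join(tempfile.gettempdir(), "reviews_sample.json")
save_reviews(sample, path)
restored = load_reviews(path)
print(len(restored), "review(s) written to", path)
```

Opening the file with an explicit encoding avoids depending on the platform default when review text contains non-ASCII characters.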