Error when using Selenium for web scraping

Time: 2017-09-20 06:27:42

Tags: python selenium selenium-webdriver web-scraping phantomjs

I am scraping reviews from a Google Play app page.

It works fine at first, but then stops partway through with the following error:

selenium.common.exceptions.ElementNotVisibleException: Message: {"errorMessage":"Element is not currently visible and may not be manipulated","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"81","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:51812","User-Agent":"Python http auth"},"httpVersion":"1.1","method":"POST","post":"{\"id\": \":wdc:1505887578605\", \"sessionId\": \"cca93cc0-9dc9-11e7-a685-bd84ddef3ed2\"}","url":"/click","urlParsed":{"anchor":"","query":"","file":"click","directory":"/","path":"/click","relative":"/click","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/click","queryKey":{},"chunks":["click"]},"urlOriginal":"/session/cca93cc0-9dc9-11e7-a685-bd84ddef3ed2/element/:wdc:1505887578605/click"}}
Screenshot: available via screen

I have searched and tried many changes, but I can't fix it, and I've been stuck on this for days. My code is below.

Why does this error occur, and how can I fix it?

#python 3.6 

from selenium import webdriver
from time import sleep
from bs4 import BeautifulSoup, Comment
import pandas as pd


#Setting up Chrome webdriver Options
#chrome_options = webdriver.ChromeOptions()

#setting up local path of chrome binary file
#chrome_options.binary_location = "/Users/Norefly/chromedriver2/chromedriver.exec"

#creating a PhantomJS webdriver instance (the Chrome options above are commented out)
driver = webdriver.PhantomJS("C:/Python/phantomjs-2.1.1-windows/bin/phantomjs.exe")
link = "https://play.google.com/store/apps/details?id=com.supercell.clashofclans&hl=en"
driver.get(link)
#driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
Ptitle = driver.find_element_by_class_name('id-app-title').text.replace(' ','')
print(Ptitle)
#driver.find_element_by_xpath('//*[@id="body-content"]/div/div/div[1]/div[2]/div[2]/div[1]/div[4]/button[2]/div[2]').click()

sleep(1)
driver.find_element_by_xpath('//*[@id="body-content"]/div/div/div[1]/div[2]/div[2]/div[1]/div[4]/button[2]/div[2]/div/div').click()
#select_newest.select_by_visible_text('Newest')
#driver.find_element_by_xpath('//*[@id="body-content"]/div/div/div[1]/div[2]/div[2]/div[1]/div[4]/button[2]/div[2]/div/div').click()
sleep(2)
#driver.find_element_by_css_selector('.review-filter.id-review-sort-filter.dropdown-menu-container').click()
driver.find_element_by_css_selector('.displayed-child').click()
#driver.find_element_by_xpath("//button[@data-dropdown-value='1']").click()
driver.execute_script("document.querySelectorAll('button.dropdown-child')[0].click()")
reviews_df = []
for i in range(1,10000):
    try:
        for elem in driver.find_elements_by_class_name('single-review'):
            print(str(i))
            content = elem.get_attribute('outerHTML')
            soup = BeautifulSoup(content, "html.parser")
            #print(soup.prettify())
            date = soup.find('span',class_='review-date').get_text()
            rating = soup.find('div',class_='tiny-star')['aria-label'][6:7]
            title = soup.find('span',class_='review-title').get_text()
            txt = soup.find('div',class_='review-body').get_text().replace('Full Review','')[len(title)+1:]
            print(soup.get_text())
            temp = pd.DataFrame({'Date':date,'Rating':rating,'Review Title':title,'Review Text':txt},index=[0])
            print('-'*10)
            reviews_df.append(temp)
            #print(elem)

    except:
        print('s')
    driver.find_element_by_xpath('//*[@id="body-content"]/div/div/div[1]/div[2]/div[2]/div[1]/div[4]/button[2]/div[2]/div/div').click()
reviews_df = pd.concat(reviews_df,ignore_index=True)

reviews_df.to_csv(Ptitle+'review_google.csv', encoding='utf-8')

#driver.close()

Since I don't know what is causing it, here is the full error message and traceback.

Traceback (most recent call last):
  File "C:/Users/lobyp/Downloads/reviewex.py", line 51, in <module>
    driver.find_element_by_xpath('//*[@id="body-content"]/div/div/div[1]/div[2]/div[2]/div[1]/div[4]/button[2]/div[2]/div/div').click()
  File "C:\Users\lobyp\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\webelement.py", line 78, in click
    self._execute(Command.CLICK_ELEMENT)
  File "C:\Users\lobyp\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\webelement.py", line 499, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\lobyp\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 297, in execute
    self.error_handler.check_response(response)
  File "C:\Users\lobyp\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.ElementNotVisibleException: Message: {"errorMessage":"Element is not currently visible and may not be manipulated","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"81","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:53283","User-Agent":"Python http auth"},"httpVersion":"1.1","method":"POST","post":"{\"id\": \":wdc:1505888277040\", \"sessionId\": \"6b69b0a0-9dcb-11e7-bc02-87f5b92766da\"}","url":"/click","urlParsed":{"anchor":"","query":"","file":"click","directory":"/","path":"/click","relative":"/click","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/click","queryKey":{},"chunks":["click"]},"urlOriginal":"/session/6b69b0a0-9dcb-11e7-bc02-87f5b92766da/element/:wdc:1505888277040/click"}}
Screenshot: available via screen
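The traceback shows the script stopping on an unguarded `.click()` of that XPath, and that call is not inside the `try`/`except`. One way to end gracefully instead of crashing is to check the real `WebElement.is_displayed()` method before clicking and stop the loop when the button is hidden. This is a sketch under an assumption the question does not confirm: that the button stays hidden once there are no more reviews to load (if it is only hidden briefly while a page loads, you would wait for it rather than stop). `click_next_until_hidden`, `find_button`, `collect_page`, `max_pages`, and `pause` are hypothetical names, not Selenium API.

```python
import time


def click_next_until_hidden(find_button, collect_page, max_pages=10000, pause=2.0):
    """Scrape page after page, stopping when the paging button is hidden.

    find_button:  hypothetical zero-argument callable returning the button,
                  e.g. a wrapper around driver.find_element_by_xpath(...).
    collect_page: hypothetical zero-argument callable that scrapes the
                  reviews currently shown on the page.
    Returns the number of pages scraped.
    """
    pages = 0
    for _ in range(max_pages):
        collect_page()
        pages += 1
        button = find_button()
        # An element that exists in the DOM but is not displayed cannot be
        # clicked; treat that as "no more pages" (an assumption, see above)
        # instead of letting click() raise ElementNotVisibleException.
        if not button.is_displayed():
            break
        button.click()
        time.sleep(pause)  # give the next page of reviews time to load
    return pages
```

In the question's script, the outer `for i in range(1,10000)` loop would become a call such as `click_next_until_hidden(lambda: driver.find_element_by_xpath(xpath), scrape_current_reviews)`, where `xpath` holds the same XPath string and `scrape_current_reviews` is a hypothetical function containing the existing parsing code.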

1 answer:

Answer 0 (score: 0)

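I'm not an expert, but I believe this exception is raised when the element is present in the DOM but not displayed, for example because it is hidden by an HTML attribute or CSS. I ran into the same problem with a dropdown menu on a website, and I had to use a scroll event in Selenium to make the element visible.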
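The scrolling approach described above can be sketched as a small helper. `scroll_and_click` and its `timeout`/`poll` parameters are hypothetical, not part of Selenium; it uses only real calls: `driver.execute_script("arguments[0].scrollIntoView(true);", element)` to scroll, then `WebElement.is_displayed()` and `click()`. Selenium's own `WebDriverWait` with `expected_conditions.element_to_be_clickable` does the same waiting job and is usually preferable in real code; the hand-rolled loop here only makes the logic explicit. Scrolling helps only when the page reveals the element after scrolling (for example, lazily loaded content); it will not reveal an element hidden with CSS such as `display: none`.

```python
import time


def scroll_and_click(driver, element, timeout=10.0, poll=0.5):
    """Scroll `element` into view, wait until it is displayed, then click it.

    `driver` is a Selenium WebDriver and `element` a WebElement; only
    execute_script(), is_displayed() and click() are used, so any object
    providing those methods works.
    """
    # Ask the browser to scroll the element into the viewport.
    driver.execute_script("arguments[0].scrollIntoView(true);", element)
    deadline = time.monotonic() + timeout
    # Poll until the element reports itself as displayed, or give up.
    while not element.is_displayed():
        if time.monotonic() >= deadline:
            raise TimeoutError("element still not visible after %.1f s" % timeout)
        time.sleep(poll)
    element.click()
```

With the question's code, this would replace a direct `.click()`, e.g. `scroll_and_click(driver, driver.find_element_by_xpath(xpath))`, where `xpath` stands for the same XPath string.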