Question

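The page I'm trying to scrape (for practice) is at the URL below. I'm trying to grab the income statement (the table) at the bottom of the page.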

import time

from PIL import Image
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

browser = webdriver.PhantomJS()
browser.implicitly_wait(12)
url = 'https://seekingalpha.com/symbol/OPK/financials/income-statement'

browser.get(url)
time.sleep(9)
#x = browser.find_element_by_class_name('content')
y = browser.find_element_by_xpath("//*[@id='industrial-income-statement']")
# browser.quit()  # call once you are done with the session

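This code was working until just now; now I get a "no such element" error on this line: y = browser.find_element_by_xpath("//*[@id='industrial-income-statement']")

If I print browser.page_source, I see the output below.

Access is denied, but I'm not sure why. I'm only trying to scrape a single table, and I'm using Selenium, which I thought sent the proper headers.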

'0px 25px; padding: 0px; resize: none; "></textarea></div></div></div>\n <p>\n Access to this page has been denied because we believe you are using automation tools to browse the website.\n </p>\n <p>\n This may happen as a result of the following:\n </p>\n <ul>\n <li>\n Javascript is disabled or blocked by an extension (ad blockers for example)\n </li>\n <li>\n Your browser does not support cookies\n </li>\n </ul>\n <p>\n Please make sure that Javascript and cookies are enabled on your browser and that you are not blocking them from loading.\n </p>\n <p>\n Reference ID: #a2a7fe90-4a2a-11e7-be16-a994e7f2d3b8\n </p>\n </div>\n </div>\n <div class="page-footer-wrapper">\n <div class="page-foote
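One way to tell this denial page apart from a genuinely missing element is to check the page source for the denial message before looking up the table. This is only a minimal sketch: `looks_blocked` and `BLOCK_MARKER` are hypothetical names, and the marker text is copied from the output above, so it will stop matching if the site changes its wording.

```python
# Marker text copied from the denial page shown above; the site may change it.
BLOCK_MARKER = "Access to this page has been denied"


def looks_blocked(page_source):
    """Return True if the HTML looks like the anti-automation denial page."""
    return BLOCK_MARKER.lower() in page_source.lower()


# Hypothetical usage with the `browser` object from the question:
#     if looks_blocked(browser.page_source):
#         raise RuntimeError("Blocked by the site; the table was never rendered")
```

With a check like this, a "no such element" error caused by blocking shows up as an explicit message instead of a confusing XPath failure.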

As far as I know, PhantomJS doesn't disable JavaScript or block cookies.

Is there a workaround?

Answer 1

To avoid detection, make PhantomJS pretend to be a regular browser by overriding its user agent:

from selenium import webdriver

# Copy the default capabilities and override the user agent PhantomJS sends
capabilities = dict(webdriver.DesiredCapabilities.PHANTOMJS)
capabilities["phantomjs.page.settings.userAgent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36"

browser = webdriver.PhantomJS(desired_capabilities=capabilities)
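If you need to try several user agents, the override can be wrapped in a small helper that copies the capabilities instead of mutating the shared defaults. This is a sketch only: `with_user_agent` is a hypothetical name, and the `base` dict is a stand-in for `webdriver.DesiredCapabilities.PHANTOMJS` so the example runs without Selenium installed.

```python
# Capability key used in the snippet above to set the PhantomJS user agent.
UA_KEY = "phantomjs.page.settings.userAgent"


def with_user_agent(base_capabilities, user_agent):
    """Return a copy of the capabilities with the user agent overridden."""
    capabilities = dict(base_capabilities)  # copy so the defaults stay untouched
    capabilities[UA_KEY] = user_agent
    return capabilities


# Stand-in for webdriver.DesiredCapabilities.PHANTOMJS (assumed contents, illustration only).
base = {"browserName": "phantomjs"}
caps = with_user_agent(
    base,
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36",
)

# With Selenium available, the result would be passed on as before:
#     browser = webdriver.PhantomJS(desired_capabilities=caps)
```

Copying first means repeated calls with different user agents never leak settings into each other.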

I would be cautious about scraping this resource without explicit permission; please check the "User Conduct" section of the Terms of Use.
