Selenium is detected and a CAPTCHA pops up immediately the first time a Bloomberg URL is opened

Asked: 2018-08-17 09:55:54

Tags: python selenium google-chrome selenium-chromedriver captcha

As the title says, I am having trouble scraping a website, specifically bloomberg.com. I need to open links like this one:

from selenium import webdriver
# path_to_driver is the path to the local chromedriver executable
driver = webdriver.Chrome(path_to_driver)
driver.get("https://www.bloomberg.com/research/stocks/private/snapshot.asp?privcapId=4253471")


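But I get a warning right away, and a CAPTCHA pops up on the second link I open. I am not flooding the site with extra requests or anything else; all I do is call driver.get() once every 10 seconds or so.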
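For reference, the access pattern described above can be sketched roughly as follows. This is only an illustration: `visit_paced`, its parameters, and the injectable `sleep` argument are hypothetical names that do not appear in the original post; the 10-second default mirrors the interval mentioned in the question.

```python
import time
from typing import Callable, Iterable, List


def visit_paced(urls: Iterable[str],
                fetch: Callable[[str], None],
                delay_s: float = 10.0,
                sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Call fetch(url) for each URL, pausing delay_s seconds between calls.

    Hypothetical helper: `fetch` would be something like driver.get.
    """
    visited = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_s)  # pause only between requests, not before the first
        fetch(url)
        visited.append(url)
    return visited
```

With the driver from the question, this might be called as `visit_paced(urls, driver.get)`. Pacing requests this way does not by itself explain or prevent the CAPTCHA, which the question reports appearing almost immediately.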

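What I have tried so far: following a link to a similar question, I learned that you should open chromedriver.exe in a hex editor and replace "$cdc" with something like "xyzw". Doing so changed nothing. (When I switch my router off and on I get a different IP, so I am definitely not IP-blocked.)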
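The hex-editor step mentioned above amounts to a same-length byte substitution in the binary. A minimal sketch of that idea, assuming the marker and replacement strings from the question; the function names and file paths are hypothetical, and the asker reports that this change made no difference for them:

```python
from pathlib import Path


def replace_same_length(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace every occurrence of old with new, refusing length changes.

    Keeping the length identical leaves all other offsets in the
    binary untouched, which is what a hex-editor overwrite does.
    """
    if len(old) != len(new):
        raise ValueError("replacement must have the same length as the original")
    return data.replace(old, new)


def patch_file(src: Path, dst: Path, old: bytes, new: bytes) -> int:
    """Write a patched copy of src to dst; return how many matches were found."""
    data = src.read_bytes()
    count = data.count(old)
    dst.write_bytes(replace_same_length(data, old, new))
    return count
```

A call might look like `patch_file(Path("chromedriver.exe"), Path("chromedriver_patched.exe"), b"$cdc", b"xyzw")`. Writing to a separate copy keeps the original driver intact in case the patched file no longer runs.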

Any ideas what I can do here? I have never run into anything like this before, being blocked on the very first link.

1 answer:

Answer 0: (score: 0)

A bit more detail about what exactly you want to scrape from the website would help us debug the issue better.

However, to scrape the two (2) Key Developments, you can use the following solution:

  • Code block:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    # Selenium 3-style setup, as in the original answer; newer Selenium
    # releases changed the chrome_options/executable_path keyword arguments.
    options = webdriver.ChromeOptions()
    options.add_argument('start-maximized')
    options.add_argument('disable-infobars')
    options.add_argument('--disable-extensions')
    driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get('https://www.bloomberg.com/research/stocks/private/snapshot.asp?privcapId=4253471')
    # Wait up to 10 seconds for the news paragraphs to become visible, then print each one.
    for item in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.newsItem p"))):
        print(item.get_attribute("innerHTML"))
    driver.quit()
    
  • Console output:

    CARDONE Industries has named Michael Cardone, III as the Executive Chairman of its Board of Directors. The company is also pleased to announce the addition of Dena Moore and Bill Strahan as new Board members. Michael Cardone, III is an owner of CARDONE Industries and serves on the company's Board of Directors. He has also served in Executive leadership roles with CARDONE, including President, since 1998. As Executive Chairman, he will focus on CARDONE's long-term growth strategies, including acquisition activity and the company's footprint and real estate holdings. He will also be responsible for managing the Board of Directors and its processes. Dena Moore spent 20 years as a senior merger and acquisition investment banker and as Chief Operating Officer for Harris Williams & Co., now a subsidiary of PNC Financial Services Group. Today, as the founder of DFM Advisory, LLC, she works primarily with entrepreneurs to provide strategic and operational consulting services. Bill Strahan is Executive Vice President of Human Resources for Comcast Cable.
    CARDONE Industries, Inc. announced plans to build a new, state-of-the-art distribution center in Harlingen, TX, near the company’s current core processing facilities at 5810 Harrison Avenue. Construction of the new facility is expected to begin in January 2018, and to be finished by December 2018. The new distribution center is intended to support growing production at CARDONE’s manufacturing facilities, and the building will be constructed with the capacity for future expansion, as needed. CARDONE expects the new distribution center to create hundreds of new jobs in the Harlingen area. Along with its facilities in Philadelphia, Texas, Los Angeles, Canada and Mexico, CARDONE added operations in Vancouver, Phoenix, Seattle, Toronto, Spain and China through its recent acquisition of ADP Distributors and Rotomaster on November 20, 2017.
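
The answer prints each paragraph's raw innerHTML, which can contain nested tags or character entities on other pages. If plain text is wanted, one possible post-processing step is sketched below using only the standard library. The helper name `inner_html_to_text` is hypothetical and is not part of the original answer.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def inner_html_to_text(fragment: str) -> str:
    """Drop tags from an innerHTML string and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())
```

In the loop above, this could wrap the printed value, for example `print(inner_html_to_text(item.get_attribute("innerHTML")))`.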