Unable to access the website https://www.sahibinden.com using Selenium

Time: 2020-06-26 16:30:04

Tags: python selenium google-chrome selenium-chromedriver bots

I have tried many times, including setting a custom user agent, and I even tried BeautifulSoup, but the website still will not let me in.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

opts = Options()
opts.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36")

driver_path = "F:/chromedriver"
# `chrome_options` is a deprecated keyword; `options` is the current name.
browser = webdriver.Chrome(executable_path=driver_path, options=opts)

browser.get("https://www.sahibinden.com/ilan/emlak-konut-satilik-incesu-garipcede-evli-bag-835829825/detay")

2 Answers:

Answer 0: (score: 0)

Removing the .exe from driver_path should work. Like this:

driver_path = "path/to/your/chromedriver/without/exe"

After downloading the chromedriver that matches your Chrome version, you need to provide the full path to chromedriver without the .exe extension.
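
Whether the extension matters can depend on the platform, so it may help to check which candidate path actually exists before handing it to Selenium. A minimal sketch, assuming a hypothetical helper `resolve_driver_path` (not part of Selenium) that tries the path as given and then the same path with `.exe` added or removed:

```python
from pathlib import Path
from typing import Optional


def resolve_driver_path(raw_path: str) -> Optional[str]:
    """Return the first existing candidate for a chromedriver path, or None.

    Hypothetical helper for illustration; it is not part of Selenium.
    """
    path = Path(raw_path)
    candidates = [path]
    if path.suffix.lower() == ".exe":
        # Also try the same path without the .exe extension.
        candidates.append(path.with_suffix(""))
    else:
        # Also try the same path with .exe appended.
        candidates.append(Path(str(path) + ".exe"))
    for candidate in candidates:
        if candidate.is_file():
            return str(candidate)
    return None
```

The returned string, when it is not `None`, could then be used as the `driver_path` value from the question.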

Answer 1: (score: 0)

I used your code with a few simple tweaks, as follows:

  • Code block:

    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    # Drop the --enable-automation switch that ChromeDriver passes by default.
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    # Do not load Chrome's automation extension.
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get('https://www.sahibinden.com/ilan/emlak-konut-satilik-incesu-garipcede-evli-bag-835829825/detay')
    # Print the HTML of the page as currently loaded in the browser.
    print(driver.page_source)
    
  • Browser snapshot:

[Screenshot: sahibinden]

  • Console output:

<html>

<head>
  <title></title>
  <style>
      .preloader {
      width: 100%;
      height: 100%;
      position: absolute;
      left: 0;
      right: 0;
      top: 0;
      bottom: 0;
      background-image: url('data:image/gif;base64,R0lGODlhQABAAOMAAAQCBMTCxERGRCQiJOzq7BQSFNTW1GRmZCwuLAwKDMTGxOzu7BQWFHRydDQyNP///yH/C05FVFNDQVBFMi4wAwEAAAAh+QQIBgAAACwAAAAAQABAAAAEbPDJSau9OOvNu/9gKI5kaZ5oqq5s675wLM90bd94ru987//AoHBILBqPyKRyyWw6n9CodEqtWq/YrHbL7Xq/4LB4TN4tDoyEwJAkDABwuALZiMcRSIQ9Tjjq9wB9RnV7eEdue3NIBAcFamwsEQAh
     . 
     .
     . 
     (function(e) {
        e.initCustomEvent("teXnghGbT", false, false, ["A6t9_fFyAQAAM-1AOljwDCGitC0v7vJY1or4qXqaS99_QKcxspE6OkxCHBLxAZ0vMOCucnW8wH8AAEB3AAAAAA==", "vZ1z8n9UIltBkSR7DqW0Fg5LuAcJHoa2fC6iw_bmyPTYMNsQ-Oh3dVGK=EpxXj4er", [], [155272514, 740924909, 1776946185, 932189146, 1328352516, 633076428, 1104746011, 975810292], "jiGC/uEjFnRkm61qNb6PSPK4", "jiGC/uEjFnRkm61qNb6PSPK4", [], typeof arguments==="undefined"?void 0:arguments]);
        dispatchEvent(e)
      }
      (document.createEvent("CustomEvent")))</script><script>(function() {
        'use strict';
        var afterReadyCbCalled=false;
        var originalHeaders=["X-Origin-DC", "gytp", "X-Forwarded-For", "157.47.48.224", "X-Client-SrcPort", "51230", "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9", "Accept-Language", "en-US,en;q=0.9", "X-Forwarded-Proto", "https", "X-TLS-Version", "771", "Upgrade-Insecure-Requests", "1", ];
        var originalBody="";
        function afterReadyCb() {
          if (afterReadyCbCalled) return;
          afterReadyCbCalled=true;
          var xhr=new XMLHttpRequest();
          xhr.onload=function() {
            var isValid=xhr.getResponseHeader("ISTL-INFINITE-LOOP");
            if (isValid !=null && isValid !='') return;
            var a=xhr.getResponseHeader("ISTL-REDIRECT-TO");
            if (a !=null && a !='') {
              location.replace(a);
            }
            else {
              if (window.history !=null && typeof history.replaceState==='function') {
                var responseURL=xhr.responseURL !=null ? xhr.responseURL: xhr.getResponseHeader("ISTL-RESPONSE-URL");
                if (responseURL !=null && responseURL !='') {
                  history.replaceState(null, '', responseURL);
                }
              }
              document.open();
              document.write(xhr.responseText);
              document.close();
            }
          }
          ;
          xhr.open("get", location.href, true);
          for (var i=0;
          i < originalHeaders.length;
          i +=2) {
            var headerName=originalHeaders[i];
            try {
              xhr.setRequestHeader(headerName, originalHeaders[i + 1]);
            }
            catch (e) {}
          }
          xhr.setRequestHeader("ISTL-INFINITE-LOOP", '1');
          xhr.send(originalBody);
          var evt=document.createEvent('Event');
          evt.initEvent('QLpZFJdHv', true, true);
          dispatchEvent(evt);
        }
        addEventListener('afterReady', afterReadyCb, false);
        setTimeout(afterReadyCb, 400);
      }
      ());
      </script><style>html,
      body {
        margin: 0;
        padding: 0;
        background-color: white;
      }
  </style>


  </body>

</html>
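
The `originalHeaders` array in the output above is a flat list of alternating names and values, which the page's script walks in steps of two before re-sending the request. A minimal Python sketch of the same pairing, using a shortened copy of the values shown above; the helper name `pair_headers` is made up for illustration:

```python
from typing import Dict, List


def pair_headers(flat: List[str]) -> Dict[str, str]:
    """Pair a flat [name, value, name, value, ...] list into a dict.

    Walks the list two items at a time, like the `i += 2` loop in the
    page's script. In this sketch a trailing name with no value is ignored.
    """
    return {flat[i]: flat[i + 1] for i in range(0, len(flat) - 1, 2)}


# Shortened copy of the array from the console output above.
original_headers = [
    "X-Origin-DC", "gytp",
    "Accept-Language", "en-US,en;q=0.9",
    "X-Forwarded-Proto", "https",
    "X-TLS-Version", "771",
    "Upgrade-Insecure-Requests", "1",
]

headers = pair_headers(original_headers)
print(headers["Accept-Language"])
```

Reading the array this way makes it easier to see which request headers the page replays on its second request.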


Conclusion

The message in the browser snapshot is in Turkish and says:

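We have detected unusual access... We are seeing unusual (automated) access to our site from your device or from the network you are connected to. We cannot fulfill your request at this time; you can try again later.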

It seems that the Selenium-driven, WebDriver-controlled Chrome browsing context is being detected and the navigation is blocked.
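
One way to notice this case in a script is to inspect `driver.page_source` for the markers visible in the console output above before trying to parse the page. A rough sketch, assuming those markers (the `ISTL-INFINITE-LOOP` and `ISTL-REDIRECT-TO` header names and the empty `<title>`) stay stable, which the site does not guarantee; `looks_blocked` is a hypothetical helper name:

```python
import re

# Strings taken from the console output above. Treating them as
# block-page indicators is an assumption, not documented site behavior.
BLOCK_MARKERS = ("ISTL-INFINITE-LOOP", "ISTL-REDIRECT-TO")


def looks_blocked(page_source: str) -> bool:
    """Heuristically decide whether the HTML is the interstitial page."""
    has_marker = any(marker in page_source for marker in BLOCK_MARKERS)
    has_empty_title = re.search(r"<title>\s*</title>", page_source, re.I) is not None
    return has_marker and has_empty_title


sample = (
    "<html><head><title></title>"
    "<script>xhr.getResponseHeader('ISTL-REDIRECT-TO')</script>"
    "</head></html>"
)
print(looks_blocked(sample))                    # True
print(looks_blocked("<title>Listing</title>"))  # False
```

With the code from this answer, the check would be called as `looks_blocked(driver.page_source)` after `driver.get(...)`.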