Why can't some websites be automated with Selenium?

Time: 2018-12-13 17:51:25

Tags: selenium google-chrome selenium-webdriver webdriver selenium-chromedriver

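I am trying to automate the page https://www.westernunion.com/global-service/track-transfer, but I can't figure out why the site does not navigate to the next page.

My script is: open the page -> enter the MTCN 2587051083 -> click the "Continue" button. Nothing is displayed after the click, yet repeating the same steps manually works fine. Do these sites require some browser setting that I am missing? I am clueless.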

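1 answer:

Answer 0 (score: 0)

To send a character sequence to the tracking field on the page https://www.westernunion.com/global-service/track-transfer, I took your own code and made some minor modifications: it uses WebDriverWait to wait until the desired element is clickable, and then invokes click() on the element with the text Continue, as follows:

  • Code block: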

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;
    import org.openqa.selenium.chrome.ChromeOptions;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;
    
    public class westernunion {
    
        public static void main(String[] args) {
    
            // Path to the local ChromeDriver binary; adjust it for your machine.
            System.setProperty("webdriver.chrome.driver","C:\\Utility\\BrowserDrivers\\chromedriver.exe");
            ChromeOptions opt = new ChromeOptions();
            opt.addArguments("start-maximized");
            opt.addArguments("disable-infobars");
            opt.addArguments("--disable-extensions");
            WebDriver driver=new ChromeDriver(opt);
            driver.get("https://www.westernunion.com/global-service/track-transfer");
            // Wait up to 10 seconds for the MTCN input to become clickable, then type the tracking number.
            new WebDriverWait(driver, 10).until(ExpectedConditions.elementToBeClickable(By.cssSelector("input.new-field.form-control.tt-mtcn.ng-pristine.ng-valid-mask"))).sendKeys("2587051083");
            // Click the Continue button.
            driver.findElement(By.cssSelector("button.btn.btn-primary.btn-lg.btn-block.background-color-teal.remove-margin#button-track-transfer")).click();
        }
    }
    

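It seems the click() does happen and the spinner becomes visible for a while, but the search is then aborted. On inspecting the page, you will find that some of the <script> and <link> tags refer to JavaScript and CSS files whose paths contain the keyword dist. For example: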

  • <link rel="stylesheet" type="text/css" href="/content/wucom/dist/20181210075630/css/responsive_css.min.css">
  • <script src="/content/wucom/dist/20181210075630/js/js-bumblebee.js"></script>
  • <link ng-if="trackTransferVm.trackTransferData.newTrackTransfer || trackTransferVm.trackTransferData.isRetail" rel="stylesheet" type="text/css" href="/content/wucom/dist/20181210075630/css/main.min.css" class="ng-scope" style="">

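This is a clear indication that the website is protected by the Bot Management service provider Distil Networks, and that the navigation by ChromeDriver is detected and subsequently blocked.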
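
The manual inspection described above can be roughly automated. The following is a minimal sketch, assuming the page source is already available as a string (for example from driver.getPageSource()); the class name DistResourceScan and the method findDistResources are hypothetical names introduced here for illustration. It only lists the <script> and <link> URLs that contain a /dist/ path segment. Since dist is also a common name for build output directories, a match is a hint worth investigating, not proof of bot protection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists <script>/<link> resource URLs containing "/dist/".
public class DistResourceScan {

    // Captures the double-quoted src or href value of a <script> or <link> tag.
    private static final Pattern RESOURCE_URL = Pattern.compile(
            "<(?:script|link)\\b[^>]*?\\b(?:src|href)\\s*=\\s*\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    public static List<String> findDistResources(String html) {
        List<String> hits = new ArrayList<>();
        Matcher m = RESOURCE_URL.matcher(html);
        while (m.find()) {
            String url = m.group(1);
            // Keep only URLs with a "dist" path segment, as in the examples above.
            if (url.contains("/dist/")) {
                hits.add(url);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample markup based on the tags quoted in the answer, plus one unrelated script.
        String html =
                "<link rel=\"stylesheet\" type=\"text/css\" "
              + "href=\"/content/wucom/dist/20181210075630/css/responsive_css.min.css\">"
              + "<script src=\"/content/wucom/dist/20181210075630/js/js-bumblebee.js\"></script>"
              + "<script src=\"/js/app.js\"></script>";
        for (String url : findDistResources(html)) {
            System.out.println(url);
        }
    }
}
```

A regular expression is used here only to keep the sketch dependency-free; for real pages an HTML parser would be more robust against unusual attribute quoting.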


Distil

According to the article There Really Is Something About Distil.it...

  

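Distil protects sites against automatic content scraping bots by observing site behavior and identifying patterns peculiar to scrapers. When Distil identifies a malicious bot on one site, it creates a blacklisted behavioral profile that is deployed to all its customers. Something like a bot firewall, Distil detects patterns and reacts.

Further,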

  

"One pattern with Selenium was automating the theft of Web content", Distil CEO Rami Essaid said in an interview last week. "Even though they can create new bots, we figured out a way to identify Selenium the tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious".


Reference

You can find a couple of detailed discussions in: