Website shows no information in chromedriver / geckodriver (Selenium)

Time: 2018-11-18 22:39:49

Tags: java selenium selenium-webdriver web-scraping selenium-chromedriver

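I have been scraping https://wizzair.com/en-gb/flights/timetable#/ for a while. As of today, however, I can no longer get flight information: the Wizz server returns no data and the page shows the error "An error has occurred. Please try again. If the error persists, please contact the airline."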

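I also tried opening the site by hand in the browsers that geckodriver and chromedriver control, starting them from the .exe rather than through Selenium, and got the same result. So it seems the site detects that the browser is driven by an automation tool and returns no information.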

Do you have any suggestions on how to solve this?

Thanks

Update: Saved copies of the page, accessed with and without WebDriver, can be found here: https://drive.google.com/drive/folders/1OsqfKqKyqpOLBMdUbunYH7GUQZgqtRXJ?usp=sharing

Code trials:

    System.setProperty("webdriver.gecko.driver", ".\\resources\\drivers\\geckodriver.exe");
    System.setProperty(FirefoxDriver.SystemProperty.DRIVER_USE_MARIONETTE, "true");
    System.setProperty(FirefoxDriver.SystemProperty.BROWSER_LOGFILE, "/dev/null");

    WebDriver driver = new FirefoxDriver();
    driver.manage().window().setPosition(new Point(2000, 0)); // move window to the second display
    driver.manage().window().maximize();
    driver.get("https://wizzair.com/en-gb/flights/timetable/clujnapoca/vienna--#/1/0/1/0/0/2019-01/2019-01");

1 answer:

Answer 0: (score: 0)

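You can try changing the headers in the HTTP request, or changing some system properties, to hide the fact that you are using a Selenium-driven browser.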

Setting a system property:

System.setProperty(property, value);

To see which properties your JVM supports, open a console (CMD / terminal) and type:

java -XshowSettings:all

Example of setting a system property:

static {
    System.setProperty("user.dir", "C:\\Users\\YourName");
}

System properties reference: https://docs.oracle.com/javase/tutorial/essential/environment/sysprop.html
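As a rough illustration of the steps above, here is a minimal sketch that sets a property, reads it back, and lists the properties the running JVM currently exposes (roughly the programmatic counterpart of the `-XshowSettings` output). The class name `SystemPropertyDemo`, its helper methods, and the key `demo.scraper.mode` are made up for this example. Keep in mind that a property only changes the state of the local JVM; whether a website sees any difference depends on whether some library actually reads that property.

```java
import java.util.Set;
import java.util.TreeSet;

public class SystemPropertyDemo {

    /** Sets a JVM system property and returns the value it held before (null if it was unset). */
    static String setAndGetPrevious(String key, String value) {
        return System.setProperty(key, value);
    }

    /** Returns the names of all system properties visible to this JVM, sorted alphabetically. */
    static Set<String> propertyNames() {
        return new TreeSet<>(System.getProperties().stringPropertyNames());
    }

    public static void main(String[] args) {
        // "demo.scraper.mode" is a hypothetical key used only for illustration.
        setAndGetPrevious("demo.scraper.mode", "manual");
        System.out.println(System.getProperty("demo.scraper.mode"));

        // Print every property together with its current value.
        for (String name : propertyNames()) {
            System.out.println(name + " = " + System.getProperty(name));
        }
    }
}
```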


Establishing a connection to the website:

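Note: URLConnection#addRequestProperty is how you set the headers that are sent to the website. You can also use HttpURLConnection or HttpsURLConnection; they work the same, or very nearly the same, as the plain URLConnection.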
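To make the note concrete, here is a small sketch that prepares a connection, attaches headers, and prints them before anything is sent. The class `RequestHeaderDemo`, the method `prepare`, and the example URL and user agent string are assumptions for illustration only. `openConnection()` just creates the connection object, so the headers can be inspected without contacting the server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;
import java.util.List;
import java.util.Map;

public class RequestHeaderDemo {

    /** Creates an unopened connection to the given URL with browser-like headers attached. */
    static URLConnection prepare(String url, String userAgent) {
        try {
            // No network traffic happens here; the request is only sent once
            // connect() or getInputStream() is called.
            URLConnection connection = URI.create(url).toURL().openConnection();
            // setRequestProperty replaces an existing value for the key,
            // addRequestProperty appends another value for it.
            connection.setRequestProperty("User-Agent", userAgent);
            connection.addRequestProperty("Accept-Language", "en-US,en;q=0.5");
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URLConnection connection = prepare("https://example.com/",
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0");
        // List the headers that would be sent with the request.
        for (Map.Entry<String, List<String>> header : connection.getRequestProperties().entrySet()) {
            System.out.println(header.getKey() + ": " + header.getValue());
        }
    }
}
```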

// Imports needed by the snippets below.
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.Authenticator;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

private static final String LINUX_USER_AGENT =
        "Mozilla/5.0 (X11; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0";
private static final String WINDOWS_USER_AGENT =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0";

/**
 * Adds browser-like request headers to a connection that has not been opened yet.
 *
 * @param connection
 *      - connection to add the headers to
 */
private static void addBrowserHeaders(URLConnection connection)
{
    // Pick the user agent that matches the operating system the JVM runs on.
    if (System.getProperty("os.name").contains("Win"))
    {
        connection.addRequestProperty("User-Agent", WINDOWS_USER_AGENT);
    }
    else
    {
        connection.addRequestProperty("User-Agent", LINUX_USER_AGENT);
    }
    connection.addRequestProperty("Accept-Language", "en-US,en;q=0.5");
    // "Accept-Encoding: gzip, deflate" is left out on purpose: URLConnection does not
    // decompress the body, so the readers below would receive compressed bytes.
}

/**
 * Masks the URL connection as a regular browser request; an unmasked one
 * may be rejected with 403 Forbidden.
 *
 * @param url
 *      - URL to mask and connect to
 * @return a reader over the response of the masked connection
 *
 * @throws IOException
 *      - if the connection cannot be opened or read
 */
public static InputStreamReader getMaskedInputStream(String url) throws IOException
{
    URLConnection connection = new URL(url).openConnection();
    addBrowserHeaders(connection);
    // Assumes a UTF-8 response instead of relying on the platform default charset.
    return new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8);
}

/**
 * Same as above, but connects through a proxy that may require authentication.
 *
 * @param proxy
 *      - proxy to route the connection through
 * @param auth
 *      - authenticator that supplies the proxy credentials
 * @param url
 *      - URL to mask and connect to
 * @return a reader over the response of the masked connection
 *
 * @throws IOException
 *      - if the connection cannot be opened or read
 */
public static InputStreamReader getMaskedInputStream(Proxy proxy, Authenticator auth, String url) throws IOException
{
    // Note: this installs the authenticator for the whole JVM, not only for this connection.
    Authenticator.setDefault(auth);
    URLConnection connection = new URL(url).openConnection(proxy);
    addBrowserHeaders(connection);
    return new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8);
}
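One thing to watch when a request advertises `Accept-Encoding: gzip, deflate`: the server may then compress the response, and `URLConnection` does not decompress it for you. Below is a minimal sketch of choosing a decoder from the `Content-Encoding` response header. `ResponseDecoder`, `decodeBody`, `readAll`, and `gzip` are hypothetical helpers written for this example; the sketch assumes a UTF-8 body and a zlib-wrapped "deflate" stream, neither of which is guaranteed for every server.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public class ResponseDecoder {

    /** Wraps the raw response stream in a decompressor chosen by the Content-Encoding value. */
    static Reader decodeBody(InputStream raw, String contentEncoding) {
        try {
            InputStream body = raw;
            if ("gzip".equalsIgnoreCase(contentEncoding)) {
                body = new GZIPInputStream(raw);
            } else if ("deflate".equalsIgnoreCase(contentEncoding)) {
                // Assumption: the server sends zlib-wrapped data, not raw deflate.
                body = new InflaterInputStream(raw);
            }
            // Assumption: the body is UTF-8 text.
            return new InputStreamReader(body, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the whole reader into a string. */
    static String readAll(Reader reader) {
        try {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Compresses text with gzip; used here only to simulate a compressed response. */
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Simulated compressed response body; no network access is needed.
        byte[] compressed = gzip("<html>timetable</html>");
        System.out.println(readAll(decodeBody(new ByteArrayInputStream(compressed), "gzip")));
    }
}
```

With a real connection, the call would look something like `decodeBody(connection.getInputStream(), connection.getContentEncoding())`.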