Question

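I'm using Java JRE 1.8.0_141 and I'm trying to access a particular URL and store its HTML in a String so that I can manipulate the data later in my code, but whenever I call getInputStream I get an HTTP 405 error.

The code seems to work fine with other URLs. The troublesome URL is: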

http://www.streeteasy.com/for-rent/nyc/status:open%7Cprice:1750-2900%7Carea:104,116,119,143,141%7Camenities:pool?page=2&refined_search=true

Here is the exact error from Eclipse 4.6.3:

<terminated, exit value: 1>C:\Program Files\Java\jre1.8.0_141\bin\javaw.exe (Aug 6, 2017, 10:53:37 PM)  

Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Server returned HTTP response code: 405 for URL: http://www.streeteasy.com/for-rent/nyc/status:open%7Cprice:1750-2900%7Carea:104,116,119,143,141%7Camenities:pool?page=2&refined_search=true
    at RunMe.getHTMLFromURL(RunMe.java:52)
    at RunMe.main(RunMe.java:18)
Caused by: java.io.IOException: Server returned HTTP response code: 405 for URL: http://www.streeteasy.com/for-rent/nyc/status:open%7Cprice:1750-2900%7Carea:104,116,119,143,141%7Camenities:pool?page=2&refined_search=true
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at RunMe.getHTMLFromURL(RunMe.java:36)
    ... 1 more

My RunMe.java code is as follows:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.LinkedList;

public class RunMe {

public static void main(String[] args) throws IOException {
    // TODO Auto-generated method stub

    System.out.println(getHTMLFromURL("http://www.streeteasy.com/for-rent/nyc/status:open%7Cprice:1750-2900%7Carea:104,116,119,143,141%7Camenities:pool?page=2&refined_search=true"));      
}

public static String getHTMLFromURL(String url){
        try{
            URL urlObj = new URL(url);
            URLConnection con = urlObj.openConnection();
            con.setDoOutput(false);
            con.connect();

            BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream())); 
            // CODE FAILS HERE ^

            StringBuilder response = new StringBuilder();
            String inputLine;

            String newLine = System.getProperty("line.separator");
            while ((inputLine = in.readLine()) != null){
                response.append(inputLine + newLine);
            }
            in.close();

            return response.toString();
        }
        catch (Exception e){
            throw new RuntimeException(e);
        }
    }
}

If this approach can't work, does anyone know how I can pull the HTML from this URL? Thanks in advance!

Answer 1

I ran a curl command against the URL, and it looks like the site is trying to run JavaScript to render the page.

curl -v -L -H "User-Agent: Mozilla/5.0" -H "Accept: text/html" "http://www.streeteasy.com/for-rent/nyc/status:open%7Cprice:1750-2900%7Carea:104,116,119,143,141%7Camenities:pool?page=2"

> GET /for-rent/nyc/status:open%7Cprice:1750-2900%7Carea:104,116,119,143,141%7Camenities:pool?page=2 HTTP/1.1
> Host: www.streeteasy.com
> User-Agent: Mozilla/5.0
> Accept: text/html
> 
< HTTP/1.1 405 Not Allowed

// elided

<h1>Pardon Our Interruption...</h1>
<p>As you were browsing <strong>www.streeteasy.com</strong> something about your browser made us think you were a bot. There are a few reasons this might happen:</p>
<ul>
    <li>You're a power user moving through this website with super-human speed.</li>
    <li>You've disabled JavaScript in your web browser.</li>
    <li>A third-party browser plugin, such as Ghostery or NoScript, is preventing JavaScript from running. Additional information is available in this <a title='Third party browser plugins that block javascript' href='http://ds.tl/help-third-party-plugins' target='_blank'>support article</a>.</li>
</ul>

<p>After completing the CAPTCHA below, you will immediately regain access to www.streeteasy.com.</p>

Unless you can fill in the CAPTCHA programmatically, you may be out of luck.
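To see the same interstitial page from Java instead of curl, something like the sketch below might help. `getInputStream()` throws an IOException on a 4xx/5xx status, so the sketch checks `getResponseCode()` first and reads `getErrorStream()` when the status is an error. The names `ErrorBodyDemo`, `readAll`, and `fetchWithStatus` are hypothetical, the two request headers are copied from the curl command above, and UTF-8 is an assumption about the response encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of the question's RunMe.java.
class ErrorBodyDemo {

    // Reads a stream fully into a String, ending every line with '\n'.
    // A null stream (getErrorStream() may return null) yields "".
    static String readAll(InputStream in) {
        if (in == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        // Assumption: the body is UTF-8 encoded.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Returns the status code on the first line, followed by the body.
    // For 4xx/5xx responses the body is taken from getErrorStream().
    static String fetchWithStatus(String url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setRequestProperty("User-Agent", "Mozilla/5.0");
        con.setRequestProperty("Accept", "text/html");
        int status = con.getResponseCode();
        InputStream body = status >= 400 ? con.getErrorStream() : con.getInputStream();
        return status + "\n" + readAll(body);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ErrorBodyDemo <url>");
            return;
        }
        System.out.println(fetchWithStatus(args[0]));
    }
}
```

Run against the URL from the question, this should print the status code followed by whatever page the server sends back, which makes it easier to tell a bot-detection page apart from a genuine method error.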

Edit:

The problem is apparently cookies, as shown in the discussion below.
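If cookies are indeed the cause, one thing to try is giving the JVM a cookie jar so that `HttpURLConnection` sends back whatever cookies the site sets. The sketch below uses the standard `java.net.CookieManager`; the class `CookieJarDemo` and its methods are hypothetical names, and `cookiesSentAfter` only illustrates the store-and-replay behaviour offline, using example.com and a made-up cookie. Whether this is enough to get past the site's bot check is not something the sketch can guarantee.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; not part of the question's RunMe.java.
class CookieJarDemo {

    // Installs a JVM-wide in-memory cookie jar. HttpURLConnection consults
    // the default CookieHandler, so connections opened afterwards store
    // Set-Cookie headers and send matching Cookie headers.
    static CookieManager installCookieJar() {
        CookieManager jar = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(jar);
        return jar;
    }

    // Offline illustration: feeds one Set-Cookie header value into the jar
    // as if it came from a response for uri, then returns the Cookie header
    // values the jar would attach to the next request for the same uri.
    static List<String> cookiesSentAfter(CookieManager jar, URI uri, String setCookie) {
        try {
            Map<String, List<String>> responseHeaders = new HashMap<>();
            responseHeaders.put("Set-Cookie", Collections.singletonList(setCookie));
            jar.put(uri, responseHeaders);

            Map<String, List<String>> requestHeaders = jar.get(uri, new HashMap<>());
            List<String> cookies = requestHeaders.get("Cookie");
            return cookies == null ? Collections.<String>emptyList() : cookies;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CookieManager jar = installCookieJar();
        // example.com and the cookie value are placeholders for illustration.
        URI uri = URI.create("http://example.com/");
        System.out.println(cookiesSentAfter(jar, uri, "session=abc123; Path=/"));
    }
}
```

In the question's code, calling `CookieJarDemo.installCookieJar()` once at the start of `main`, before `getHTMLFromURL`, would be the place to wire this in.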
