Unable to get the source code of a web page

Time: 2012-07-30 01:09:53

Tags: java winamp

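I am trying to fetch the HTML source of this site, "http://207.200.96.231:8008", using Java, but Java's standard library has not helped. I also tried following this tutorial, and that did not work either. I suspect the problem is caused by some security protection on the site. When I run the code below, I get an exception: java.io.IOException: Invalid Http response

Any ideas on how to make this work? Or is there a library that does what I need? So far I have tried JSoup and Jericho HTML Parser, hoping they would connect to the site in a different way, but they did not work either.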

String urlstr = "http://72.26.204.28:9484/played.html";

try {
    URL url = new URL(urlstr);
    URLConnection urlc = url.openConnection();

    InputStream stream = urlc.getInputStream();
    BufferedInputStream buf = new BufferedInputStream(stream);

    StringBuilder sb = new StringBuilder();

    while (true) {
        int data = buf.read();
        if (data == -1)
            break;
        else
            sb.append((char) data);
    }
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}

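Edit (solved): With help from Karai17 and trashgod, I managed to solve the problem. The Shoutcast page requires a User-Agent header before it will serve its content, so all we need to do is add this line: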

urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0");

The updated code is as follows:

try {
    URL url = new URL("http://207.200.96.231:8008/7.html");
    HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
    urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0");

    InputStream is = urlConnection.getInputStream();
    BufferedInputStream in = new BufferedInputStream(is);
    int c;
    while ((c = in.read()) != -1) {
        System.out.write(c);
    }
    urlConnection.disconnect();
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
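
For reference, the same fix can be wrapped in a small reusable method. This is only a sketch under a few assumptions: the class name PageFetcher, the method fetch, the 5-second timeouts, and the ISO-8859-1 decoding are illustrative choices that do not come from the original post, so adjust them to the server you are talking to.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class PageFetcher {

    // Fetches the body of pageUrl as text, sending the given User-Agent header.
    // Decoding as ISO-8859-1 is an assumption; use the charset the server actually sends.
    public static String fetch(String pageUrl, String userAgent) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(pageUrl).toURL().openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setConnectTimeout(5000); // hypothetical timeout values
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            // Read in chunks rather than one byte at a time
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PageFetcher <url>");
            return;
        }
        System.out.println(fetch(args[0], "Mozilla/5.0"));
    }
}
```

Running it as java PageFetcher http://207.200.96.231:8008/7.html would repeat the request from the edit above, provided the server is still reachable.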

1 answer:

Answer 0 (score: 1)

This stream appears to require Winamp.

$ curl -v http://207.200.96.231:8008
* About to connect() to 207.200.96.231 port 8008 (#0)
*   Trying 207.200.96.231... connected
* Connected to 207.200.96.231 (207.200.96.231) port 8008 (#0)
> GET / HTTP/1.1
> User-Agent: curl/...
> Host: 207.200.96.231:8008
> Accept: */*
> 
ICY 200 OK
icy-notice1:This stream requires Winamp
icy-notice2:SHOUTcast Distributed Network Audio Server/Linux v1.9.93atdn
icy-name:Absolutely Smooth Jazz - SKY.FM - the world's smoothest jazz 24 hours a day
icy-genre:Soft Smooth Jazz
icy-url:http://www.sky.fm/smoothjazz/
content-type:audio/mpeg
icy-pub:1
icy-br:96
...
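
Note that the server answers with "ICY 200 OK" rather than a status line starting with "HTTP/". That is a likely reason for the "Invalid Http response" exception in the question, although this has not been verified against the JDK version used there. If you read the raw response yourself, for example over a plain socket, the name:value lines can be split with a few lines of code. The sketch below is illustrative only: the class IcyHeaderParser and its methods are hypothetical names, and the sample input is copied from the curl output above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class IcyHeaderParser {

    // True when the first response line is a SHOUTcast-style "ICY" status line
    public static boolean isIcyStatusLine(String statusLine) {
        return statusLine != null && statusLine.startsWith("ICY ");
    }

    // Splits "name:value" lines into a map, keeping their original order.
    // Lines without a colon (such as the status line) are skipped.
    public static Map<String, String> parseHeaders(String headerBlock) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : headerBlock.split("\r?\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue;
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            headers.put(name, value);
        }
        return headers;
    }

    public static void main(String[] args) {
        // Sample taken from the curl output in this answer
        String response = "ICY 200 OK\r\n"
                + "icy-genre:Soft Smooth Jazz\r\n"
                + "icy-url:http://www.sky.fm/smoothjazz/\r\n"
                + "content-type:audio/mpeg\r\n"
                + "icy-pub:1\r\n"
                + "icy-br:96\r\n";
        String statusLine = response.substring(0, response.indexOf("\r\n"));
        System.out.println(isIcyStatusLine(statusLine));
        Map<String, String> headers = parseHeaders(response);
        System.out.println(headers.get("content-type") + " / " + headers.get("icy-br"));
    }
}
```

Only the first colon on each line is treated as the separator, so a value such as the icy-url address keeps its own colons intact.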

Addendum: you can read the stream like this:

URL url = new URL("http://207.200.96.231:8008");
URLConnection con = url.openConnection();
InputStream is = con.getInputStream();
BufferedInputStream in = new BufferedInputStream(is);
int c;
while ((c = in.read()) != -1) {
    System.out.write(c);
}
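
Because the response is an audio/mpeg stream, the loop above will probably not reach end of stream on its own while the station is broadcasting. If you only want a sample, you could cap the number of bytes read. The following is a minimal sketch, not part of the original answer: BoundedReader and readAtMost are hypothetical names, and a ByteArrayInputStream stands in for con.getInputStream() so the example runs without a network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BoundedReader {

    // Reads up to maxBytes from in, stopping early if the stream ends first
    public static byte[] readAtMost(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int remaining = maxBytes;
        while (remaining > 0) {
            int n = in.read(buffer, 0, Math.min(buffer.length, remaining));
            if (n == -1) {
                break; // end of stream reached before the cap
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for con.getInputStream(); a live stream would keep producing bytes
        InputStream fake = new ByteArrayInputStream(new byte[10_000]);
        System.out.println(readAtMost(fake, 4096).length); // prints 4096
    }
}
```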