Question

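I'm trying to write a program that reads the HTML source of the website http://judgephilosophies.wikispaces.com. I wrote some simple Java code to read and print the source, but it only prints "null". Here's the strange part: if I replace "http://judgephilosophies.wikispaces.com" in the code with any other website, it works fine. The program seems to fail only for sites in the wikispaces.com domain, and I'm completely baffled as to why. The code is below. Any help is greatly appreciated.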

import java.io.*;
import java.net.*;

public class AccessWebExample 
{
    public static void main (String[] args) throws Exception
    {
        //Create reader to access html source code
        URL url = new URL ("http://judgephilosophies.wikispaces.com/");
        InputStreamReader isr = new InputStreamReader (url.openStream());
        BufferedReader reader = new BufferedReader (isr);

        //Read and print the text
        do
        { 
            System.out.println(reader.readLine());
        }
        while(reader.readLine() != null);
    }
}

Answer 1

Do an HTTP trace with Wireshark or a similar tool and compare. If a bare URLConnection behaves differently from the browser, it is probably a cookie or header issue.
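If a packet capture is not convenient, a rough first look at what the server sends back can be had from Java itself. The following is only a sketch under assumptions, not part of the original answer: the class name `HeaderDump` and the helper `format` are made up for illustration. It switches off automatic redirects and prints the status code and response headers, which can then be compared with what the browser's network tools report.

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Map;

public class HeaderDump {
    // Hypothetical helper: renders a status code and a header map as text.
    static String format(int status, Map<String, List<String>> headers) {
        StringBuilder sb = new StringBuilder("Status: " + status + "\n");
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            if (e.getKey() == null) {
                continue; // HttpURLConnection stores the status line under a null key
            }
            sb.append(e.getKey()).append(": ")
              .append(String.join(", ", e.getValue())).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // URL taken from the question; pass another one as the first argument.
        String address = args.length > 0 ? args[0] : "http://judgephilosophies.wikispaces.com/";
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setInstanceFollowRedirects(false); // show the raw first response
        System.out.print(format(conn.getResponseCode(), conn.getHeaderFields()));
        conn.disconnect();
    }
}
```

A 3xx status together with a `Location` or `Set-Cookie` header in this output would point toward the redirect or cookie explanation.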

Answer 2

Run wget from the command line and you'll see:

broach@broach-laptop:~$ wget http://judgephilosophies.wikispaces.com/
--2011-04-23 14:50:31--  http://judgephilosophies.wikispaces.com/
Resolving judgephilosophies.wikispaces.com... 208.43.192.33, 75.126.104.177
Connecting to judgephilosophies.wikispaces.com|208.43.192.33|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://session.wikispaces.com/1/auth/auth?authToken=e8ad55c0e2701a0e7da89807255609da [following]

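It redirects (several times, in fact), and your bare URLConnection doesn't handle that. The response code lives in the headers, and the 302 response evidently carries no body, so the first readLine() returns null and that is what your program prints.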

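You really should consider using HttpURLConnection, since it can handle redirects for you. To do this with a plain URL, you need to inspect the returned headers and act on the HTTP response code (which is what HttpURLConnection does).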
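One caveat: HttpURLConnection follows redirects automatically only when the protocol stays the same, so the http to https hop in the wget trace above would not be followed by default. A minimal sketch of following the chain by hand is given here. The class name `FollowRedirects`, the helper names, and the hop limit are assumptions for illustration, and installing a default `CookieManager` is a guess that the auth redirects depend on cookies; it has not been verified against this site.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.HttpURLConnection;
import java.net.URL;

public class FollowRedirects {
    static final int MAX_HOPS = 10; // arbitrary safety limit (assumption)

    // True for the 3xx codes that normally carry a Location header.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
            || status == 307 || status == 308;
    }

    // Opens the address, following redirects manually (including http -> https).
    static HttpURLConnection open(String address) throws IOException {
        URL url = new URL(address);
        for (int hop = 0; hop < MAX_HOPS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setInstanceFollowRedirects(false);
            if (!isRedirect(conn.getResponseCode())) {
                return conn;
            }
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (location == null) {
                throw new IOException("Redirect without a Location header");
            }
            url = new URL(url, location); // also resolves relative Location values
        }
        throw new IOException("Too many redirects");
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the redirect chain may rely on cookies being sent back.
        CookieHandler.setDefault(new CookieManager());
        HttpURLConnection conn = open("http://judgephilosophies.wikispaces.com/");
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            // Read each line exactly once; stop at end of stream.
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

The read loop here calls readLine() once per iteration. The do/while loop in the question calls it twice, once to print and once to test for null, so even on a site that works it would print only every other line.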

Strange problem accessing a web page with Java
