Question

First of all, I'm new to Java and my English is terrible, so I hope you can understand my question.

I want to read a text file from this URL: http://www.cophieu68.com/export/metastock.php?id=AAA

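OK, let me explain. This is a Vietnamese stock-data website. The link above points to the file aaa.txt, which contains information on the stock with ticker symbol AAA. I only need to change the value of the id parameter to get information on other stocks.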
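A minimal sketch of building the export URL for any ticker, assuming (as described above) that the `id` query parameter is the only part that changes. The class name `MetastockUrls`, the method `exportUrl`, and the second ticker `XYZ` are hypothetical, for illustration only.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class MetastockUrls {
    // Base export URL taken from the question; only the id value varies.
    private static final String BASE = "http://www.cophieu68.com/export/metastock.php?id=";

    // Builds the export URL for a ticker such as "AAA". The id is URL-encoded
    // in case a symbol ever contains characters that are not URL-safe.
    static String exportUrl(String ticker) {
        return BASE + URLEncoder.encode(ticker, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "XYZ" is a hypothetical ticker used only to show the pattern.
        for (String ticker : new String[] {"AAA", "XYZ"}) {
            System.out.println(exportUrl(ticker));
        }
    }
}
```

`URLEncoder.encode(String, Charset)` requires Java 10 or later; on older versions the `encode(String, "UTF-8")` overload can be used instead.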

My problem is that I get a bunch of HTML code instead of the text file I expect (aaa.txt).

Here is my code:

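    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.net.URLConnection;

    public class ReadUrl {
        public static void main(String[] args) {
            try {
                URL url = new URL("http://www.cophieu68.com/export/metastock.php?id=AAA");
                URLConnection urlConn = url.openConnection();

                System.out.println(urlConn.getContentType()); // it returns text/html

                // try-with-resources closes the reader even if reading fails
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(urlConn.getInputStream()))) {
                    String text;
                    while ((text = in.readLine()) != null) {
                        System.out.println(text);
                    }
                }
            } catch (MalformedURLException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }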

Thanks for your help.

Answer 1

The site appears to be sniffing the user agent to decide what content to send.

If you spoof the user agent as shown below, it works correctly: the response is a plain text file.

    urlConn.setRequestProperty("User-Agent", "Mozilla/5.0 (X11; U; Linux i686; pl-PL; rv:1.9.0.2) Gecko/20121223 Ubuntu/9.25 (jaunty) Firefox/3.8");

As you probably know, this makes the request claim to come from Firefox 3.8 running on Ubuntu.
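Putting the answer's `setRequestProperty` call into a complete program might look like the sketch below. It assumes the server still chooses the response format from the `User-Agent` header alone and that the file is UTF-8 encoded; the class name `MetastockDownload` and the helper `openWithUserAgent` are hypothetical. The header has to be set before `getContentType()` or `getInputStream()` opens the connection, because `URLConnection.setRequestProperty` throws `IllegalStateException` once the connection is established.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class MetastockDownload {
    // Browser-like User-Agent string copied from the answer above.
    static final String BROWSER_UA =
        "Mozilla/5.0 (X11; U; Linux i686; pl-PL; rv:1.9.0.2) Gecko/20121223 Ubuntu/9.25 (jaunty) Firefox/3.8";

    // Creates a connection object (no network traffic yet) and sets the
    // User-Agent header; the request is only sent when it is first read from.
    static URLConnection openWithUserAgent(URL url, String userAgent) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        URL url = URI.create("http://www.cophieu68.com/export/metastock.php?id=AAA").toURL();
        URLConnection conn = openWithUserAgent(url, BROWSER_UA);

        // text/plain is what the answer reports once the header is sent
        System.out.println(conn.getContentType());

        // UTF-8 is an assumption about the file's encoding
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

Keeping the header setup in its own method makes it easy to reuse the same spoofed agent for every ticker you download.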

Answer 2

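This may be because the link (http://www.cophieu68.com/export/metastock.php?id=AAA) is sent as an attachment. If you have access to the PHP file, it should do nothing except print the data, and it should include the following line in the PHP file (before any output is sent):

    header('Content-Type: text/plain');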
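The fix above is server-side. From the Java side, one way to see what the server is actually sending is to inspect the `Content-Type` and `Content-Disposition` response headers. The sketch below assumes the server would mark a download with `Content-Disposition: attachment`, which may not be true for this site; the class name `ResponseHeaderCheck` and its methods are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;
import java.util.Locale;

public class ResponseHeaderCheck {
    // True if a Content-Disposition value marks the body as a download,
    // e.g. "attachment; filename=aaa.txt". A null value means no header was sent.
    static boolean isAttachment(String contentDisposition) {
        return contentDisposition != null
            && contentDisposition.trim().toLowerCase(Locale.ROOT).startsWith("attachment");
    }

    // True if the media type, ignoring parameters such as charset, is text/plain.
    static boolean isPlainText(String contentType) {
        if (contentType == null) {
            return false;
        }
        String mediaType = contentType.split(";", 2)[0].trim();
        return mediaType.equalsIgnoreCase("text/plain");
    }

    public static void main(String[] args) throws IOException {
        URLConnection conn = URI.create("http://www.cophieu68.com/export/metastock.php?id=AAA")
            .toURL().openConnection();
        String type = conn.getContentType();                         // sends the request
        String disposition = conn.getHeaderField("Content-Disposition");
        System.out.println("Content-Type: " + type + " (plain text: " + isPlainText(type) + ")");
        System.out.println("Content-Disposition: " + disposition
            + " (attachment: " + isAttachment(disposition) + ")");
    }
}
```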

Java - Read a txt file from a PHP web page URL
