Question

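I am trying to write a program that uses the Wikipedia API. As far as I can tell, the simplest way to use the API is to request an HTTP page with the desired command encoded in the URL, for example a request that lists all the links in the "Apple" Wikipedia article. I want to issue these commands from my Java program, so I wrote the following snippet to fetch the data from an HTTP page: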

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;
    import java.nio.charset.StandardCharsets;
    import java.util.stream.Collectors;

    // Open the connection to the URL web page
    URL url = new URL(link);
    URLConnection connection = url.openConnection();

    // Read the response as UTF-8; try-with-resources closes the reader
    // (and the underlying stream) even if reading fails.
    try (BufferedReader bR = new BufferedReader(
            new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
        // Fetch all of the lines from the buffered reader and join them all
        // together into a single string.
        return bR.lines().collect(Collectors.joining("\n"));
    }

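This works for fetching the data, but it is very slow. Each fetch takes about half a second, which is unacceptable for my program, especially since processing the entire downloaded file takes only about 1/1000 of a second. Is there any way to download these small files faster?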

Answer 1

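The fastest approach, if you don't mind not having the very latest information (and you can of course devise a way to keep the data up to date), is to download the dump data.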

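This lets you set up your own server, which can return pre-formatted data and can return multiple data items in a single request. That is much faster than issuing many requests and parsing the HTML from each one.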
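As a rough illustration of that idea, here is a minimal sketch of a local, in-memory index that answers several lookups in one call. It assumes the link lists have already been extracted from the dump by some separate step; the class name `LocalLinkIndex` and its methods are hypothetical and are not part of any Wikipedia tooling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical local index: article title -> links found in that article.
// Assumption: the data was extracted from a dump beforehand.
public class LocalLinkIndex {
    private final Map<String, List<String>> linksByTitle = new ConcurrentHashMap<>();

    // Store an immutable copy of the links for one article.
    public void put(String title, List<String> links) {
        linksByTitle.put(title, Collections.unmodifiableList(new ArrayList<>(links)));
    }

    // Answer many titles in one call; unknown titles map to an empty list.
    // The result preserves the order in which the titles were requested.
    public Map<String, List<String>> lookupAll(List<String> titles) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String title : titles) {
            List<String> links = linksByTitle.get(title);
            result.put(title, links != null ? links : Collections.<String>emptyList());
        }
        return result;
    }

    public static void main(String[] args) {
        LocalLinkIndex index = new LocalLinkIndex();
        index.put("Apple", Arrays.asList("Fruit", "Tree"));
        // One call covers both titles, with no network round trip.
        System.out.println(index.lookupAll(Arrays.asList("Apple", "Missing")));
        // prints {Apple=[Fruit, Tree], Missing=[]}
    }
}
```

Because every lookup is a map access in local memory, the per-item cost is far below the half-second network round trip described in the question. The price is that the data is only as fresh as the dump it was built from.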
