Question

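I'm trying to write a method that downloads a web page. First, I create an HttpURLConnection. Second, I call the connect() method. Third, I read the data through a BufferedReader.

The problem is that for some pages I get reasonable read times, but for others it is very slow (it can take about 10 minutes!). The slow pages are always the same ones, and they all come from the same website. Opening those pages in a browser takes a few seconds, not 10 minutes. Here is the code: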

static private String getWebPage(PageNode pagenode)
{
    String result;
    String inputLine;
    URI url;
    int cicliLettura=0;
    long startTime=0, endTime, openConnTime=0,connTime=0, readTime=0;
    try
    {
        if(Core.logGetWebPage())
            startTime=System.nanoTime();
        result="";
        url=pagenode.getUri();
        if(Core.logGetWebPage())
            openConnTime=System.nanoTime();
        HttpURLConnection yc = (HttpURLConnection) url.toURL().openConnection();
        if(url.toURL().getProtocol().equalsIgnoreCase("https"))
            yc=(HttpsURLConnection)yc;
        yc.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB;     rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 (.NET CLR 3.5.30729)"); 
        yc.connect();
        if(Core.logGetWebPage())
            connTime=System.nanoTime();
        BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));

        while ((inputLine = in.readLine()) != null)
        {
            result=result+inputLine+"\n";
            cicliLettura++;
        }
        if(Core.logGetWebPage())
            readTime=System.nanoTime();
        in.close();
        yc.disconnect();
        if(Core.logGetWebPage())
        {
            endTime=System.nanoTime();
            System.out.println(/*result+*/"getWebPage eseguito in "+(endTime-startTime)/1000000+" ms. Size: "+result.length()+" Response Code="+yc.getResponseCode()+" Protocollo="+url.toURL().getProtocol()+" openConnTime: "+(openConnTime-startTime)/1000000+" connTime:"+(connTime-openConnTime)/1000000+" readTime:"+(readTime-connTime)/1000000+" cicliLettura="+cicliLettura);
        }
        return result;
    }catch(IOException e){
        System.out.println("Eccezione: "+e.toString());
        e.printStackTrace();  
        return null;
    }
}

Here are two log samples. One of the "normal" web pages:

getWebPage executed. Size: 48261 Response Code=200 Protocol=http openConnTime: 0 connTime: 1 readTime: 569 cicliLettura=359

One of the "slow" pages, http://ricette.giallozafferano.it/Pan-di-spagna-al-cacao.html/allcomments, looks like this:

getWebPage executed. Size: 1748261 Response Code=200 Protocol=http openConnTime: 0 connTime: 1 readTime: 596834 cicliLettura=35685

Answer 1

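What you are seeing here is a consequence of the way you build up result. Remember that a String in Java is immutable, so every string concatenation has to instantiate a new String, which usually means copying all the data the String contains. For every line you execute the following code: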

result=result+inputLine+"\n";

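Behind the scenes, this line involves:

A StringBuffer is created with the contents of result so far
inputLine is appended to the StringBuffer
The StringBuffer is converted to a String
A StringBuffer is created for that String
A newline character is appended to the StringBuffer
The StringBuffer is converted to a String
That String is stored as result.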

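As result grows, this operation takes longer and longer. The exact steps vary with the compiler and Java version, but every iteration still copies everything accumulated so far, so the total work grows roughly with the square of the page size. Your results appear to show this (albeit from a sample of 2!): the slow page is about 36 times larger than the normal one, yet its read time is about 1,000 times longer.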
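To make that growth concrete, here is a rough model that counts how many characters get copied under each approach. It is only a sketch: ConcatCost and its methods are hypothetical names, it assumes every line has the same length, and it counts one copy per iteration for concatenation, which is a lower bound on what actually happens.

```java
// Hypothetical helper, not part of the original question or answer.
class ConcatCost {

    // Characters copied when each line is added with result = result + line + "\n".
    // Every iteration produces a new String holding the old contents plus the
    // new line, so the whole accumulated text is copied at least once more.
    static long charsCopiedByConcat(int lines, int lineLength) {
        long copied = 0;
        long currentLength = 0;
        for (int i = 0; i < lines; i++) {
            currentLength += lineLength + 1; // the line plus its '\n'
            copied += currentLength;         // the new String holds all of it
        }
        return copied;
    }

    // Characters copied when appending to one growing buffer: each character
    // is written once (occasional internal array growth is ignored here).
    static long charsCopiedByAppend(int lines, int lineLength) {
        return (long) lines * (lineLength + 1);
    }

    public static void main(String[] args) {
        int lines = 35685;   // cicliLettura from the slow page's log
        int lineLength = 48; // assumed average: about 1748261 / 35685, minus the newline
        System.out.println("concat: " + charsCopiedByConcat(lines, lineLength));
        System.out.println("append: " + charsCopiedByAppend(lines, lineLength));
    }
}
```

With the slow page's figures, the concatenation count comes out tens of thousands of times larger than the append count, which is the same direction as the gap in the two log samples.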

To avoid this, append to a StringBuffer directly:

StringBuffer buffer = new StringBuffer();
while ((inputLine = in.readLine()) != null)
{
    buffer.append(inputLine).append('\n');
    cicliLettura++;
}
String result = buffer.toString();
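The same loop can be pulled into a small helper that is easy to try without a network connection. This is a sketch rather than the code from the answer: PageReader and readAll are hypothetical names, StringBuilder is swapped in for StringBuffer (it has the same append API without synchronization, which a single-threaded loop does not need), and try-with-resources closes the reader even if reading fails.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not part of the original question or answer.
class PageReader {

    // Reads every line from the given reader into a single String,
    // ending each line with '\n' as the original loop does.
    static String readAll(Reader source) throws IOException {
        StringBuilder buffer = new StringBuilder();
        // try-with-resources closes the reader when the block exits
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                buffer.append(line).append('\n'); // appends in place, no full copy
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the HTTP response body here.
        System.out.print(readAll(new StringReader("first line\nsecond line")));
    }
}
```

In getWebPage, the reader passed in would be an InputStreamReader wrapped around yc.getInputStream(). Passing an explicit charset such as StandardCharsets.UTF_8 to that InputStreamReader is an assumption about the page's encoding; the question's code relies on the platform default.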

Slow download with HttpURLConnection
