Faster way to parse data and populate an array?

Posted: 2012-03-29 18:39:29

Tags: android html regex parsing

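The code I use to parse the HTML is shown below. The second snippet is how I call it to populate the array for a simple list.

The problem I'm having is that downloading, parsing, and displaying the data takes 5 or 6 seconds, which is far too long.

Is there any way to speed this process up so that it runs as fast as possible?

Also, I know I'm hard-coding the URL in the second bit of code. Once it's finished, the URL will be passed in, depending on the route, direction, and stop being used.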
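A minimal sketch of passing the route and direction in instead of hard-coding them. It assumes, based only on the hard-coded URL, that r is the route number and d is the direction. StopsUrl and build are hypothetical names. No stop parameter is included because the URL used here doesn't contain one.

```java
import java.util.Locale;

// Hypothetical helper. The meaning of the query parameters (r = route,
// d = direction) is inferred from the hard-coded URL and is an assumption.
class StopsUrl {
    private static final String BASE = "http://www.ltconline.ca/webwatch/ada.aspx";

    static String build(int route, int direction) {
        // Locale.US keeps the digits ASCII regardless of the device locale
        return String.format(Locale.US, "%s?r=%d&d=%d", BASE, route, direction);
    }

    public static void main(String[] args) {
        System.out.println(build(1, 2)); // http://www.ltconline.ca/webwatch/ada.aspx?r=1&d=2
    }
}
```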

public ArrayList<String> getStops(String URL) {
    ArrayList<String> BusStop = new ArrayList<String>();
    String HTML = DownloadText(URL);
    String temp = null;
    String temp2[] = new String[40];
    Pattern p = Pattern.compile("<a class=\"ada\".*</a>", Pattern.DOTALL);

    Matcher m = p.matcher(HTML);
    while (m.find()) {
        temp = m.group();
        temp2 = temp.split("<br></td>");
    }

    for (int i = 0; i < temp2.length; i++) {
        temp = temp2[i];
        temp = temp.replaceAll("<a class=\"ada\" title=\"", "");
        temp = temp.replaceAll("\".*\"", "");
        temp = temp.replaceAll("\n", "");
        temp = temp.replaceAll("\t", "");
        temp = temp.replaceAll(",</a>", "");
        temp = temp.replaceAll("</tr>.*>", "");
        temp = temp.replaceAll("<td.*>", "");
        temp = temp.replaceAll(">.*", "");
        BusStop.add(temp);
    }

    return BusStop;
}
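Independently of the network time, every String.replaceAll call in this loop compiles its regex again (its Javadoc defines it as equivalent to Pattern.compile(regex).matcher(str).replaceAll(repl)). Here is a minimal sketch of compiling the cleanup patterns once and reusing them. PrecompiledCleanup and clean are hypothetical names, and only two of the question's cleanup steps are shown.

```java
import java.util.regex.Pattern;

// Sketch: compile each cleanup regex once (static final) and reuse it,
// instead of letting String.replaceAll recompile it on every loop iteration.
class PrecompiledCleanup {
    // Equivalent to the separate replaceAll("\n", "") and replaceAll("\t", "") calls
    private static final Pattern TABS_AND_NEWLINES = Pattern.compile("[\\n\\t]");
    // Equivalent to replaceAll(",</a>", "")
    private static final Pattern TRAILING_ANCHOR = Pattern.compile(",</a>");

    static String clean(String fragment) {
        String s = TABS_AND_NEWLINES.matcher(fragment).replaceAll("");
        return TRAILING_ANCHOR.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("\tMain St,</a>\n")); // Main St
    }
}
```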

...

TransitXMLExtractor extractor;
static String baseURL5 = "http://www.ltconline.ca/webwatch/ada.aspx?r=1&d=2";

/** Populates string array with bus routes */
public String[] busStopArray() {
    extractor = new TransitXMLExtractor();
    String[] busStopArray = new String[31];

    for (int n = 0; n < busStopArray.length; n++) {
        busStopArray[n] = extractor.getStops(baseURL5).get(n);
    }
    return busStopArray;

}
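To see whether the 5 or 6 seconds go to the download or to the parsing, each phase can be timed separately. This is a minimal sketch. PhaseTimer and timed are hypothetical names, and the lambdas in main() are placeholders for DownloadText(URL) and the parsing code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch: wrap each phase in timed() and compare the logged durations.
class PhaseTimer {
    static <T> T timed(String label, Supplier<T> work, List<String> log) {
        long start = System.nanoTime();
        T result = work.get();
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        log.add(label + ": " + elapsedMs + " ms");
        return result;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        // Placeholder HTML instead of a real network call
        String html = timed("download", () -> "<a class=\"ada\" title=\"Stop One\"", log);
        Integer found = timed("parse", () -> html.split("title=").length - 1, log);
        log.forEach(System.out::println);
        System.out.println("titles found: " + found);
    }
}
```

On a device, the log lines could go to Log.d instead of a list.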

1 Answer

Answer (score: 0):

It looks like you could speed this up by using a regex that extracts exactly the text you need, which also cuts out the parsing loop.

public ArrayList<String> getStops(String URL) {
    ArrayList<String> BusStop = new ArrayList<String>();
    String HTML = DownloadText(URL);
    Pattern p = Pattern.compile("<a class=\"ada\" title=\"([\\w\\s]+)\"");

    Matcher m = p.matcher(HTML);
    while (m.find()) {
        BusStop.add(m.group(1));
    }

    return BusStop;
}
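Here is a self-contained sketch of the same single-regex approach. It runs against a small made-up HTML snippet rather than the real page, whose markup is assumed to look similar. StopExtractor and extractStops are hypothetical names. The pattern is the one from the answer, compiled once into a static field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the answer's approach: one regex, one pass, group 1 = title text.
class StopExtractor {
    private static final Pattern STOP =
            Pattern.compile("<a class=\"ada\" title=\"([\\w\\s]+)\"");

    static List<String> extractStops(String html) {
        List<String> stops = new ArrayList<>();
        Matcher m = STOP.matcher(html);
        while (m.find()) {
            stops.add(m.group(1)); // only the captured title, not the whole tag
        }
        return stops;
    }

    public static void main(String[] args) {
        // Made-up markup; the real page is assumed to look similar
        String sample = "<td><a class=\"ada\" title=\"Stop One\" href=\"#\">Stop One</a></td>"
                + "<td><a class=\"ada\" title=\"Stop Two\" href=\"#\">Stop Two</a></td>";
        System.out.println(extractStops(sample)); // [Stop One, Stop Two]
    }
}
```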

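Also, the calling bit can simply be the following. Your current loop calls getStops() once per array element, so the page is downloaded and parsed 31 times; calling it once is enough: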

public String[] busStopArray() {
    extractor = new TransitXMLExtractor();

    return extractor.getStops(baseURL5).toArray(new String[0]);
}
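The difference between the two calling styles can be illustrated without the network. This sketch uses a hypothetical FakeExtractor that only counts how often getStops() runs. perElement mirrors the question's loop and once mirrors the version above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: count how many times each calling style would download the page.
class CallCountDemo {
    // Hypothetical stand-in for TransitXMLExtractor; it never touches the network.
    static class FakeExtractor {
        int downloads = 0;

        List<String> getStops(String url) {
            downloads++; // in the real code, each call is a full download + parse
            List<String> stops = new ArrayList<>();
            for (int i = 0; i < 31; i++) {
                stops.add("Stop " + i);
            }
            return stops;
        }
    }

    // Mirrors the question's busStopArray(): getStops() inside the loop.
    static String[] perElement(FakeExtractor ex, String url) {
        String[] result = new String[31];
        for (int n = 0; n < result.length; n++) {
            result[n] = ex.getStops(url).get(n);
        }
        return result;
    }

    // Mirrors the answer's version: one call, then toArray.
    static String[] once(FakeExtractor ex, String url) {
        return ex.getStops(url).toArray(new String[0]);
    }

    public static void main(String[] args) {
        FakeExtractor a = new FakeExtractor();
        FakeExtractor b = new FakeExtractor();
        boolean same = Arrays.equals(perElement(a, "url"), once(b, "url"));
        System.out.println(same + " " + a.downloads + " vs " + b.downloads); // true 31 vs 1
    }
}
```

Both produce the same array, but in the real code each extra call is a full HTTP round trip, which is likely where much of the 5 or 6 seconds goes.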

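As I have it now, it should pull the text in the title attribute of every link with the class "ada".

Edit: To be clear, it should actually match <a class="ada" title="(whatever)", and group(1) gives you the (whatever) text.

Edit 2: I updated the example to match the code I found to work. Also, here is the entire activity I used for testing: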
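This sketch shows the difference between group() (the whole match) and group(1) (only the captured title). It also shows a limitation of [\w\s]+: a title containing punctuation such as '.' or '&' does not match at all. The [^"]+ pattern is an alternative that is not part of the answer, and whether the real stop names contain punctuation is an assumption to check against the page. GroupDemo and firstTitle are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: group() vs group(1), and how the character class affects matching.
class GroupDemo {
    // Pattern from the answer: letters, digits, underscore, whitespace only
    static final Pattern WORDS_ONLY = Pattern.compile("<a class=\"ada\" title=\"([\\w\\s]+)\"");
    // Alternative (not from the answer): anything up to the closing quote
    static final Pattern ANY_BUT_QUOTE = Pattern.compile("<a class=\"ada\" title=\"([^\"]+)\"");

    static String firstTitle(Pattern p, String html) {
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String tag = "<a class=\"ada\" title=\"King St. & Main\">";
        Matcher m = ANY_BUT_QUOTE.matcher(tag);
        if (m.find()) {
            System.out.println(m.group());  // <a class="ada" title="King St. & Main"
            System.out.println(m.group(1)); // King St. & Main
        }
        // '.' and '&' are not in [\w\s], so the answer's pattern finds nothing here
        System.out.println(firstTitle(WORDS_ONLY, tag)); // null
    }
}
```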

package com.kiswa.test;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import android.app.Activity;
import android.os.Bundle;
import android.util.Log;

public class TestActivity extends Activity {
    /** Called when the activity is first created. */
    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        StringBuilder sb = new StringBuilder();
        for (String stop : busStopArray()) {
            sb.append(stop);
        }
        Log.d("STRING_TEST", sb.toString());

        setContentView(R.layout.main);
    }

    public String DownloadText() throws UnsupportedEncodingException, IOException {
        Log.d("STRING_TEST", "In DownloadText");
        URL url = new URL("http://www.ltconline.ca/webwatch/ada.aspx?r=1&d=2");
        BufferedReader reader = null;
        StringBuilder builder = new StringBuilder();
        try {
            reader = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
            for (String line; (line = reader.readLine()) != null;) {
                builder.append(line.trim());
            }
        } finally {
            if (reader != null) try { reader.close(); } catch (IOException logOrIgnore) {}
        }

        return builder.toString();
    }

    public ArrayList<String> getStops() {
        Log.d("STRING_TEST", "In getStops");
        ArrayList<String> BusStop = new ArrayList<String>();
        String HTML = "";
        try {
            HTML = DownloadText();
        } catch (IOException e) {
            // UnsupportedEncodingException is a subclass of IOException, so one catch covers both
            Log.e("STRING_TEST", "Download failed", e);
        }
        Pattern p = Pattern.compile("<a class=\"ada\" title=\"([\\w\\s]+)\"");

        Matcher m = p.matcher(HTML);
        while (m.find()) {
            BusStop.add(m.group(1));
        }

        return BusStop;
    }

    public String[] busStopArray() {
        Log.d("STRING_TEST", "In busStopArray");
        return getStops().toArray(new String[0]);
    }
}
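The test activity calls busStopArray() from onCreate, so the network request runs on the UI thread. The screen stays frozen for the whole download, and apps targeting Android 3.0 (API level 11) or later get a NetworkOnMainThreadException. Below is a plain-Java sketch of moving the work to a background thread with an ExecutorService and handing the result to a callback. BackgroundLoader and load are hypothetical names. In an Activity, the callback would still need runOnUiThread (or AsyncTask's onPostExecute) before touching any views.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Sketch: run the slow download-and-parse off the calling thread and
// deliver the result to a callback on the worker thread.
class BackgroundLoader {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // 'loadStops' is a hypothetical stand-in for busStopArray().
    void load(Supplier<String[]> loadStops, Consumer<String[]> onLoaded) {
        executor.execute(() -> onLoaded.accept(loadStops.get()));
    }

    void shutdown() {
        executor.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        BackgroundLoader loader = new BackgroundLoader();
        CountDownLatch done = new CountDownLatch(1);
        loader.load(() -> new String[] {"Stop One", "Stop Two"}, stops -> {
            System.out.println(Arrays.toString(stops)); // [Stop One, Stop Two]
            done.countDown();
        });
        done.await(5, TimeUnit.SECONDS); // only so this demo waits before exiting
        loader.shutdown();
    }
}
```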