How do I collect (fetch and parse) only the information/data I need from an HTTP website?

Posted: 2013-06-04 14:36:10

Tags: android xml-parsing html-parsing android-parser

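I have a problem I have not been able to solve for two weeks, and I would like some help. I want to fetch useful data from an HTTP website and use it in my Android app. The site lists accidents and other incidents, together with all the details about them, and I want to pull that information out. I have asked this before but still could not solve it. Someone told me I have to get the data as JSON. I have never done that. If that is the only solution, how do I do it? If there is a simpler way, please tell me. So far I have fetched the entire content of the site using the following code:
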
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

private String DownloadText(String URL) {
    InputStream in;
    try {
        in = OpenHttpConnection(URL);
    } catch (IOException e) {
        e.printStackTrace();
        return "exception in downloadText";
    }

    // Accumulate the response in a StringBuilder; "str += ..." in a loop
    // copies all previously read text on every iteration.
    StringBuilder sb = new StringBuilder();
    char[] inputBuffer = new char[2000];
    int charRead;
    try {
        // Assumes the page is UTF-8; check the Content-Type header otherwise.
        Reader reader = new InputStreamReader(in, "UTF-8");
        while ((charRead = reader.read(inputBuffer)) != -1) {
            sb.append(inputBuffer, 0, charRead);
        }
    } catch (IOException e) {
        e.printStackTrace();
        return "";
    } finally {
        try {
            in.close();  // always release the connection's stream
        } catch (IOException ignored) {
            // nothing useful to do if close() fails
        }
    }
    return sb.toString();
}

// On Android, call this off the main thread (e.g. from an AsyncTask);
// network access on the UI thread throws NetworkOnMainThreadException on API 11+.
private InputStream OpenHttpConnection(String urlString) throws IOException {
    URL url = new URL(urlString);
    URLConnection conn = url.openConnection();

    if (!(conn instanceof HttpURLConnection))
        throw new IOException("Not an HTTP connection");

    HttpURLConnection httpConn = (HttpURLConnection) conn;
    httpConn.setAllowUserInteraction(false);
    httpConn.setInstanceFollowRedirects(true);
    httpConn.setRequestMethod("GET");
    httpConn.connect();

    int response = httpConn.getResponseCode();
    if (response != HttpURLConnection.HTTP_OK) {
        // Fail with a descriptive error instead of returning null,
        // which would crash the caller with a NullPointerException.
        httpConn.disconnect();
        throw new IOException("HTTP error " + response + " for " + urlString);
    }
    return httpConn.getInputStream();
}

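But this returns everything: the information I want, mixed in with all the HTML, XML, and other markup. I only want the specific information I need.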
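Since the question is tagged xml-parsing: if the site also publishes its data as XML (for example an RSS feed), the DOM parser in javax.xml.parsers, available on both Java SE and Android, can extract just the fields you need instead of working on the raw page text. This is a minimal sketch under assumptions: the element names `incident` and `title` and the class `IncidentXmlParser` are hypothetical and must be adapted to the real feed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class IncidentXmlParser {

    // Hypothetical feed layout: <incidents><incident><title>...</title></incident>...</incidents>
    // Returns the trimmed text of the first <title> inside each <incident>.
    public static List<String> parseIncidentTitles(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<String> titles = new ArrayList<>();
        NodeList incidents = doc.getElementsByTagName("incident");
        for (int i = 0; i < incidents.getLength(); i++) {
            Element incident = (Element) incidents.item(i);
            NodeList titleNodes = incident.getElementsByTagName("title");
            if (titleNodes.getLength() > 0) {  // skip incidents without a title
                titles.add(titleNodes.item(0).getTextContent().trim());
            }
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String sample = "<incidents>"
                + "<incident><title>Road closed</title><time>14:00</time></incident>"
                + "<incident><title>Power outage</title><time>15:30</time></incident>"
                + "</incidents>";
        System.out.println(parseIncidentTitles(sample));  // prints [Road closed, Power outage]
    }
}
```

In the app, the string passed to `parseIncidentTitles` would be the text returned by `DownloadText` for the feed URL.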

One more thing: do I need permission from the website's administrator before fetching the data?

1 answer:

Answer 0 (score: 1)

What you are looking for is called web scraping (or HTML scraping). Take a look at this question to get started: Options for HTML scraping?
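As a minimal illustration of HTML scraping with only the Java standard library (so it also runs on Android), the sketch below pulls headline text out of a downloaded page using regular expressions. The markup `<h2 class="incident">`, the class `IncidentScraper`, and the method `extractHeadlines` are hypothetical; inspect the real page source and adjust the pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IncidentScraper {

    // Hypothetical markup: each incident headline is assumed to be
    // <h2 class="incident">...</h2>. Adjust to the real page.
    private static final Pattern INCIDENT_HEADLINE = Pattern.compile(
            "<h2\\s+class=\"incident\"[^>]*>(.*?)</h2>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Removes tags left inside the captured text, such as <b> or <a>.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    public static List<String> extractHeadlines(String html) {
        List<String> headlines = new ArrayList<>();
        Matcher m = INCIDENT_HEADLINE.matcher(html);
        while (m.find()) {
            String text = TAG.matcher(m.group(1)).replaceAll("").trim();
            if (!text.isEmpty()) {
                headlines.add(text);
            }
        }
        return headlines;
    }

    public static void main(String[] args) {
        String html = "<html><body>"
                + "<h2 class=\"incident\">Fire on <b>Main St</b></h2>"
                + "<p>details...</p>"
                + "<h2 class=\"incident\">Flooding near the river</h2>"
                + "</body></html>";
        System.out.println(extractHeadlines(html));  // prints [Fire on Main St, Flooding near the river]
    }
}
```

Regular expressions are brittle against real-world HTML (attribute order, nested elements, and entities such as `&amp;` are not handled here); a dedicated HTML parser library such as jsoup is usually more robust for anything beyond a quick extraction.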