Extracting data from a web page using Pattern

Time: 2014-04-25 13:43:32

Tags: android web-scraping

I want to scrape a web page. I wrote this code to get the value of the score div:

public String GetRatingAndVotesFromURL(String url){

    ByteArrayOutputStream outputDoc = null;
    String page = "";
    String rating = "", votes = "";
    String rating_helper="", votes_helper="";

    try
    {
        InputStream is = new URL(url).openStream();
        outputDoc = new ByteArrayOutputStream();
        byte buf[]=new byte[1024];
        int len;
        while((len=is.read(buf))>0)
        {
            outputDoc.write(buf,0, len);
        }

        page = new String(outputDoc.toByteArray(), "UTF-8");
        int start = page.indexOf("<div class=\"score\">")+18; //77956
        int finish = start+5; //77966
        rating = page.substring(start+1, finish).toString();

        for (int i=0;i<5;i++){
            if ( String.valueOf(rating.charAt(i)).equals("<")) break;
            rating_helper += String.valueOf(rating.charAt(i));
        }
    }
    catch(Exception e) {  e.printStackTrace(); }

    return rating_helper;
}

This works fine, but it is a clumsy way to locate a fragment of the page.

So I changed

int start = page.indexOf("<div class=\"score\">")+18; //77956
int finish = start+5; //77966
rating = page.substring(start+1, finish).toString();

for (int i=0;i<5;i++){
    if ( String.valueOf(rating.charAt(i)).equals("<")) break;
    rating_helper += String.valueOf(rating.charAt(i));
}

to

Pattern p = Pattern.compile("<div class=\"score\">([0-9,]+)</div>");
Matcher m = p.matcher(page);
if(m.matches()) {
    rating_helper = m.group(1);
}
else rating_helper = "notfound";

But this does not work: I always get "notfound". What am I doing wrong?
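One likely cause, sketched here for illustration: Matcher.matches() succeeds only when the entire input string matches the pattern, whereas Matcher.find() searches for the pattern anywhere inside the input. Since page holds the whole HTML document, matches() would be expected to fail even when the score div is present. The class name ScoreExtractor, the method extractRating, and the sample HTML below are hypothetical; the pattern is the one from the question and assumes the real page has no whitespace around the value and uses only digits and commas.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScoreExtractor {

    // Same pattern as in the question; compiled once and reused.
    private static final Pattern SCORE =
            Pattern.compile("<div class=\"score\">([0-9,]+)</div>");

    // Hypothetical helper: returns the first score in the page, or "notfound".
    public static String extractRating(String page) {
        Matcher m = SCORE.matcher(page);
        // find() looks for the pattern anywhere in the input;
        // matches() would require the whole page to match the pattern.
        return m.find() ? m.group(1) : "notfound";
    }

    public static void main(String[] args) {
        // Made-up sample input, not the real page.
        String page = "<html><body><div class=\"score\">8,7</div></body></html>";
        System.out.println(SCORE.matcher(page).matches()); // prints false
        System.out.println(extractRating(page));           // prints 8,7
    }
}
```

If the real markup differs (extra attributes on the div, a dot as the decimal separator, line breaks inside the tag), the pattern would need adjusting, and an HTML parser may be more robust than a regular expression.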

0 answers:

No answers