Parsing text from multiple paragraph tags in Android

Time: 2013-10-17 07:22:58

Tags: java android android-layout android-intent html-parsing

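I have multiple paragraph tags like the ones below, all with the same class attribute value, js-tweet-text tweet-text, and I need to parse out their text in Android: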

    Caged parrot sings for its master. Industrialists & IAS officers named in the charge sheet.
    Sometext................

HTML text:

<p class="js-tweet-text tweet-text">Caged parrot sings for its master. Industrialists &amp; IAS officers named in the charge sheet. <a href="/PMOIndia" class="twitter-atreply pretty-link" dir="ltr" ><s>@</s><b>PMOIndia</b></a> &amp; MOS Coal left scot free.</p>


<p class="js-tweet-text tweet-text">Sometext................ <a href="/PMOInd" class="twitter-atreply pretty-link" dir="ltr" ><s>@</s><b>PMOIndia</b></a> &amp; MOS Coal left sc free.</p>

and so on.

Can anyone help?

2 answers:

Answer 0 (score: 1)

Maybe this can be done with a regular expression, but since I don't know what will appear inside the tags, I did it this way:

    String input = "<p class=\"js-tweet-text tweet-text\">Caged parrot sings for its master. Industrialists &amp; IAS officers named in the charge sheet. <a href=\"/PMOIndia\" class=\"twitter-atreply pretty-link\" dir=\"ltr\" ><s>@</s><b>PMOIndia</b></a> &amp; MOS Coal left scot free.</p>";
    // Copy every character that lies outside a <...> tag.
    boolean outsideTag = true;
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < input.length(); i++) {
        char c = input.charAt(i);
        if (c == '<') {
            outsideTag = false;
        } else if (c == '>') {
            outsideTag = true;
            continue; // skip the closing '>' itself
        }
        if (outsideTag) {
            result.append(c);
        }
    }
    // Note: entities such as &amp; are left undecoded.
    System.out.println(result);

Output:

Caged parrot sings for its master. Industrialists &amp; IAS officers named in the charge sheet. @PMOIndia &amp; MOS Coal left scot free.
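The loop above handles one paragraph and leaves `&amp;` encoded. As a rough sketch only, the same idea can be extended to several paragraphs at once and to decoding a few common entities. The names `TweetTextExtractor`, `extractTweets` and `stripTags` are made up for this illustration, and the regular expression assumes the `class` attribute is written exactly as in the question, so treat it as a quick workaround rather than a robust HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class TweetTextExtractor {
    // Matches each <p class="js-tweet-text tweet-text"> ... </p> block.
    // Assumes the class attribute appears exactly in this form.
    private static final Pattern PARAGRAPH = Pattern.compile(
            "<p\\s+class=\"js-tweet-text tweet-text\"[^>]*>(.*?)</p>",
            Pattern.DOTALL);

    // Drops everything between '<' and '>' and decodes a few common entities.
    public static String stripTags(String html) {
        StringBuilder out = new StringBuilder();
        boolean insideTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                insideTag = true;
            } else if (c == '>') {
                insideTag = false;
            } else if (!insideTag) {
                out.append(c);
            }
        }
        return out.toString()
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&") // decoded last to avoid double decoding
                .trim();
    }

    // Returns the plain text of every matching paragraph, in document order.
    public static List<String> extractTweets(String html) {
        List<String> tweets = new ArrayList<>();
        Matcher m = PARAGRAPH.matcher(html);
        while (m.find()) {
            tweets.add(stripTags(m.group(1)));
        }
        return tweets;
    }

    public static void main(String[] args) {
        String html = "<p class=\"js-tweet-text tweet-text\">Industrialists &amp; IAS officers. "
                + "<a href=\"/PMOIndia\"><s>@</s><b>PMOIndia</b></a> &amp; MOS Coal left scot free.</p>\n"
                + "<p class=\"js-tweet-text tweet-text\">Sometext <b>bold</b></p>";
        for (String tweet : extractTweets(html)) {
            System.out.println(tweet);
        }
    }
}
```

Only four entities are decoded here; real pages can contain many more, which is one reason a proper HTML parser is usually the safer choice.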

Answer 1 (score: 1)

I used the Jsoup parser in Android to meet this requirement:

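    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // Network call: on Android, run this off the main thread.
    Document doc = Jsoup.connect("https://twitter.com/someperson/")
            .userAgent("Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36")
            .get();

    // Select every <p> whose class attribute equals "js-tweet-text tweet-text".
    Elements elements = doc.select("p[class=js-tweet-text tweet-text]");

    for (Element tmp : elements) {
        // text() returns the element's text with nested tags removed.
        String value = tmp.text();
    }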

The code above returns the text of every paragraph tag whose class attribute matches "js-tweet-text tweet-text".