Question

I'm new to Android development and am learning it in my spare time. I'm building a "Web Feed Analyzer"-style app: I fetch a feed from a website and display it in a WebView like this:

WebView w = new WebView(this);
w.loadData(strData, "text/html", "utf-8");

That works fine. What I want to do, though, is modify the HTML stored in the strData variable before calling loadData, removing some elements from it.

I've been thinking about parsing the HTML, but I don't know how to do that properly, and I'd prefer a simple approach.
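For the narrow case of removing known elements from simple, flat feed markup, a regex replace on the string before loadData is one "simple approach". The sketch below is an illustration, not a general HTML parser: the class HtmlStripper and the helpers removeImages and removeElement are hypothetical names, and removeElement assumes the tag does not nest inside itself (fine for <p> or <script>, wrong for nested <div>s).

```java
import java.util.regex.Pattern;

class HtmlStripper {

    // Hypothetical helper: removes every <img ...> tag, self-closing or not.
    // Only safe because the feed markup is simple; this is not an HTML parser.
    static String removeImages(String html) {
        return html.replaceAll("(?i)<img\\b[^>]*>", "");
    }

    // Hypothetical helper: removes <tag ...>...</tag> blocks for a tag that does
    // not nest inside itself. (?s) lets '.' span newlines; '.*?' stops at the first close tag.
    static String removeElement(String html, String tag) {
        String t = Pattern.quote(tag);
        Pattern p = Pattern.compile("(?is)<" + t + "\\b[^>]*>.*?</" + t + "\\s*>");
        return p.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String strData = "<a href=\"x\"><img src=\"a.jpg\" /></a><br /> Genre: Drama<br /> <p>Plot</p>";
        String cleaned = removeElement(removeImages(strData), "p");
        System.out.println(cleaned);
        // On Android the cleaned string would then be handed to the WebView:
        // w.loadData(cleaned, "text/html", "utf-8");
    }
}
```

The WebView call is left as a comment because it needs the Android framework; everything else runs on plain Java.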

Any help would be appreciated.

Answer 1

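After realizing that Jsoup was beyond me, I fell back on the crudest, most low-level way of solving my own problem. The problem was to PARSE an HTML string and strip elements from it. Well, I couldn't actually parse anything, but I was able to extract specific parts of the HTML with good old string handling.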
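The core of this string-handling approach is: find a start marker with indexOf, find an end marker after it, and cut between them with substring. A minimal sketch of that pattern as a reusable helper follows; the name between and its null-on-miss behavior are assumptions of this sketch. Returning null when a marker is missing avoids the StringIndexOutOfBoundsException that a bare indexOf/substring pair throws when indexOf returns -1.

```java
class Between {

    // Hypothetical helper: text between `start` and `end`, searching from index `from`.
    // Returns null if either marker is missing instead of throwing.
    static String between(String s, String start, String end, int from) {
        int i = s.indexOf(start, from);
        if (i < 0) return null;
        i += start.length();              // skip past the start marker itself
        int j = s.indexOf(end, i);        // end marker must come after the start marker
        if (j < 0) return null;
        return s.substring(i, j);
    }

    public static void main(String[] args) {
        String html = "<p>A collection of five stories.</p> Quality: 720p<br />";
        System.out.println(between(html, "<p>", "</p>", 0));          // A collection of five stories.
        System.out.println(between(html, "Quality: ", "<br />", 0));  // 720p
        System.out.println(between(html, "Size: ", "<br />", 0));     // null
    }
}
```

The from parameter plays the same role as the second argument in calls like indexOf(">", ls): it lets later searches start after an earlier match.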

Here is the sample HTML:

<a href="http://yts.re/movie/Night_on_Earth_1991"><img style="float:left" src="http://static.yts.re/attachments/Night_on_Earth_1991/poster_med.jpg" alt="Night on Earth (1991) - YIFY-Torrents" /></a><br /> <a href="http://www.imdb.com/title/tt0102536/">http://www.imdb.com/title/tt0102536/</a><br /> IMDB Rating: 7.8/10<br /> Genre: Comedy | Drama<br /> Quality: 720p<br /> Size: 876.69 MB<br /> Run Time: 2hr 7 min<br /> <p>A collection of five stories involving cab drivers in five different cities. Los Angeles - A talent agent for the movies discovers her cab driver would be perfect to cast, but the cabbie is reluctant to give up her solid cab driver's career. New York - An immigrant cab driver is continually lost in a city and culture he doesn't understand. Paris - A blind girl takes a ride with a cab driver from the Ivory Coast and they talk about life and blindness. Rome - A gregarious cabbie picks up an ailing man and virtually talks him to death. Helsinki - an industrial worker gets laid off and he and his compatriots discuss the bleakness and unfairness of love and life and death.</p>

To "parse" this string, I used the following code:

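// Assumes the fixed layout of the sample above: indexOf() returns -1 when a marker
// is missing, and the resulting bad index makes substring() throw.
private void breakdown(String html) {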
    //Extracting YTSLink
    int ls = html.indexOf("href=") + 6;
    int le = html.indexOf(">") - 1;
    String ytsLink = html.substring(ls,le);

    //Extracting ImagePath
    int is = html.indexOf("src=") + 5;
    int ie = html.indexOf("alt=") - 2;
    String imgPath = html.substring(is,ie);

    //Extracting IMDBLink
    ls = html.lastIndexOf("href=") + 6;
    le = html.indexOf(">",ls) - 1;
    String imdbLink = html.substring(ls,le);

    //Extracting PlotSummary
    int ps = html.indexOf("<p>") + 3;
    int pe = html.indexOf("</p>");
    String summary = html.substring(ps,pe);

    //Extracting IMDBRatings and Other Attributes
    int s = html.indexOf("IMDB Rating:");
    int l = html.indexOf("<br />",s);
    String imdbRating = html.substring(s,l);

    s = html.indexOf("Genre:");
    l = html.indexOf("<br />",s);
    String genre = html.substring(s,l);

    s = html.indexOf("Quality:");
    l = html.indexOf("<br />",s);
    String quality = html.substring(s,l);

    s = html.indexOf("Size:");
    l = html.indexOf("<br />",s);
    String fileSize = html.substring(s,l);

    s = html.indexOf("Run Time:");
    l = html.indexOf("<br />",s);
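    String runTime = html.substring(s,l);

    // Note: every value above is a local variable and is discarded when the
    // method returns; return them or store them in fields to actually use them.
}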
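The five near-identical "Label: value<br />" blocks at the end of breakdown() can be folded into one loop. The sketch below assumes the labels are exactly those in the sample feed item; the class FeedAttributes and method parseAttributes are hypothetical. Unlike the code above, it strips the label text from each value and returns the results in a map instead of discarding them, which is a design choice of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FeedAttributes {

    // Labels as they appear in the sample feed item; each value ends at the next "<br />".
    static final String[] LABELS = {"IMDB Rating:", "Genre:", "Quality:", "Size:", "Run Time:"};

    // Returns label -> value (label removed, whitespace trimmed); missing labels are skipped.
    static Map<String, String> parseAttributes(String html) {
        Map<String, String> result = new LinkedHashMap<>();   // keeps the feed's field order
        for (String label : LABELS) {
            int s = html.indexOf(label);
            if (s < 0) continue;                  // label not present in this item
            s += label.length();
            int e = html.indexOf("<br />", s);
            if (e < 0) e = html.length();         // last field without a trailing <br />
            result.put(label, html.substring(s, e).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = " IMDB Rating: 7.8/10<br /> Genre: Comedy | Drama<br /> Quality: 720p<br />"
                + " Size: 876.69 MB<br /> Run Time: 2hr 7 min<br /> <p>Plot</p>";
        parseAttributes(html).forEach((k, v) -> System.out.println(k + " " + v));
    }
}
```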

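The HTML is always formatted in the same predetermined way, so all I did was extract each piece of information by computing its position. Obviously this is a sloppy solution, and I'm still looking for a better one.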
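One step up from position arithmetic, still without a library, is java.util.regex: a Pattern that captures attribute values finds every link or image source regardless of where it sits. This remains a sketch under the assumption that attributes are written as name="value" with double quotes, as in the sample; a real HTML parser such as Jsoup is the robust option. The class AttributeFinder and method attributeValues are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributeFinder {

    // Returns every double-quoted value of the given attribute, in document order.
    static List<String> attributeValues(String html, String attr) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(attr) + "\\s*=\\s*\"([^\"]*)\"",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        List<String> values = new ArrayList<>();
        while (m.find()) {
            values.add(m.group(1));   // group 1 is the text inside the quotes
        }
        return values;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://yts.re/movie/Night_on_Earth_1991\"><img style=\"float:left\""
                + " src=\"http://static.yts.re/attachments/Night_on_Earth_1991/poster_med.jpg\" /></a>"
                + "<br /> <a href=\"http://www.imdb.com/title/tt0102536/\">link</a>";
        List<String> links = attributeValues(html, "href");
        System.out.println(links.get(0));                 // first href: the YTS page
        System.out.println(links.get(links.size() - 1));  // last href: the IMDb page
        System.out.println(attributeValues(html, "src"));
    }
}
```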

Modifying HTML at runtime in Android
