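I'm new to Android development and am learning it in my spare time. I'm building a "Web Feed Analyzer"-style app: I fetch a feed from a website and display its content in a WebView, like this: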
WebView w = new WebView(this);
w.loadData(strData,"text/html","utf-8");
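This works fine. What I'd like to do, though, is modify the HTML stored in the strData variable and remove some elements from it before calling loadData.
I've been thinking about how to parse the HTML, but I don't know how to do it properly, and I'd prefer a simple approach.
Any help would be greatly appreciated.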
Answer 0 (score: 0)
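After realizing that Jsoup was beyond me, I fell back on the crudest, lowest-level way of solving my own problem. The problem was to parse an HTML string and strip elements from it. Well, I couldn't actually parse anything, but I was able to pull specific parts out of the HTML with good old string handling.
Here is the sample HTML: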
<a href="http://yts.re/movie/Night_on_Earth_1991"><img style="float:left" src="http://static.yts.re/attachments/Night_on_Earth_1991/poster_med.jpg" alt="Night on Earth (1991) - YIFY-Torrents" /></a><br /> <a href="http://www.imdb.com/title/tt0102536/">http://www.imdb.com/title/tt0102536/</a><br /> IMDB Rating: 7.8/10<br /> Genre: Comedy | Drama<br /> Quality: 720p<br /> Size: 876.69 MB<br /> Run Time: 2hr 7 min<br /> <p>A collection of five stories involving cab drivers in five different cities. Los Angeles - A talent agent for the movies discovers her cab driver would be perfect to cast, but the cabbie is reluctant to give up her solid cab driver's career. New York - An immigrant cab driver is continually lost in a city and culture he doesn't understand. Paris - A blind girl takes a ride with a cab driver from the Ivory Coast and they talk about life and blindness. Rome - A gregarious cabbie picks up an ailing man and virtually talks him to death. Helsinki - an industrial worker gets laid off and he and his compatriots discuss the bleakness and unfairness of love and life and death.</p>
To "parse" this string I used the following code:
private void breakdown(String html) {
    // Extract the YTS link: the value of the first href attribute.
    // +6 skips past href=" and -1 drops the closing quote before '>'.
    int ls = html.indexOf("href=") + 6;
    int le = html.indexOf(">") - 1;
    String ytsLink = html.substring(ls, le);

    // Extract the image path: everything between src=" and the quote before alt=.
    int is = html.indexOf("src=") + 5;
    int ie = html.indexOf("alt=") - 2;
    String imgPath = html.substring(is, ie);

    // Extract the IMDB link: the value of the last href attribute.
    ls = html.lastIndexOf("href=") + 6;
    le = html.indexOf(">", ls) - 1;
    String imdbLink = html.substring(ls, le);

    // Extract the plot summary: the text inside <p>...</p>.
    int ps = html.indexOf("<p>") + 3;
    int pe = html.indexOf("</p>");
    String summary = html.substring(ps, pe);

    // Extract the IMDB rating and the other attributes.
    // Each value keeps its label, e.g. "IMDB Rating: 7.8/10".
    int s = html.indexOf("IMDB Rating:");
    int e = html.indexOf("<br />", s);
    String imdbRating = html.substring(s, e);

    s = html.indexOf("Genre:");
    e = html.indexOf("<br />", s);
    String genre = html.substring(s, e);

    s = html.indexOf("Quality:");
    e = html.indexOf("<br />", s);
    String quality = html.substring(s, e);

    s = html.indexOf("Size:");
    e = html.indexOf("<br />", s);
    String fileSize = html.substring(s, e);

    s = html.indexOf("Run Time:");
    e = html.indexOf("<br />", s);
    String runTime = html.substring(s, e);
}
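The repeated indexOf/substring pairs could probably be folded into one small helper that also copes with a missing marker (indexOf returns -1 in that case, and substring would then throw or return the wrong text). The following is only a sketch of that idea: the class name `FeedFields` and the methods `between` and `labeledFields` are made up for illustration, and it still assumes the feed keeps the fixed "Label: value<br />" layout shown in the sample.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; none of these names come from the code above.
final class FeedFields {

    // Returns the text between `start` and the next `end` found at or after `from`,
    // or null if either marker is missing.
    static String between(String html, String start, String end, int from) {
        int s = html.indexOf(start, from);
        if (s < 0) return null;
        s += start.length();
        int e = html.indexOf(end, s);
        if (e < 0) return null;
        return html.substring(s, e);
    }

    // Collects "Label: value<br />" entries into a map, without the label text.
    // Labels that do not occur in the HTML are skipped.
    static Map<String, String> labeledFields(String html, String... labels) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String label : labels) {
            String value = between(html, label + ":", "<br />", 0);
            if (value != null) out.put(label, value.trim());
        }
        return out;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the sample HTML.
        String html = "<a href=\"http://yts.re/movie/Night_on_Earth_1991\">x</a><br /> "
                + "IMDB Rating: 7.8/10<br /> Genre: Comedy | Drama<br /> Quality: 720p<br /> "
                + "<p>A collection of five stories.</p>";
        System.out.println(between(html, "href=\"", "\"", 0));
        System.out.println(between(html, "<p>", "</p>", 0));
        System.out.println(labeledFields(html, "IMDB Rating", "Genre", "Quality", "Size"));
    }
}
```

With this, each attribute becomes a single call, and a null result (or a missing map key) signals that the feed item did not contain the expected marker.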
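The HTML is formatted in a predetermined way. All I did was extract the pieces one by one by computing their positions. Obviously this is a sloppy solution: it breaks as soon as the markup changes or one of the markers is missing. I'm still looking for a better approach.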
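For the original goal of removing elements from strData before calling loadData, one low-tech option is a regular-expression replace. This is a rough sketch rather than a real HTML parser: `FeedCleaner`, `removeImages`, and `removeElement` are hypothetical names, and the patterns assume simple feed markup with no `>` inside attribute values and no nesting of the removed element inside itself.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of the original question or answer.
final class FeedCleaner {

    // Matches a whole <img ...> tag, self-closing or not, case-insensitively.
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Removes every <img> tag from the HTML string.
    static String removeImages(String html) {
        return IMG_TAG.matcher(html).replaceAll("");
    }

    // Removes <tag ...>...</tag> including its content, for one tag name.
    // The match is non-greedy and may span line breaks.
    static String removeElement(String html, String tag) {
        String q = Pattern.quote(tag);
        Pattern p = Pattern.compile("<" + q + "\\b[^>]*>.*?</" + q + "\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        return p.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Shortened stand-in for the feed item stored in strData.
        String strData = "<a href=\"http://yts.re/movie/x\"><img style=\"float:left\" "
                + "src=\"p.jpg\" alt=\"x\" /></a><br /> Quality: 720p<br /> <p>Plot.</p>";
        String cleaned = removeElement(removeImages(strData), "p");
        System.out.println(cleaned);
        // In the activity this would be followed by:
        // w.loadData(cleaned, "text/html", "utf-8");
    }
}
```

Regular expressions are known to be fragile on arbitrary HTML, so a parser such as Jsoup is likely the more robust choice once the feed markup becomes less predictable.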