I want to get the values in the pricecell / WebRupee classes from this HTML document.
An excerpt of the document is shown below.
<tr prodid="143012" class="tablerow style2">
<td class="pricecell"><span class="WebRupee">Rs.</span> 29 <br><font style="font-size:smaller;font-weight:normal"> 3 days </font></td>
<td class="spacer"></td>
<td class="detailcell"><span><span class="label label-default" style="background-color:#3cb521;color:#fff;border:1px solid #3cb521">FULL TT</span> </span><span><span class="label label-default" style="background-color:#fff;color:#0c7abc;border:1px solid #0c7abc">SMS</span> </span>
<div style="padding-top:5px">
29 Full Talktime
</div>
<div class="detailtext">
5 Local A2A SMS valid for 1 day
</div></td>
</tr>
<tr prodid="127535" class="tablerow style2">
<td class="pricecell"><span class="WebRupee">Rs.</span> 59 <br><font style="font-size:smaller;font-weight:normal"> 7 days </font></td>
<td class="spacer"></td>
<td class="detailcell"><span><span class="label label-default" style="background-color:#3cb521;color:#fff;border:1px solid #3cb521">FULL TT</span> </span><span><span class="label label-default" style="background-color:#fff;color:#0c7abc;border:1px solid #0c7abc">SMS</span> </span>
<div style="padding-top:5px">
59 Full Talktime
</div>
<div class="detailtext">
10 A2A SMS valid for 2 days
</div></td>
</tr>
<tr prodid="143025" class="tablerow style2">
<td class="pricecell"><span class="WebRupee">Rs.</span> 99 <br><font style="font-size:smaller;font-weight:normal"> 12 days </font></td>
<td class="spacer"></td>
<td class="detailcell"><span><span class="label label-default" style="background-color:#3cb521;color:#fff;border:1px solid #3cb521">FULL TT</span> </span>
<div style="padding-top:5px">
99 Full Talktime
</div>
<div class="detailtext">
10 Local A2A SMS for 2 days only
</div></td>
</tr>
Specifically, I want the values 29, 59, 99 that sit in the pricecell cells next to the WebRupee span, and I need to parse them with jsoup.
The code I tried:
class kp extends AsyncTask<Void,Void,Void> {
ArrayList<HashMap<String, String>> arraylist2 = new ArrayList<>();
@Override
protected void onPreExecute() {
super.onPreExecute();
}
@Override
protected Void doInBackground(Void... voids) {
try {
Document doc = Jsoup.connect("http://www.ireff.in/plans/" + operator+"/" + state).userAgent("Mozilla/5.0 " +
"(Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36").get();
int count = 0, j = 0, i = 0;
String TopupTable="";
for (Element table : doc.select("div[id=Topup]")) {
for (Element row : table.select("tr")) {
count++;
TopupTable=TopupTable+row.toString();//has all the values of topup category
System.out.print(TopupTable+"TopupTable row string here");
}
}
....
....
....
Elements r2;
String temp;
Document doc2 = Jsoup.parse(TopupTable, "",Parser.xmlParser());//doc2 has the TopupTable string converted to a "Document" type variable
for (Element table : doc.select("div[id=Topup]")) {
for (Element row : table.select("tr")) {
i++;
j++;
k++;
try {
Elements tds = row.select("td:not([rowspan])");
if(tds.contains("tr[id=download]"))
continue;
Elements tds2 = doc2.getElementsByClass("td[class=pricecell]");
temp=doc2.getElementsByClass("span[class=WebRupee]").toString();//trying to get those numeric values and store it in temp variable
System.out.print(temp+"temp var");
I get a blank value for the temp variable. Please tell me where I went wrong.
Thanks for your time :-)
Answer 0 (score: 1)
I tried it this way and it works for me:
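import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Whitelist;
import org.jsoup.select.Elements;

public class Test {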
public static void main(String[] args) {
String parseText = "<table><tr prodid=\"143012\" class=\"tablerow style2\">\n" +
" <td class=\"pricecell\"><span class=\"WebRupee\">Rs.</span> 29 <br><font style=\"font-size:smaller;font-weight:normal\"> 3 days </font></td>\n" +
" <td class=\"spacer\"></td>\n" +
" <td class=\"detailcell\"><span><span class=\"label label-default\" style=\"background-color:#3cb521;color:#fff;border:1px solid #3cb521\">FULL TT</span> </span><span><span class=\"label label-default\" style=\"background-color:#fff;color:#0c7abc;border:1px solid #0c7abc\">SMS</span> </span>\n" +
" <div style=\"padding-top:5px\">\n" +
" 29 Full Talktime \n" +
" </div>\n" +
" <div class=\"detailtext\">\n" +
" 5 Local A2A SMS valid for 1 day \n" +
" </div></td>\n" +
" </tr>\n" +
" <tr prodid=\"127535\" class=\"tablerow style2\">\n" +
" <td class=\"pricecell\"><span class=\"WebRupee\">Rs.</span> 59 <br><font style=\"font-size:smaller;font-weight:normal\"> 7 days </font></td>\n" +
" <td class=\"spacer\"></td>\n" +
" <td class=\"detailcell\"><span><span class=\"label label-default\" style=\"background-color:#3cb521;color:#fff;border:1px solid #3cb521\">FULL TT</span> </span><span><span class=\"label label-default\" style=\"background-color:#fff;color:#0c7abc;border:1px solid #0c7abc\">SMS</span> </span>\n" +
" <div style=\"padding-top:5px\">\n" +
" 59 Full Talktime \n" +
" </div>\n" +
" <div class=\"detailtext\">\n" +
" 10 A2A SMS valid for 2 days \n" +
" </div></td>\n" +
" </tr>\n" +
" <tr prodid=\"143025\" class=\"tablerow style2\">\n" +
" <td class=\"pricecell\"><span class=\"WebRupee\">Rs.</span> 99 <br><font style=\"font-size:smaller;font-weight:normal\"> 12 days </font></td>\n" +
" <td class=\"spacer\"></td>\n" +
" <td class=\"detailcell\"><span><span class=\"label label-default\" style=\"background-color:#3cb521;color:#fff;border:1px solid #3cb521\">FULL TT</span> </span>\n" +
" <div style=\"padding-top:5px\">\n" +
" 99 Full Talktime \n" +
" </div>\n" +
" <div class=\"detailtext\">\n" +
" 10 Local A2A SMS for 2 days only \n" +
" </div></td>\n" +
" </tr></table>";
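Document doc = Jsoup.parse(parseText);
// Drop the <font> (validity) and <span> (currency label) elements so only the price text remains.
doc.select("font").remove();
doc.select("span").remove();
for (Element row : doc.select("tr")) {
// Each row in the sample has exactly one td.pricecell.
Elements tds = row.select("td.pricecell");
// Whitelist is called Safelist in newer jsoup releases.
Whitelist wl = Whitelist.basic();
String value = Jsoup.clean(tds.get(0).text(), wl);
System.out.println(value);
}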
}
}
Output:
29
59
99
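Answer 1 (score: 1)
Just use selectors and org.jsoup.nodes.Element.ownText() to extract the cell's text without the text of its children.
Gets the text owned by this element only; does not get the combined text of all children.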
Document doc = Jsoup
.connect(url)
.userAgent(userAgent)
.get();
Elements cells = doc.select("td.pricecell");
ListIterator<Element> itr = cells.listIterator();
while (itr.hasNext()) {
Element cell = itr.next();
System.out.println(cell.ownText());
}
Output
29
59
99
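One likely reason the temp variable in the question stays blank is that getElementsByClass expects a bare class name such as "pricecell", while the question passes CSS selector strings like "td[class=pricecell]" to it, so nothing matches. The following is a minimal sketch of that difference, not the asker's full program; the class name PriceCellSketch, the helper names, and the shortened SAMPLE_HTML are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PriceCellSketch {
    // Hypothetical sample, shortened from the excerpt in the question.
    // The rows are wrapped in <table> so the HTML parser keeps the <tr>/<td> tags.
    static final String SAMPLE_HTML =
        "<table>"
        + "<tr class=\"tablerow style2\"><td class=\"pricecell\">"
        + "<span class=\"WebRupee\">Rs.</span> 29 <br><font> 3 days </font></td></tr>"
        + "<tr class=\"tablerow style2\"><td class=\"pricecell\">"
        + "<span class=\"WebRupee\">Rs.</span> 59 <br><font> 7 days </font></td></tr>"
        + "<tr class=\"tablerow style2\"><td class=\"pricecell\">"
        + "<span class=\"WebRupee\">Rs.</span> 99 <br><font> 12 days </font></td></tr>"
        + "</table>";

    // Collects the text owned directly by each td.pricecell (children's text excluded).
    static List<String> prices(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element cell : doc.select("td.pricecell")) {
            result.add(cell.ownText());
        }
        return result;
    }

    // Counts matches when a selector string is passed where a class name is expected,
    // as in the question's code.
    static int matchesForSelectorAsClassName(String html) {
        return Jsoup.parse(html).getElementsByClass("td[class=pricecell]").size();
    }

    public static void main(String[] args) {
        System.out.println(prices(SAMPLE_HTML));
        System.out.println(matchesForSelectorAsClassName(SAMPLE_HTML));
    }
}
```

If getElementsByClass is preferred over select, passing just "pricecell" should find the same cells.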
Answer 2 (score: 0)
You can use Node.childNodes to retrieve a List of Node objects and check the type of each one (TextNode in your case):
Document doc = Jsoup.parse(html);
Elements trs = doc.select("table tr");
for (Element tr : trs) {
Element priceCell = tr.select(".pricecell").first();
for (Node child : priceCell.childNodes()) {
if (child instanceof TextNode) {
System.out.println(((TextNode) child).text().trim());
}
}
}
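The child-node approach can be made a little more defensive. The sketch below skips rows that have no price cell (first() returns null in that case) and keeps only text nodes that are purely numeric, returning them as integers. The class name PriceTextNodes, the method parsePrices, and the sample markup in main are assumptions for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class PriceTextNodes {
    // Returns the numeric text nodes found directly inside each td.pricecell.
    static List<Integer> parsePrices(String html) {
        List<Integer> prices = new ArrayList<>();
        Document doc = Jsoup.parse(html);
        for (Element tr : doc.select("table tr")) {
            Element priceCell = tr.select("td.pricecell").first();
            if (priceCell == null) {
                continue; // row without a price cell
            }
            for (Node child : priceCell.childNodes()) {
                if (child instanceof TextNode) {
                    String text = ((TextNode) child).text().trim();
                    if (text.matches("\\d+")) { // ignore blank or non-numeric text
                        prices.add(Integer.parseInt(text));
                    }
                }
            }
        }
        return prices;
    }

    public static void main(String[] args) {
        // Hypothetical two-row sample: one row with a price cell, one without.
        String html = "<table>"
            + "<tr><td class=\"pricecell\"><span class=\"WebRupee\">Rs.</span> 29 <br>"
            + "<font> 3 days </font></td></tr>"
            + "<tr><td class=\"spacer\"></td></tr>"
            + "</table>";
        System.out.println(parsePrices(html));
    }
}
```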
T