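I need to parse two tables in the HTML at http://www.informatik.uni-trier.de/~ley/pers/hd/k/Kumar:G=_Praveen.html using the jsoup library. Because the page contains two tables, I don't know how to parse their contents. I need to extract the content of the first table, i.e. only the author names and their publications, and of the second table at the end, the one named coauthor. I tried the code below, but it gives an error: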
import java.io.IOException;
import java.util.Iterator;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) {
        try {
            Document doc = Jsoup.connect("http://www.informatik.uni-trier.de/~ley/pers/hd/k/Kumar:G=_Praveen.html").get();
            // All rows of every table on the page
            Elements trs = doc.select("table tr");
            // First table whose class attribute equals "coauthor" (null if nothing matches)
            Element table = doc.select("table[class=coauthor]").first();
            Iterator<Element> ite = table.select("td").iterator();
            ite.next(); // skip the first cell
            System.out.println("Value 1: " + ite.next().text());
            System.out.println("Value 2: " + ite.next().text());
            System.out.println("Value 3: " + ite.next().text());
            System.out.println("Value 4: " + ite.next().text());
            trs.remove(0); // drop the first row
            for (Element tr : trs) {
                Elements tds = tr.getElementsByTag("td");
                Element td = tds.first();
                System.out.println("Blog: " + td.text());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
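Please tell me what I need to change in the code above to get exactly the information I need from the tables. Any help would be appreciated. Thanks in advance.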
Answer 0 (score: 2)
final String url = "http://www.informatik.uni-trier.de/~ley/pers/hd/k/Kumar:G=_Praveen.html";
Document doc = Jsoup.connect(url).get();
for( Element element : doc.select("table div.data") )
{
// System.out.println(element); // Use this line if you need the HTML Element instead of the text
System.out.println(element.text());
}
Output:
G. Praveen Kumar, Anirban Sarkar: Weighted Association Rule Mining and Clustering in Non-binary Search Space. ITNG 2010: 238-243
G. Praveen Kumar, Arjun Kumar Murmu, Biswas Parajuli, Prasenjit Choudhury: MULET: A Multilanguage Encryption Technique. ITNG 2010: 779-782
G. Praveen Kumar, Anirban Sarkar, Narayan C. Debnath: A New Algorithm for Frequent Itemset Generation in Non-Binary Search Space. ITNG 2009: 149-153
for( Element element : doc.select("table td.coauthor") )
{
System.out.println(element.text());
}
Output:
Prasenjit Choudhury
Narayan C. Debnath
Arjun Kumar Murmu
Biswas Parajuli
Anirban Sarkar
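
For anyone who wants to try the two selectors without hitting the network, here is a minimal offline sketch. It rests on an assumption: the inline `SAMPLE_HTML` is a hypothetical stand-in for the real page, built only from what the selectors above imply (publications inside `div.data`, and the `coauthor` class on `td` cells rather than on the table). The class name `TableSelectSketch` and the helpers `texts` and `matchesNothing` are made up for illustration. If the real page is laid out this way, `table[class=coauthor]` from the question would match nothing, `first()` would return null, and the following `table.select(...)` call would throw a NullPointerException, which may well be the error reported.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TableSelectSketch {
    // Hypothetical stand-in for the real page (an assumption, not the actual markup):
    // one table holding a publication in div.data, and one table whose cells
    // (not the table itself) carry the class "coauthor".
    static final String SAMPLE_HTML =
        "<table><tr><td><div class=\"data\">A. Author, B. Writer: Some Title. CONF 2010: 1-2</div></td></tr></table>"
      + "<table><tr><td class=\"coauthor\">B. Writer</td></tr></table>";

    // Returns the text of every element matching cssQuery, in document order.
    static List<String> texts(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element element : doc.select(cssQuery)) {
            result.add(element.text());
        }
        return result;
    }

    // True when the query matches nothing, i.e. select(...).first() is null.
    static boolean matchesNothing(String html, String cssQuery) {
        return Jsoup.parse(html).select(cssQuery).first() == null;
    }

    public static void main(String[] args) {
        // Publications: the selector used in the answer
        System.out.println(texts(SAMPLE_HTML, "table div.data"));
        // Coauthors: the class sits on the td cells in this sample
        System.out.println(texts(SAMPLE_HTML, "table td.coauthor"));
        // The selector from the question finds no table in this sample
        System.out.println(matchesNothing(SAMPLE_HTML, "table[class=coauthor]"));
    }
}
```

Checking the result of `first()` for null before using it (or iterating over the `Elements` returned by `select`, as the answer does) avoids the crash when a selector matches nothing.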