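I need some help extracting substrings from the table at this link (http://www.informatik.uni-trier.de/~ley/pers/hd/k/Kumar:G=_Praveen.htm).
I only need to extract the authors' names and store them in a 2D array. For example: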
a[0][0] = G. Praveen Kumar, a[0][1] = Anirban Sarkar
a[1][0] = G. Praveen Kumar, a[1][1] = Arjun Kumar Murmu, a[1][2] = Biswas Parajuli, a[1][3] = Prasenjit Choudhury
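and so on for each subsequent row, until the end of the table.
In each row the author names are separated by commas and are followed by a colon and then the title of the article. I need to extract just the names (as substrings) and store them in the 2D array; I don't want the article titles stored in the array, only the people's names.
Any help would be greatly appreciated. Thanks in advance.
Here is the code I tried: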
package codetrial;

import java.io.IOException;

import org.apache.commons.lang.StringUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Main {
    public static void main(String[] args) {
        try {
            String a;
            final String url = "http://www.informatik.uni-trier.de/~ley/pers/hd/k/Kumar:G=_Praveen.htm";
            Document doc = Jsoup.connect(url).get();
            // Each publication entry is a div with class "data" inside the table
            for (Element element : doc.select("table div.data")) {
                a = element.text();
                String[] names = a.split(", "); // comma and space
                // Attempt: text between the first space and the first comma,
                // then between the first comma and the first colon
                String name_one = StringUtils.substringBetween(a, " ", ",");
                String name_two = StringUtils.substringBetween(a, ",", ":");
                System.out.println("person1 = " + name_one);
                System.out.println("person2 = " + name_two);
                for (String name : names) {
                    System.out.println(name);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Answer 0 (score: 2)
You can do this with the Jsoup library. Here is my example:
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SourceCodeProgram {
    public static void main(String[] args) throws Exception {
        System.out.println(PageParser.readAuthors("http://www.informatik.uni-trier.de/~ley/pers/hd/k/Kumar:G=_Praveen.htm"));
    }
}

class PageParser {
    // Returns one inner list of author names per publication entry on the page
    public static List<List<String>> readAuthors(String url) throws Exception {
        Document document = Jsoup.connect(url).get();
        // Each publication entry is an element with class "data"
        Elements elements = document.getElementsByClass("data");
        List<List<String>> result = new ArrayList<List<String>>();
        List<String> authors = new ArrayList<String>();
        for (Element element : elements) {
            // The author elements come first; the child with class "title"
            // marks the start of the article title, so stop collecting there
            for (Element child : element.children()) {
                if ("title".equals(child.className())) {
                    result.add(authors);
                    authors = new ArrayList<String>();
                    break;
                }
                authors.add(child.text()); // plain text of the author element
            }
        }
        return result;
    }
}
Output:
[[G. Praveen Kumar, Anirban Sarkar], [G. Praveen Kumar, Arjun Kumar Murmu, Biswas Parajuli, Prasenjit Choudhury], [G. Praveen Kumar, Anirban Sarkar, Narayan C. Debnath]]
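The question asks for a 2D array rather than a list of lists. Below is a minimal sketch of converting a result shaped like the one `readAuthors` returns into a jagged `String[][]`, where each row can have a different number of authors. The class name `ToArrayExample` and the method `toJaggedArray` are hypothetical, the sample data is copied from the output above so no network access is needed, and `List.of` assumes Java 9 or later.

```java
import java.util.Arrays;
import java.util.List;

public class ToArrayExample {
    // Hypothetical helper: converts a List<List<String>> into a jagged String[][]
    // (one row per publication, one column per author in that publication).
    static String[][] toJaggedArray(List<List<String>> rows) {
        String[][] result = new String[rows.size()][];
        for (int i = 0; i < rows.size(); i++) {
            result[i] = rows.get(i).toArray(new String[0]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample data taken from the output above
        List<List<String>> authors = List.of(
                List.of("G. Praveen Kumar", "Anirban Sarkar"),
                List.of("G. Praveen Kumar", "Arjun Kumar Murmu", "Biswas Parajuli", "Prasenjit Choudhury"));
        String[][] a = toJaggedArray(authors);
        System.out.println(a[1][3]); // prints Prasenjit Choudhury
        System.out.println(Arrays.deepToString(a));
    }
}
```

A jagged array is used because the number of authors differs from row to row, so a rectangular `new String[n][m]` would leave unused `null` slots.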
Answer 1 (score: 0)
Use the following code inside the for loop:
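String htmlString = element.text();
// element.text() already returns tag-free text, so this replaceAll is only a safeguard
a = htmlString.replaceAll("\\<.*?>", "");
// Keep only the part before the first colon (the names), then split on commas
String[] names = a.split(":")[0].split(",");
for (String name : names) {
    System.out.println(name.trim()); // trim the space that follows each comma
}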
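The approach in this answer can be pulled out into a self-contained method and applied to every entry to build the 2D array the question asks for. This is a sketch that assumes each entry's text has the form `Name, Name, ...: Title`; the class `AuthorLineParser`, the methods `namesOf` and `namesOfAll`, and the sample titles are hypothetical. It splits only at the first colon (`split(":", 2)`), so a colon inside a title does not matter, and it trims each name. `List.of` assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AuthorLineParser {
    // Hypothetical helper: takes the visible text of one table entry and returns
    // only the author names. Assumes the names come before the first ':' and
    // are separated by commas.
    static String[] namesOf(String entryText) {
        String beforeColon = entryText.split(":", 2)[0];
        List<String> names = new ArrayList<>();
        for (String part : beforeColon.split(",")) {
            String name = part.trim(); // drop the space after each comma
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names.toArray(new String[0]);
    }

    // Builds the 2D array row by row: one row per entry, one column per author
    static String[][] namesOfAll(List<String> entryTexts) {
        String[][] a = new String[entryTexts.size()][];
        for (int i = 0; i < entryTexts.size(); i++) {
            a[i] = namesOf(entryTexts.get(i));
        }
        return a;
    }

    public static void main(String[] args) {
        // Made-up sample entries; the titles are placeholders, not real papers
        List<String> entries = List.of(
                "G. Praveen Kumar, Anirban Sarkar: First placeholder title.",
                "G. Praveen Kumar, Arjun Kumar Murmu, Biswas Parajuli, Prasenjit Choudhury: Second placeholder title.");
        String[][] a = namesOfAll(entries);
        System.out.println(Arrays.deepToString(a));
    }
}
```

In the question's loop, the input list would be built by collecting `element.text()` for each `table div.data` element.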