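I want to select multiple HTML files at once and use an HTML parser to extract only the text, creating a separate text file for each HTML file. Can anyone suggest Java code for this?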
`FileReader f0 = new FileReader("j.html");
StringBuilder sb = new StringBuilder();
BufferedReader br = new BufferedReader(f0);
String temp1;
while ((temp1 = br.readLine()) != null) {
    sb.append(temp1);
}
br.close();
String para = sb.toString().replaceAll("<br>", "\n");
String textonly = Jsoup.parse(para).text();
System.out.println(textonly);
FileWriter f1 = new FileWriter("j.txt");
char buf1[] = new char[textonly.length()];
textonly.getChars(0, textonly.length(), buf1, 0);
for (int i = 0; i < buf1.length; i++) {
    if (buf1[i] == '\n')
        f1.write("\r\n");
    else
        f1.write(buf1[i]);
}
f1.close();`
I have this code, but it only handles one file at a time. I want to select multiple files.
Answer 0 (score: 0)
Can't you just put your code in a loop? Something like this (untested):
// loop over the files you want to convert: 1.html, 2.html, ...
for (int i = 1; i < 1000; i++) {
    // read the whole HTML file into one string
    FileReader f0 = new FileReader(i + ".html");
    StringBuilder sb = new StringBuilder();
    BufferedReader br = new BufferedReader(f0);
    String temp1;
    while ((temp1 = br.readLine()) != null) {
        sb.append(temp1);
    }
    br.close();
    String para = sb.toString().replaceAll("<br>", "\n");
    String textonly = Jsoup.parse(para).text();
    System.out.println(textonly);
    // stick .txt on the end of the filename to write out
    FileWriter f1 = new FileWriter(i + ".txt");
    char buf1[] = new char[textonly.length()];
    textonly.getChars(0, textonly.length(), buf1, 0);
    // use j here: reusing i would clobber the outer file counter
    for (int j = 0; j < buf1.length; j++) {
        if (buf1[j] == '\n') {
            f1.write("\r\n");
        } else {
            f1.write(buf1[j]);
        }
    }
    // close the writer so the buffered text is actually flushed to disk
    f1.close();
}
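If your files are not numbered 1.html, 2.html and so on, the same loop can run over a list of file names instead, for example the ones passed on the command line. Here is a rough sketch of that idea, not a tested drop-in. The class name `HtmlBatchConverter` and the helpers `textPathFor` and `convertAll` are names I made up for illustration. It assumes the files are UTF-8, and the text extraction is passed in as a function so the sketch compiles without jsoup on the classpath.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper class, not part of jsoup or the JDK.
public class HtmlBatchConverter {

    // "page.html" or "page.htm" -> "page.txt" in the same directory;
    // any other name just gets ".txt" appended.
    public static Path textPathFor(Path htmlFile) {
        String name = htmlFile.getFileName().toString();
        String base = name.replaceFirst("(?i)\\.html?$", "");
        return htmlFile.resolveSibling(base + ".txt");
    }

    // Converts every input file and returns the text files that were written.
    // Assumption: the input files are UTF-8 encoded.
    public static List<Path> convertAll(List<Path> htmlFiles,
                                        Function<String, String> extractor) throws IOException {
        List<Path> written = new ArrayList<>();
        for (Path htmlFile : htmlFiles) {
            String html = new String(Files.readAllBytes(htmlFile), StandardCharsets.UTF_8);
            Path out = textPathFor(htmlFile);
            Files.write(out, extractor.apply(html).getBytes(StandardCharsets.UTF_8));
            written.add(out);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        List<Path> inputs = new ArrayList<>();
        for (String arg : args) {
            inputs.add(Paths.get(arg));
        }
        // Placeholder extractor (crude tag stripping) so this compiles on its own.
        // With jsoup available, pass html -> Jsoup.parse(html).text() instead.
        Function<String, String> extractor = html -> html.replaceAll("<[^>]*>", " ").trim();
        for (Path p : convertAll(inputs, extractor)) {
            System.out.println("Wrote " + p);
        }
    }
}
```

You would run it as something like `java HtmlBatchConverter a.html b.html c.html`. If you want a dialog rather than command-line arguments, `JFileChooser` with `setMultiSelectionEnabled(true)` and `getSelectedFiles()` gives you the list of files to feed into the same loop.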