Merging HTML files in Java using Jsoup

Time: 2014-08-08 02:52:26

Tags: java html for-loop jsoup bufferedwriter

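I am trying to merge multiple .html files into a single .html file using Jsoup. My idea is to get a list of the .html files in a dir and store the names in an ArrayList. I would then loop through the ArrayList, passing each file name as a string to the Jsoup.parse() method.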

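I can populate the ArrayList without a problem, and my code works for one file at a time. But when I add the for loops below, the NEW_INFORMATION.html file is created but contains nothing. Any ideas about what I am missing?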

Here is the current code:

public class mergeFiles {

    public static void main(String[] args) throws IOException {

        File outputFile = new File ("C:\\Users\\1234\\Desktop\\PowerShellOutput\\NEW_INFORMATION.html");
        File dir = new File ("C:\\Users\\1234\\Desktop\\PowerShellOutput\\");
        File [] paths;
        //Only capture files with extension .html
        FilenameFilter fileNameFilter = new FilenameFilter(){
            public boolean accept(File dir, String name) {
                // TODO Auto-generated method stub
                if (name.lastIndexOf('.') > 0) {
                    int lastIndex = name.lastIndexOf('.');
                    String extension = name.substring(lastIndex);
                    if(extension.equals(".html")){
                        return true;
                    }
                }
                return false;
            }
        };      
        paths = dir.listFiles(fileNameFilter);
        List<String> list = new ArrayList<String>();
        for (File x : paths){
            list.add(x.toString());
        }
        System.out.print(list);
        for (String s : list){
            File input = new File(s);
            Document doc = Jsoup.parse(input, "UTF-8"); 
            Elements links = doc.select("table");
            @SuppressWarnings("resource")
            BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outputFile), "UTF-8"));
            bw.append("<h2>" + s.toString() + "<h2>");
            bw.append(links.toString());
        }
    }
}

I have also tried this variation without converting the paths to strings (same result):

for (File x : paths){
        Document doc = Jsoup.parse(x, "UTF-8"); 
        Elements links = doc.select("table");
        @SuppressWarnings("resource")
        BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outputFile), "UTF-8"));
        bw.append("<h2>" + x.toString() + "<h2>");
        bw.append(links.toString());
    }

Full answer for anyone who might want this in the future:

package htmlMerge;

import java.io.*;
import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.select.Elements;

public class mergeFiles {

public static void main(String[] args) throws IOException {

    try {
        String outFileName = System.getProperty("user.home") + "/Desktop/<Insert The Directory/name.html>";
        File outputFile = new File(outFileName);
        String desktopDir = System.getProperty("user.home") + "/Desktop/<Insert Dir name>";
        File dir = new File(desktopDir);
        File[] paths;
        //create a file filter that will only worry about .html files if your folder contains other extensions
        FilenameFilter fileNameFilter = new FilenameFilter() {
            public boolean accept(File dir, String name) {
                if (name.lastIndexOf('.') > 0) {
                    int lastIndex = name.lastIndexOf('.');
                    String extension = name.substring(lastIndex);
                    if (extension.equals(".html")) {
                        return true;
                    }
                }
                return false;
            }
        };
        paths = dir.listFiles(fileNameFilter);
        /*Use a single UTF-8 BufferedWriter to write the header and then append
        the additional .html files. In this case, the .html files all contain
        one 'table', so this will append the tables to 'outputFile'.
        try-with-resources closes (and flushes) the writer even on failure.*/
        try (BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(outputFile), "UTF-8"))) {
            bw.write("<h1>REPORT DATA</h1>");
            for (File x : paths) {
                Document doc = Jsoup.parse(x, "UTF-8");
                Elements links = doc.select("table");
                //adds the filename of the .html as a Level 2 heading
                bw.write("<h2>" + x.toString() + "</h2>");
                bw.write(links.toString());
            }
        }
        //only report success when no exception was thrown
        System.out.println("\nMerge Completed Successfully");
    } catch (IOException ioe) {
        System.err.println(ioe.getMessage());
    }
  }
}
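
The file filter above can probably be shortened. The following is a minimal sketch, assuming Java 8 or later; the class name HtmlFilterDemo and the helper listHtml are hypothetical names introduced here, not part of the original code. It applies the same rule as the anonymous class (the name ends in ".html" and has at least one character before the dot) and also guards against listFiles() returning null, which happens when the directory does not exist or cannot be read.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
import java.nio.file.Files;

public class HtmlFilterDemo {

    // Same rule as the anonymous class: the name must end in ".html"
    // and have at least one character before the dot.
    static final FilenameFilter HTML_ONLY =
            (dir, name) -> name.length() > ".html".length() && name.endsWith(".html");

    // listFiles returns null when the directory is missing or unreadable,
    // so fall back to an empty array instead of failing in the caller's loop.
    static File[] listHtml(File dir) {
        File[] found = dir.listFiles(HTML_ONLY);
        return found != null ? found : new File[0];
    }

    public static void main(String[] args) throws IOException {
        // Temporary directory used only for this demonstration.
        File dir = Files.createTempDirectory("htmlDemo").toFile();
        new File(dir, "report.html").createNewFile();
        new File(dir, "notes.txt").createNewFile();
        for (File f : listHtml(dir)) {
            System.out.println(f.getName());
        }
    }
}
```

The array returned by listHtml can be passed straight to the for loop that calls Jsoup.parse.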

1 Answer:

Answer 0 (score: 2)

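You have to close the BufferedWriter to see the changes. It buffers output in memory, so nothing reaches the file until the writer is flushed or closed.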
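
To illustrate the point, here is a minimal sketch that uses only the standard library (no Jsoup); the class name WriterCloseDemo and the method writeSections are hypothetical. The writer is opened once, outside the loop, and closed by try-with-resources. In the question's code a new FileOutputStream is opened on every iteration, which truncates the file each time, and the writer is never closed, so its buffered content is never written out.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class WriterCloseDemo {

    // Writes one <h2> heading per section through a single writer.
    // try-with-resources closes the writer, which flushes its buffer to disk.
    static void writeSections(Path out, List<String> sections) throws IOException {
        try (BufferedWriter bw = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (String s : sections) {
                bw.write("<h2>" + s + "</h2>");
                bw.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary output file used only for this demonstration.
        Path out = Files.createTempFile("merged", ".html");
        writeSections(out, Arrays.asList("a.html", "b.html"));
        // Read the file back to confirm both headings were written.
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8));
    }
}
```

In the merge program, the body of the loop would write the output of doc.select("table") instead of the placeholder headings.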