Split a file by XML tags

Time: 2015-05-29 02:15:12

Tags: java xml split tags

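I'm a hobbyist Java programmer and I'm stuck on a task. I've written most of the code except for the essential part, and I'm drawing a blank on how to approach it. I'm hoping someone can point me toward how to get it done.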

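I created a class called Splitter. Its job is to read an XML file and split it into smaller files based on specific XML start/end tags, and each smaller file must also be smaller than a given maxfilesize.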

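In addition, older versions of the files must be moved into an archive folder with a timestamp. I have most of it working, except that I'm not sure how to split by the start/end tags. I have a getXML method that reads everything between those tags, but from there, when I call it in the split method, I'm not sure what to do with the result.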

Does anyone have any input they can share to steer me in the right direction?

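import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class Splitter {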

  public static void split(String directory, String fileName, 
        String transactionTag, int fileSize) throws IOException{
    String startTag = "<"+ transactionTag + ">";
    String endTag = "</"+ transactionTag + ">";
    File f = new File(directory + fileName);
    File output = new File (directory + "Output/" + fileName);
    BufferedInputStream in = new BufferedInputStream(new FileInputStream(f));
    Splitter sp = new Splitter();
    int fileCount = 0;
    int len;
    int maxFileSize = fileSize;
    byte[] buf = new byte[maxFileSize]; 
    SimpleDateFormat sdf = new SimpleDateFormat("yyyy_MM_dd_hh_mm_ss");
    Date curDate = new Date();
    String strDate = sdf.format(curDate);
    String fileTime = strDate;
    while ((len = in.read(buf)) > 0) {
        fileCount++;
        try{
            File afile =new File(directory + "Output\\" + fileName + "." + fileCount);
            if(afile.exists()){
                if(afile.renameTo(new File(directory + "Output\\Archive\\" + fileName + "." + fileCount + "-" + fileTime))){
                }else{
                    System.out.println("Files failed to be archived. ");
                }
            }else{
                System.out.println("This file does not exist.");
            }
        }catch(Exception e){
            e.printStackTrace();
        }
        BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(output + "." + fileCount));
        String newInput = new String(buf,0,len); // newInput is a String no greater in length than whatever bytes or chars
        String value = sp.getXML(newInput, transactionTag);

        //This part is incomplete.
        //Do something with value to make this class split the file by XML tags.
        //Also make sure any left over code before the first start tag and last end tag are also put into smaller files.

        int start = value.indexOf(startTag);
        int end = value.lastIndexOf(endTag);

        out.write(buf,0,len);
        out.close();
    }
    in.close();
  }
  public String getXML(String content, String tagName){
    String startTag = "<"+ tagName + ">";
    String endTag = "</"+ tagName + ">";
    int startposition = content.indexOf(startTag);
    int endposition = content.indexOf(endTag, startposition);
    if (startposition == -1)return "";
    startposition += startTag.length(); 
    if(endposition == -1) return "";
    return content.substring(startposition, endposition);
  }
  public static void main(String[]args) throws IOException{
    int num = 100;
    int kb = num * 1024;
    Splitter split = new Splitter();
    split("C:/SplitUp/", "fileSplit.xml", "blah1", kb);
    System.out.println("Program ran");
  }
}

1 answer:

Answer 0 (score: 0)

Based on your comments, I assume your fileSplit.xml looks like this:

<header>
  <!-- Some XML metadata -->
</header>
<start>
  <!-- Some XML data -->
</start>
<start>
  <!-- Some XML data -->
</start>
<start>
  <!-- Some XML data -->
</start>
<start>
  <!-- Some XML data -->
</start>
<footer>
  <!-- Some XML metadata -->
</footer>

Each <start>, <header>, and <footer> tag, along with its corresponding closing tag, is on its own line.

You can simplify your code by using:

  1. java.nio.file.Files.readAllLines(Path path, Charset cs) to read your C:/SplitUp/fileSplit.xml.
  2. java.io.FileWriter to write all of your sub-files.
  3. Basically (for Java 7+), you can do something like this:

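    // read the entire fileSplit.xml into a list of strings
    // (needs java.nio.file.Files, java.nio.file.Paths, java.nio.charset.StandardCharsets, java.util.List, java.io.FileWriter)
    List<String> fileContent = Files.readAllLines(Paths.get("C:/SplitUp/fileSplit.xml"), StandardCharsets.UTF_8);
    
    // iterate through the list to split the file content into sub-files
    StringBuilder subFileContent = new StringBuilder();
    for(String line : fileContent){
      String tag = line.trim(); // ignore indentation when comparing
      if(!tag.equalsIgnoreCase("<start>") && !tag.equalsIgnoreCase("<footer>")) { // keep reading if this line is neither a <start> nor a <footer>
        subFileContent.append(line).append(System.lineSeparator()); // readAllLines strips line terminators, so add one back
      }
      else { // if this line is a <start> or a <footer>, write all the content thus far into a new sub-file
        // sub-file names taken from your code above. Make sure they are unique!
        try (FileWriter fileWriter = new FileWriter(directory + "Output\\" + fileName + "." + fileCount++)) {
          // this will write up to only maxFileSize number of characters.
          // how do you want to handle spillover?
          fileWriter.write(subFileContent.toString(), 0, Math.min(subFileContent.length(), maxFileSize));
        }
    
        // reset subFileContent so that it starts with the current tag line
        subFileContent = new StringBuilder(line).append(System.lineSeparator());
      }
    }
    // the last section (the footer) is still in subFileContent here; write it out the same way
    try (FileWriter fileWriter = new FileWriter(directory + "Output\\" + fileName + "." + fileCount++)) {
      fileWriter.write(subFileContent.toString(), 0, Math.min(subFileContent.length(), maxFileSize));
    }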
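
To try the idea out without touching the file system, the grouping step of the loop above can be isolated into a small method. The following is only a sketch under the same assumption as before (every boundary tag sits on its own line); the class name TagLineSplitter and the method splitSections are made up for illustration and are not part of your code. Unlike the loop above, it never emits an empty section, and it also returns the trailing section (the footer) after the last line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TagLineSplitter {

    /**
     * Groups lines into sections. A new section begins at every line that,
     * once trimmed, equals one of the boundary tags (case-insensitive).
     * Hypothetical helper for illustration only.
     */
    public static List<String> splitSections(List<String> lines, String... boundaryTags) {
        List<String> sections = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            String trimmed = line.trim();
            boolean isBoundary = false;
            for (String tag : boundaryTags) {
                if (trimmed.equalsIgnoreCase(tag)) {
                    isBoundary = true;
                    break;
                }
            }
            // Close the section collected so far before starting a new one.
            if (isBoundary && current.length() > 0) {
                sections.add(current.toString());
                current.setLength(0);
            }
            current.append(line).append('\n');
        }
        // Whatever is left after the last boundary is the final section.
        if (current.length() > 0) {
            sections.add(current.toString());
        }
        return sections;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "<header>", "  <!-- Some XML metadata -->", "</header>",
            "<start>", "  <!-- Some XML data -->", "</start>",
            "<start>", "  <!-- Some XML data -->", "</start>",
            "<footer>", "  <!-- Some XML metadata -->", "</footer>");
        List<String> sections = splitSections(lines, "<start>", "<footer>");
        for (int i = 0; i < sections.size(); i++) {
            System.out.println("--- section " + (i + 1) + " ---");
            System.out.print(sections.get(i));
        }
    }
}
```

Each returned string could then be handed to a FileWriter, one sub-file per section. If the tags in the real file are not on their own lines, a line-based approach like this will not find them, and an XML parser would probably be the safer tool.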
    

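    To satisfy the requirement that

    ...each smaller file must also be smaller than a given maxfilesize

    you can change the previous else into an else if so that a write-out is forced whenever the length() of subFileContent exceeds maxFileSize, making sure the remainder is written to a second sub-file. That said, I would get the splitting of the content into sub-files working first, before dealing with this second requirement.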
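
One possible way to handle the spillover is to cut an oversized section into consecutive pieces and write each piece to its own sub-file. The sketch below is an assumption about how that could look, not a finished solution: the class name SizeCapSketch and the method chunk are hypothetical, the limit is counted in characters (which is what FileWriter.write works with) rather than bytes on disk, and a piece cut in the middle of a section will not be well-formed XML by itself.

```java
import java.util.ArrayList;
import java.util.List;

class SizeCapSketch {

    /**
     * Cuts content into consecutive pieces of at most maxChars characters each.
     * Hypothetical helper for illustration only.
     */
    public static List<String> chunk(String content, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> pieces = new ArrayList<>();
        for (int from = 0; from < content.length(); from += maxChars) {
            int to = Math.min(content.length(), from + maxChars);
            pieces.add(content.substring(from, to));
        }
        return pieces;
    }

    public static void main(String[] args) {
        String section = "<start>\n  <!-- Some XML data -->\n</start>\n";
        // Print the size of each piece; every one is at most 16 characters long.
        for (String piece : chunk(section, 16)) {
            System.out.println(piece.length() + " chars");
        }
    }
}
```

Because the requirement says each file must be smaller than maxfilesize, passing maxFileSize - 1 would keep every piece strictly under the limit. If the limit is meant in bytes and the file contains non-ASCII text, the pieces would need to be measured after encoding instead.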