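Split a large XML file into smaller files by tag name

Time: 2013-08-28 14:15:41

Tags: java xml-parsing sax

I have a requirement: given an XML file and a tag name as input, I need to split the XML file by that tag name using Java. Please advise.

INPUT: XML file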

  <note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
   </note>

  <book>
  <author>Gambardella, Matthew</author>
  <title>XML Developer's Guide</title>
  <genre>Computer</genre>
  <price>44.95</price>
  <publish_date>2000-10-01</publish_date>
  <description>An in-depth look at creating applications 
  with XML.</description>
  </book>
 <book>
  <author>Ralls, Kim</author>
  <title>Midnight Rain</title>
  <genre>Fantasy</genre>
  <price>5.95</price>
  <publish_date>2000-12-16</publish_date>
  <description>A former architect battles corporate zombies, 
  an evil sorceress, and her own childhood to become queen 
  of the world.</description>
 </book>

TAG NAME: book

Output:

<book>
  <author>Gambardella, Matthew</author>
  <title>XML Developer's Guide</title>
  <genre>Computer</genre>
  <price>44.95</price>
  <publish_date>2000-10-01</publish_date>
  <description>An in-depth look at creating applications 
  with XML.</description>
  </book>
 <book>
  <author>Ralls, Kim</author>
  <title>Midnight Rain</title>
  <genre>Fantasy</genre>
  <price>5.95</price>
  <publish_date>2000-12-16</publish_date>
  <description>A former architect battles corporate zombies, 
  an evil sorceress, and her own childhood to become queen 
  of the world.</description>
 </book>

2 answers:

Answer 0 (score: 0)

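I think the general algorithm is as follows:

  • Read the file into a buffer
  • Find the first occurrence of the opening tag
  • Keep reading lines until you reach the matching closing tag
  • Output those lines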
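The steps above can be sketched in plain Java roughly as follows. This is a minimal illustration, not a general XML parser: the class name `TagSplitter` and the method `extract` are hypothetical, and the sketch assumes that the opening and closing tags each sit on their own line, carry no attributes, and are never nested inside an element of the same name. It repeats the scan so that every matching block is collected, not only the first.

```java
import java.util.ArrayList;
import java.util.List;

public class TagSplitter {

    // Collects every <tagName>...</tagName> block found in the given lines.
    // Assumption: tags have no attributes and same-named elements are not nested.
    static List<String> extract(List<String> lines, String tagName) {
        String open = "<" + tagName + ">";
        String close = "</" + tagName + ">";
        List<String> fragments = new ArrayList<>();
        StringBuilder current = null; // non-null while we are inside a block

        for (String line : lines) {
            // Start a new block when the opening tag appears.
            if (current == null && line.contains(open)) {
                current = new StringBuilder();
            }
            if (current != null) {
                current.append(line).append('\n');
                // Finish the block when the closing tag appears.
                if (line.contains(close)) {
                    fragments.add(current.toString());
                    current = null;
                }
            }
        }
        return fragments;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "<note>", "<to>Tove</to>", "</note>",
                "<book>", "<title>XML Developer's Guide</title>", "</book>",
                "<book>", "<title>Midnight Rain</title>", "</book>");
        List<String> books = extract(lines, "book");
        System.out.println(books.size() + " fragments found");
        books.forEach(System.out::print);
    }
}
```

A line-based scan like this also copes with input that has no single root element, as in the question, which a strict XML parser would likely reject.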

Answer 1 (score: 0)

This can be done easily with Jsoup.

Here is a complete working example:

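import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.commons.io.FileUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class Test {

    public static void main(String[] args) throws IOException {
        // Locate test.txt on the classpath and read it fully into memory.
        String path = Test.class.getResource("/test.txt").getPath();
        String string = FileUtils.readFileToString(new File(path), StandardCharsets.UTF_8);

        // Jsoup.parse uses the lenient HTML parser, so input without a
        // single root element is accepted.
        Document doc = Jsoup.parse(string);

        // Select every <book> element and print them all.
        Elements elementsByTag = doc.getElementsByTag("book");
        System.out.println(elementsByTag);
    }

}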

test.txt

 <note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
   </note>

  <book>
  <author>Gambardella, Matthew</author>
  <title>XML Developer's Guide</title>
  <genre>Computer</genre>
  <price>44.95</price>
  <publish_date>2000-10-01</publish_date>
  <description>An in-depth look at creating applications 
  with XML.</description>
  </book>
 <book>
  <author>Ralls, Kim</author>
  <title>Midnight Rain</title>
  <genre>Fantasy</genre>
  <price>5.95</price>
  <publish_date>2000-12-16</publish_date>
  <description>A former architect battles corporate zombies, 
  an evil sorceress, and her own childhood to become queen 
  of the world.</description>
  </book>

Output

<book> 
 <author>
  Gambardella, Matthew
 </author> 
 <title>XML Developer's Guide</title> 
 <genre>
  Computer
 </genre> 
 <price>
  44.95
 </price> 
 <publish_date>
  2000-10-01
 </publish_date> 
 <description>
  An in-depth look at creating applications with XML.
 </description> 
</book>
<book> 
 <author>
  Ralls, Kim
 </author> 
 <title>Midnight Rain</title> 
 <genre>
  Fantasy
 </genre> 
 <price>
  5.95
 </price> 
 <publish_date>
  2000-12-16
 </publish_date> 
 <description>
  A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.
 </description> 
</book>
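
The example above prints all matching elements to standard output. To get the separate files the question asks for, each extracted fragment could be written to its own file. The following is a minimal sketch using only the Java standard library; the class name `FragmentWriter`, the method `writeFragments`, and the `<tagName>-<index>.xml` naming scheme are assumptions made for illustration. The fragment strings could come from any extraction step, for example from calling `outerHtml()` on each Jsoup element.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class FragmentWriter {

    // Writes each fragment to <outputDir>/<tagName>-<index>.xml (index starts at 1)
    // and returns the paths of the files that were written.
    static List<Path> writeFragments(List<String> fragments, String tagName, Path outputDir)
            throws IOException {
        Files.createDirectories(outputDir);
        List<Path> written = new ArrayList<>();
        for (int i = 0; i < fragments.size(); i++) {
            Path target = outputDir.resolve(tagName + "-" + (i + 1) + ".xml");
            Files.write(target, fragments.get(i).getBytes(StandardCharsets.UTF_8));
            written.add(target);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        List<String> fragments = List.of(
                "<book><title>XML Developer's Guide</title></book>",
                "<book><title>Midnight Rain</title></book>");
        // A temporary directory is used here only to keep the demo self-contained.
        Path dir = Files.createTempDirectory("split");
        for (Path p : writeFragments(fragments, "book", dir)) {
            System.out.println(p);
        }
    }
}
```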