We have a scenario where we need to split a large XML file (over 10 GB) into small chunks. Each chunk should contain 100 or 200 elements. Sample XML:
<Employees>
<Employee id="1">
<age>29</age>
<name>Pankaj</name>
<gender>Male</gender>
<role>Java Developer</role>
</Employee>
<Employee id="3">
<age>35</age>
<name>Lisa</name>
<gender>Female</gender>
<role>CEO</role>
</Employee>
<Employee id="3">
<age>40</age>
<name>Tom</name>
<gender>Male</gender>
<role>Manager</role>
</Employee>
<Employee id="3">
<age>25</age>
<name>Meghna</name>
<gender>Female</gender>
<role>Manager</role>
</Employee>
<Employee id="3">
<age>29</age>
<name>Pankaj</name>
<gender>Male</gender>
<role>Java Developer</role>
</Employee>
<Employee id="3">
<age>35</age>
<name>Lisa</name>
<gender>Female</gender>
<role>CEO</role>
</Employee>
<Employee id="3">
<age>40</age>
<name>Tom</name>
<gender>Male</gender>
<role>Manager</role>
</Employee>
</Employees>
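I have StAX parser code that splits the file into small chunks, but each output file contains only one complete Employee element. I need 100, 200, or more <Employee> elements in a single file. Here is my Java code: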
public static void main(String[] s) throws Exception {
    String prefix = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" + "\n";
    String suffix = "\n</Employees>\n";
    int count = 0;
    try {
        int i = 0;
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("D:\\Desktop\\Test\\latestxml\\test.xml"));
        xsr.nextTag(); // Advance to the root <Employees> element
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();
        while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
            // A new file is created for every <Employee>, which is the problem
            File file = new File("C:\\Users\\test\\Desktop\\xml\\" + "out" + i + ".xml");
            FileOutputStream fos = new FileOutputStream(file, true);
            t.transform(new StAXSource(xsr), new StreamResult(fos));
            i++;
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
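Answer 0 (score: 3)
Don't increment i on every iteration. Update it only when the iteration count reaches 100 or 200, so the output path changes once per chunk.
Like this: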
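String outputPath = "/test/path/foo.txt";
while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
    // Opened in append mode, so elements accumulate in the current file
    FileOutputStream file = new FileOutputStream(outputPath, true);
    ...
    ...
    count++;
    if (count == 100) {
        // Switch to the next output file after every 100 elements
        i++;
        outputPath = "/test/path/foo" + i + ".txt";
        count = 0;
    }
}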
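The counting rule above amounts to an integer division: element number n (counting from 0) belongs to file n / chunkSize. A minimal sketch of that mapping, where the class name `ChunkNaming`, the method `fileFor`, and the `out` file prefix are hypothetical names chosen for illustration:

```java
final class ChunkNaming {

    /**
     * Returns the chunk file name for the element at the given zero-based
     * position, assuming chunkSize elements per file.
     */
    static String fileFor(int elementIndex, int chunkSize) {
        // Integer division: elements 0..chunkSize-1 map to file 0, and so on.
        return "out" + (elementIndex / chunkSize) + ".xml";
    }

    public static void main(String[] args) {
        System.out.println(fileFor(0, 100));
        System.out.println(fileFor(99, 100));
        System.out.println(fileFor(100, 100));
    }
}
```

With a chunk size of 100, elements 0 through 99 share one file name and element 100 starts the next one, which is the same behaviour as resetting count and incrementing i.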
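Answer 1 (score: 2)
I hope I got this right, but you just need to increment the count each time an Employee is added, and start a new file when the count reaches 100: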
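// Without this, each transform() call writes its own XML declaration into the chunk
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
File file = new File("out" + i + ".xml");
FileOutputStream fos = new FileOutputStream(file, true);
appendStuff("<Employees>", file);
while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
    count++;
    t.transform(new StAXSource(xsr), new StreamResult(fos));
    if (count == 100) {
        // Chunk is full: close the root element and start the next file
        count = 0;
        i++;
        appendStuff("</Employees>", file);
        fos.close();
        file = new File("out" + i + ".xml");
        fos = new FileOutputStream(file, true);
        appendStuff("<Employees>", file);
    }
}
// Close the last, possibly partial, chunk
appendStuff("</Employees>", file);
fos.close();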
It's not very pretty, but you get the idea. The helper used above:
private static void appendStuff(String content, File file) throws IOException {
    // Append the given text to the end of the file
    FileWriter fw = new FileWriter(file.getAbsoluteFile(), true);
    BufferedWriter bw = new BufferedWriter(fw);
    bw.write(content);
    bw.close();
}
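For comparison, here is a self-contained sketch that puts the same counting idea into one class. Note that it swaps in a different technique from the answers above: it copies events with StAX's `XMLEventReader` and `XMLEventWriter` instead of using a `Transformer`, so every chunk is written through a single writer. The class name `XmlChunkSplitter`, the method names, and the `out<N>.xml` naming are assumptions made for illustration. It also assumes every direct child of the root is a record to be counted, and it drops comments and whitespace that sit directly under the root.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

final class XmlChunkSplitter {

    /** Writes outDir/out0.xml, out1.xml, ... and returns the number of files written. */
    static int split(Path input, Path outDir, int chunkSize)
            throws IOException, XMLStreamException {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLOutputFactory xof = XMLOutputFactory.newInstance();
        XMLEventFactory ef = XMLEventFactory.newInstance();

        StartElement root = null;     // root start tag, replayed at the top of every chunk
        XMLEventWriter writer = null; // non-null while a chunk file is open
        OutputStream out = null;
        int depth = 0;                // 1 = inside root, 2 or more = inside a record
        int inChunk = 0;
        int fileIndex = 0;

        try (InputStream in = Files.newInputStream(input)) {
            XMLEventReader reader = xif.createXMLEventReader(in);
            try {
                while (reader.hasNext()) {
                    XMLEvent ev = reader.nextEvent();
                    if (ev.isStartElement()) {
                        depth++;
                        if (depth == 1) {
                            root = ev.asStartElement();
                            continue;
                        }
                        if (depth == 2 && writer == null) {
                            // First record of a new chunk: open the next file.
                            out = Files.newOutputStream(outDir.resolve("out" + fileIndex + ".xml"));
                            fileIndex++;
                            writer = xof.createXMLEventWriter(out, "UTF-8");
                            writer.add(ef.createStartDocument("UTF-8", "1.0"));
                            writer.add(root);
                        }
                    }
                    if (depth >= 2) {
                        writer.add(ev); // copy everything that belongs to a record
                    }
                    if (ev.isEndElement()) {
                        depth--;
                        if (depth == 1 && ++inChunk == chunkSize) {
                            closeChunk(writer, out, ef, root);
                            writer = null;
                            out = null;
                            inChunk = 0;
                        }
                    }
                }
                if (writer != null) { // last, partially filled chunk
                    closeChunk(writer, out, ef, root);
                    out = null;
                }
            } finally {
                reader.close();
                if (out != null) {
                    out.close(); // only reached if an error interrupted a chunk
                }
            }
        }
        return fileIndex;
    }

    private static void closeChunk(XMLEventWriter writer, OutputStream out,
                                   XMLEventFactory ef, StartElement root)
            throws IOException, XMLStreamException {
        QName q = root.getName();
        writer.add(ef.createEndElement(q.getPrefix(), q.getNamespaceURI(), q.getLocalPart()));
        writer.add(ef.createEndDocument());
        writer.flush();
        writer.close(); // does not close the underlying stream
        out.close();
    }

    public static void main(String[] args) throws Exception {
        int n = split(Paths.get(args[0]), Paths.get(args[1]), Integer.parseInt(args[2]));
        System.out.println("Wrote " + n + " chunk file(s)");
    }
}
```

It could be invoked as `java XmlChunkSplitter test.xml outputDir 100`, where the output directory must already exist. Because the reader hands out one event at a time, memory use should stay roughly constant regardless of input size, which is the point of using StAX for a 10 GB file.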