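I have a huge XML file (3-4 GB, about 360,000 lines of records). I have to read it line by line and append each line to a StringBuilder; once everything has been read, the content is processed further. However, the StringBuilder exceeds its buffer size limit, so the whole file cannot be held in memory. How can I split the records and reset the buffer before it exceeds that limit? Please advise.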
    try {
        File file = new File("test.txt");
        FileReader fileReader = new FileReader(file);
        BufferedReader bufferedReader = new BufferedReader(fileReader);
        StringBuilder stringBuilder = new StringBuilder();
        String line;
        int count = 0;
        while ((line = bufferedReader.readLine()) != null) {
            // keep only the lines that start a customer record
            if (line.startsWith("<customer>")) {
                stringBuilder.append(line);
            }
            count++;
        }
        // closing the BufferedReader also closes the wrapped FileReader
        bufferedReader.close();
        System.out.println(stringBuilder.toString());
    } catch (IOException e) {
        e.printStackTrace();
    }
Edit: the asker tried using StAX:
    while (xmlEventReader.hasNext()) {
        XMLEvent xmlEvent = null;
        try {
            xmlEvent = xmlEventReader.nextEvent();
        } catch (Exception e) {
            e.printStackTrace();
            break; // xmlEvent is null here, so stop instead of dereferencing it
        }
        if (xmlEvent.isStartElement()) {
            StartElement elem = (StartElement) xmlEvent;
            // getLocalPart() returns the bare element name, without angle brackets
            if (elem.getName().getLocalPart().equals("Customer")) {
                if (customerRecord) {
                    insideChildRecord = true;
                }
                customerRecord = true;
            }
        }
        if (customerRecord) {
            xmlEventWriter.add(xmlEvent);
        }
        if (xmlEvent.isEndElement()) {
            EndElement elem = (EndElement) xmlEvent;
            if (elem.getName().getLocalPart().equals("Customer")) {
                if (insideChildRecord) {
                    insideChildRecord = false;
                } else {
                    customerRecord = false;
                    xmlEventWriter.flush();
                    String cmlChunk = stringWriter.toString();
Answer 0 (score: 3)
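It looks like you are parsing an XML file (since I see you are checking for "<customer>").
For that, it is better to use a parsing library than low-level streams. Because the file is large, I would suggest SAX or StAX: https://docs.oracle.com/javase/tutorial/jaxp/stax/index.html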
    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    XMLEventReader xmlEventReader = xmlInputFactory.createXMLEventReader(new FileInputStream(fileName));
    while (xmlEventReader.hasNext()) {
        XMLEvent xmlEvent = xmlEventReader.nextEvent();
        // parse the XML events one by one
    }
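Since you cannot hold the data in memory, you will have to do all of the "further processing" on the XML events immediately, as they arrive.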
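The same "process each record as it arrives" idea also works with SAX, the other option mentioned above. The following is only a minimal sketch: the class name `SaxCustomerCounter` and the method `countCustomers` are made up for illustration, and counting stands in for whatever the real per-record processing is. It assumes the records are elements named `customer`.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original answer.
class SaxCustomerCounter {

    /** Streams the document and returns the number of customer start tags seen. */
    static int countCustomers(InputStream in) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // The default factory is not namespace-aware, so the tag name is in qName.
                if ("customer".equalsIgnoreCase(qName)) {
                    // Per-record processing would go here; nothing is accumulated.
                    count[0]++;
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // Tiny in-memory sample; for the real file, pass a FileInputStream instead.
        String sample = "<customers><customer number=\"1\"/><customer number=\"2\"/></customers>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(countCustomers(in));
    }
}
```

SAX pushes events into callbacks, whereas StAX lets the calling code pull them in a loop; in both cases memory use stays roughly constant as long as nothing is collected across records.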
Maybe this will make the approach using StAX clearer:
    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    XMLEventReader xmlEventReader = xmlInputFactory.createXMLEventReader(new FileInputStream("huge-file.xml"));
    // this variable is re-used to store the current customer
    Customer customer = null;
    while (xmlEventReader.hasNext()) {
        XMLEvent xmlEvent = xmlEventReader.nextEvent();
        if (xmlEvent.isStartElement()) {
            StartElement startElement = xmlEvent.asStartElement();
            if (startElement.getName().getLocalPart().equalsIgnoreCase("customer")) {
                // start populating a new customer
                customer = new Customer();
                // read an attribute for example <customer number="42">
                Attribute attribute = startElement.getAttributeByName(new QName("number"));
                if (attribute != null) {
                    customer.setNumber(attribute.getValue());
                }
            }
            // read a nested element for example:
            // <customer>
            //     <name>John Doe</name>
            // (the null check skips any <name> that appears outside a customer)
            if (customer != null && startElement.getName().getLocalPart().equals("name")) {
                // getElementText() also copes with an empty <name></name>
                // and consumes the matching end tag
                customer.setName(xmlEventReader.getElementText());
            }
        }
        if (xmlEvent.isEndElement()) {
            EndElement endElement = xmlEvent.asEndElement();
            if (endElement.getName().getLocalPart().equalsIgnoreCase("customer")) {
                // all data for the current Customer has been read
                // do something with the customer, like logging it or storing it in a database
                // after this the customer variable will be re-assigned to the next customer
                customer = null;
            }
        }
    }
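If the further processing really needs each record as an XML string, as in the asker's own StAX attempt, one option is to copy the events of a single customer into a fresh StringWriter and hand the resulting chunk over as soon as the record closes. This is a sketch under assumptions, not the asker's or the answerer's actual code: `CustomerChunks`, `forEachCustomer` and the `sink` callback are hypothetical names, and the records are assumed to be elements named `customer` (matched case-insensitively, as above).

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.Consumer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: emits one XML string per top-level customer record.
class CustomerChunks {

    static void forEachCustomer(Reader in, Consumer<String> sink) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        StringWriter buffer = null;
        XMLEventWriter writer = null;
        int depth = 0; // element nesting depth inside the current record; 0 means outside

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                boolean isCustomer = event.asStartElement().getName()
                        .getLocalPart().equalsIgnoreCase("customer");
                if (depth == 0 && isCustomer) {
                    // a new record starts: use a fresh, small buffer for it
                    buffer = new StringWriter();
                    writer = outputFactory.createXMLEventWriter(buffer);
                }
                if (depth > 0 || isCustomer) {
                    depth++;
                }
            }
            if (depth > 0) {
                writer.add(event); // copy everything that belongs to the record
            }
            if (event.isEndElement() && depth > 0) {
                depth--;
                if (depth == 0) {
                    // the record is complete: emit it and drop the buffer
                    writer.flush();
                    writer.close();
                    sink.accept(buffer.toString());
                    buffer = null;
                    writer = null;
                }
            }
        }
        reader.close();
    }

    public static void main(String[] args) throws XMLStreamException {
        // Tiny in-memory sample; for the real file, pass a FileReader or InputStreamReader.
        String sample = "<customers>"
                + "<customer number=\"1\"><name>Alice</name></customer>"
                + "<customer number=\"2\"><name>Bob</name></customer>"
                + "</customers>";
        forEachCustomer(new StringReader(sample), System.out::println);
    }
}
```

Tracking the nesting depth, rather than a pair of boolean flags, means a `customer` element nested inside another one stays part of the outer chunk. Only one record is buffered at a time, so memory use depends on the largest single record, not on the size of the file.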