Reading a huge text file in Java and appending it with StringBuilder

Date: 2018-08-15 08:07:50

Tags: java file

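I have a huge XML file (3-4 GB, about 360,000 records). I have to read every line and append it to a StringBuilder; once everything is read, it is processed further. But the StringBuilder grows beyond the available memory, so the data cannot be held there. How can I split the records and stop before the buffer size is exceeded? Any hints are appreciated.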

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    // try-with-resources closes the reader even when an exception is thrown
    try (BufferedReader bufferedReader = new BufferedReader(new FileReader("test.txt"))) {
        StringBuilder stringBuilder = new StringBuilder();
        String line;
        int count = 0;
        while ((line = bufferedReader.readLine()) != null) {
            // keep only lines that open a customer record
            if (line.startsWith("<customer>")) {
                stringBuilder.append(line);
            }
            count++;
        }
        // every matching line of the file is still held in stringBuilder here,
        // so memory use grows with the size of the input
        System.out.println(stringBuilder.toString());
    } catch (IOException e) {
        e.printStackTrace();
    }
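Since the underlying question is how to cut the input into records before the builder grows too large, here is a minimal sketch of the line-based variant: the builder only ever holds the current record and is cleared as soon as that record has been handed off. It assumes, hypothetically, that each record starts on a line beginning with <customer> and ends on a line ending with </customer>; RecordChunker and forEachRecord are illustrative names, not part of any library. A real XML parser, as suggested in the answer, does not depend on the line layout.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

class RecordChunker {

    // Reads the input line by line. Each complete <customer>...</customer> block is
    // passed to recordHandler as soon as its closing line is seen, and the builder is
    // then cleared, so only one record is held in memory at a time.
    // Assumption (hypothetical layout): a record starts on a line beginning with
    // "<customer>" and ends on a line ending with "</customer>".
    static int forEachRecord(Reader source, Consumer<String> recordHandler) {
        StringBuilder buffer = new StringBuilder();
        boolean inRecord = false;
        int count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.startsWith("<customer>")) {
                    inRecord = true;
                    buffer.setLength(0); // drop any unterminated previous record
                }
                if (inRecord) {
                    buffer.append(trimmed);
                    if (trimmed.endsWith("</customer>")) {
                        recordHandler.accept(buffer.toString()); // process immediately
                        buffer.setLength(0);                     // reuse the same buffer
                        inRecord = false;
                        count++;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "<customers>\n<customer>\n  <name>A</name>\n</customer>\n"
                + "<customer><name>B</name></customer>\n</customers>\n";
        int n = forEachRecord(new StringReader(sample), System.out::println);
        System.out.println(n + " records");
    }
}
```

For the real file, the StringReader would be replaced by a Reader over test.txt; memory use then depends on the largest single record rather than on the whole file.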

Edit: the asker tried using StAX:

    while (xmlEventReader.hasNext()) {
        // nextEvent() throws XMLStreamException; letting it propagate avoids
        // continuing with a null event
        XMLEvent xmlEvent = xmlEventReader.nextEvent();
        if (xmlEvent.isStartElement()) {
            StartElement elem = xmlEvent.asStartElement();
            // getLocalPart() returns the bare name "Customer", without angle brackets
            if (elem.getName().getLocalPart().equals("Customer")) {
                if (customerRecord) {
                    insideChildRecord = true;
                }
                customerRecord = true;
            }
        }
        if (customerRecord) {
            xmlEventWriter.add(xmlEvent);
        }
        if (xmlEvent.isEndElement()) {
            EndElement elem = xmlEvent.asEndElement();
            if (elem.getName().getLocalPart().equals("Customer")) {
                if (insideChildRecord) {
                    insideChildRecord = false;
                } else {
                    customerRecord = false;
                    xmlEventWriter.flush();
                    String cmlChunk = stringWriter.toString();
                    // (the asker's snippet is cut off here)
                }
            }
        }
    }
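The asker's snippet stops right after the chunk is extracted. A hedged sketch of how the same idea (copying each record's events through an XMLEventWriter into a StringWriter) could be completed: a nesting-depth counter replaces the customerRecord/insideChildRecord flags, and a fresh StringWriter and writer are created for every top-level record, so each chunk contains exactly one record. The class CustomerChunks, the method split and the handler callback are hypothetical; the record element name is passed in rather than assumed.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.Consumer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class CustomerChunks {

    // Copies every top-level <recordName> element (with all nested content) into a
    // new StringWriter and passes the resulting XML string to handler.
    // depth counts open <recordName> elements, so same-named nested elements
    // stay inside the outer chunk.
    static int split(Reader source, String recordName, Consumer<String> handler) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(source);
            XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
            StringWriter buffer = null;
            XMLEventWriter writer = null;
            int depth = 0;
            int count = 0;
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals(recordName)) {
                    if (depth == 0) { // a new top-level record begins
                        buffer = new StringWriter();
                        writer = outputFactory.createXMLEventWriter(buffer);
                    }
                    depth++;
                }
                if (depth > 0) {
                    writer.add(event); // copy every event that belongs to the record
                }
                if (event.isEndElement()
                        && event.asEndElement().getName().getLocalPart().equals(recordName)) {
                    depth--;
                    if (depth == 0) {
                        writer.close(); // flushes; the StringWriter itself stays usable
                        handler.accept(buffer.toString());
                        count++;
                    }
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<root><Customer><name>A</name></Customer>"
                + "<Customer><name>B</name></Customer></root>";
        split(new StringReader(sample), "Customer", System.out::println);
    }
}
```

For the real file, source would be a Reader over the XML (for example an InputStreamReader using the file's encoding); each chunk string can then be processed and discarded before the next one is built.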

1 answer:

Answer 0 (score: 3)

You appear to be parsing an XML file (I see that you are checking for "<customer>").

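For that, it is better to use a parsing library than low-level streams. Because the file is so large, I recommend SAX or StAX, which both stream through the document instead of loading it into memory: https://docs.oracle.com/javase/tutorial/jaxp/stax/index.html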

    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    XMLEventReader xmlEventReader = xmlInputFactory.createXMLEventReader(new FileInputStream(fileName));
    while (xmlEventReader.hasNext()) {
        XMLEvent xmlEvent = xmlEventReader.nextEvent();
        // parse the XML events one by one
    }
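For comparison with the StAX snippet, a minimal sketch of the SAX alternative mentioned above. SAX is push-based: the parser calls the handler's methods while it streams through the document. This assumes a <customer> element with a <name> child, as in the answer's StAX example; SaxCustomers, CustomerHandler and parse are hypothetical names, and the callback receives each customer's name as soon as </customer> is reached.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxCustomers {

    // Only the current customer's name is buffered at any time.
    static final class CustomerHandler extends DefaultHandler {
        private final Consumer<String> onCustomer;
        private final StringBuilder text = new StringBuilder();
        private boolean inName;
        private String currentName;

        CustomerHandler(Consumer<String> onCustomer) {
            this.onCustomer = onCustomer;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            // the factory is not namespace-aware, so qName holds the element name
            if (qName.equals("customer")) {
                currentName = null;
            } else if (qName.equals("name")) {
                inName = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inName) {
                text.append(ch, start, length); // text may arrive in several pieces
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("name")) {
                inName = false;
                currentName = text.toString();
            } else if (qName.equals("customer")) {
                onCustomer.accept(currentName); // process the finished customer now
            }
        }
    }

    static void parse(InputSource source, Consumer<String> onCustomer) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(source, new CustomerHandler(onCustomer));
        } catch (Exception e) { // ParserConfigurationException, SAXException, IOException
            throw new IllegalStateException("SAX parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<customers><customer><name>John Doe</name></customer></customers>";
        parse(new InputSource(new StringReader(sample)), System.out::println);
    }
}
```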

Because the data cannot be held in memory, you have to do all of the "further processing" right away, as each XML event is read.
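To make "process it right away" concrete, here is a sketch that uses the StAX cursor API (XMLStreamReader) instead of the event API: the text of each customer's child elements is collected into a small map, handed to a callback, and then dropped, so memory is bounded by one record. CustomerStream, forEachCustomer and the map representation are assumptions for illustration, and the code assumes the children of <customer> contain only text.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CustomerStream {

    // Assumption: children of <customer> are text-only elements such as <name>.
    static int forEachCustomer(Reader source, Consumer<Map<String, String>> handler) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(source);
            Map<String, String> fields = null;
            int count = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("customer")) {
                        fields = new LinkedHashMap<>();
                    } else if (fields != null) {
                        // reads all text up to the matching end tag; afterwards the
                        // cursor sits on that END_ELEMENT
                        fields.put(name, reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("customer")) {
                    handler.accept(fields); // the "further processing" happens here
                    fields = null;          // let the record be garbage-collected
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<customers><customer><name>A</name><city>X</city></customer></customers>";
        forEachCustomer(new StringReader(sample), System.out::println);
    }
}
```

The cursor API avoids allocating an event object per token, which can matter for a multi-gigabyte file; the event API used below is often easier to read.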

Perhaps this makes the StAX approach clearer:

    import java.io.FileInputStream;
    import javax.xml.namespace.QName;
    import javax.xml.stream.XMLEventReader;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.events.Attribute;
    import javax.xml.stream.events.EndElement;
    import javax.xml.stream.events.StartElement;
    import javax.xml.stream.events.XMLEvent;

    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    XMLEventReader xmlEventReader = xmlInputFactory.createXMLEventReader(new FileInputStream("huge-file.xml"));

    // this variable is re-used to store the current customer
    Customer customer = null;

    while (xmlEventReader.hasNext()) {

        XMLEvent xmlEvent = xmlEventReader.nextEvent();
        if (xmlEvent.isStartElement()) {

            StartElement startElement = xmlEvent.asStartElement();

            if (startElement.getName().getLocalPart().equalsIgnoreCase("customer")) {
                // start populating a new customer
                customer = new Customer();

                // read an attribute, for example <customer number="42">
                Attribute attribute = startElement.getAttributeByName(new QName("number"));
                if (attribute != null) {
                    customer.setNumber(attribute.getValue());
                }
            }

            // read a nested element, for example:
            // <customer>
            //    <name>John Doe</name>
            if (customer != null && startElement.getName().getLocalPart().equals("name")) {
                // getElementText() returns all of the element's text (also for an
                // empty <name/>) and leaves the reader on </name>
                customer.setName(xmlEventReader.getElementText());
            }
        }

        if (xmlEvent.isEndElement()) {
            EndElement endElement = xmlEvent.asEndElement();
            if (endElement.getName().getLocalPart().equalsIgnoreCase("customer")) {
                // all data for the current Customer has been read
                // do something with the customer, like logging it or storing it in a database
                // after this the customer variable will be re-assigned to the next customer
            }
        }
    }
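If "storing it in a database" means batch inserts, one way to keep memory bounded is to buffer a fixed number of finished records and hand them off together. The Batcher class below is a hypothetical helper, not a library type; the sink callback stands in for whatever the real processing is (for example a JDBC batch insert).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Collects records into fixed-size batches and passes each full batch to sink.
// At most batchSize records are held in memory at once.
class Batcher<T> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> batch;

    Batcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.batch = new ArrayList<>(batchSize);
    }

    void add(T record) {
        batch.add(record);
        if (batch.size() == batchSize) {
            flush();
        }
    }

    void flush() {
        if (!batch.isEmpty()) {
            sink.accept(new ArrayList<>(batch)); // hand over a copy, then reuse the list
            batch.clear();
        }
    }

    @Override
    public void close() {
        flush(); // emit the final, possibly partial, batch
    }
}
```

In the answer's loop, a Batcher<Customer> would be created before the while loop, batcher.add(customer) called in the </customer> branch, and batcher.close() called after the loop so the last partial batch is not lost. The batch size is a tuning choice, not something the original answer specifies.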