Question

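I'm reading in an XML configuration file whose format I don't control, and the data I need is in the last element. Unfortunately, that element is a base64-encoded serialized Java class (yes, I know) that is 31,200 characters long.

Some experimentation suggests that the Java XML/XPath libraries cannot see the value in this element (they silently set the value to an empty string). Beyond that, if I just read the file into a string and print it to the console, everything else gets printed, even the closing element on the next line, but this line does not.

Finally, if I manually go into the file and break the line into several lines, Java can see the line, although this obviously breaks the XML parsing and the deserialization. It is also not practical, because I want to build a tool that works across many such files.

Is there some line length limit in Java that prevents this from working? Can I get around it with a third-party library?

Edit: here is the XML-related code: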

FileInputStream fstream = new FileInputStream("path/to/xml/file.xml");
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document d = db.parse(fstream);
String s = XPathFactory.newInstance().newXPath().compile("//el1").evaluate(d);

Answer 1

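To read large XML files, you can use a SAX parser. In addition, when collecting the value inside the SAX parser's "characters" callback, you should accumulate it in a "StringBuffer" rather than a String. See the SAX parser documentation for details.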
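
As a rough illustration of that advice, here is a minimal sketch of a SAX handler that accumulates the text of one element. The class name `LongElementSax`, the helper `readElementText`, and the in-memory sample document are made up for this example; the element name `el1` is taken from the question. It uses `StringBuilder` in place of `StringBuffer`, since no synchronization is needed here. The point it shows is that `characters` may be called several times for a single long text node, so every chunk has to be appended.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class LongElementSax {

    /** Collects the text of the first element with the given name. */
    static final class TextCollector extends DefaultHandler {
        private final String elementName;
        private final StringBuilder text = new StringBuilder();
        private boolean inside;
        private boolean done;

        TextCollector(String elementName) {
            this.elementName = elementName;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if (!done && qName.equals(elementName)) {
                inside = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver one long text node in several chunks, so append each one.
            if (inside) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (inside && qName.equals(elementName)) {
                inside = false;
                done = true;
            }
        }

        String text() {
            return text.toString();
        }
    }

    /** Hypothetical helper: parses the XML string and returns the text of the named element. */
    static String readElementText(String xml, String elementName) {
        try {
            TextCollector handler = new TextCollector(elementName);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.text();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the real file: a single 31,200-character payload on one line.
        StringBuilder payload = new StringBuilder();
        for (int i = 0; i < 31200; i++) {
            payload.append('A');
        }
        String xml = "<root><el1>" + payload + "</el1></root>";
        System.out.println(readElementText(xml, "el1").length());
    }
}
```

Printing the length rather than the value itself avoids relying on how a console displays very long lines.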

Answer 2

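I was wondering whether you could do some pre-processing on the XML as you read it in.

I've been experimenting to see whether I can break the long element up into a list of sub-elements. The result can then be parsed, and the sub-elements can be joined back into a single string. My testing showed that my initial guess of 4500 characters per sub-element was still a bit too high for my XML parsing to cope with, so I arbitrarily picked 1000, and it seems to handle that.

Anyway, this may help or it may not, but here is what I came up with: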

private static final String ELEMENT_TO_BREAK_UP_OPEN = "<element>";
private static final String ELEMENT_TO_BREAK_UP_CLOSE = "</element>";
private static final String SUB_ELEMENT_OPEN = "<subelement>";
private static final String SUB_ELEMENT_CLOSE = "</subelement>";
private static final int SUB_ELEMENT_SIZE_LIMIT = 1000;

public static void main(final String[] args) {
    try {

        /* The XML currently looks like this:
         * 
         * <root>
         * <element> ... Super long input with 30000+ characters ... </element>
         * </root>
         * 
         */
        final File file = new File("src\\main\\java\\longxml\\test.xml");
        final BufferedReader reader = new BufferedReader(new FileReader(file));

        final StringBuffer buffer = new StringBuffer();
        String line = reader.readLine();
        while( line != null ) {
            if( line.contains(ELEMENT_TO_BREAK_UP_OPEN) ) {
                buffer.append(ELEMENT_TO_BREAK_UP_OPEN);
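                // Locate the tags by index so indentation around them does not shift the slice.
                // Assumes the open and close tags sit on the same line, as in the sample above.
                final int start = line.indexOf(ELEMENT_TO_BREAK_UP_OPEN) + ELEMENT_TO_BREAK_UP_OPEN.length();
                final int end = line.lastIndexOf(ELEMENT_TO_BREAK_UP_CLOSE);
                String substring = line.substring(start, end);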

                while( substring.length() > SUB_ELEMENT_SIZE_LIMIT ) {
                    buffer.append(SUB_ELEMENT_OPEN);
                    buffer.append( substring.substring(0, SUB_ELEMENT_SIZE_LIMIT) );
                    buffer.append(SUB_ELEMENT_CLOSE);

                    substring = substring.substring(SUB_ELEMENT_SIZE_LIMIT);
                }
                if( substring.length() > 0 ) {
                    buffer.append(SUB_ELEMENT_OPEN);
                    buffer.append(substring);
                    buffer.append(SUB_ELEMENT_CLOSE);
                }
                buffer.append(ELEMENT_TO_BREAK_UP_CLOSE);
            }
            else {
                buffer.append(line);
            }

            line = reader.readLine();
        }
        reader.close();


        /* The XML now looks something like this:
         * 
         * <root>
         * <element>
         * <subelement> ... First Part of Data ... </subelement>
         * <subelement> ... Second Part of Data ... </subelement>
         * ... Multiple Other Subelements of Data ..
         * <subelement> ... Final Part of Data ... </subelement>
         * </element>
         * </root>
         */

        //This parses the xml with the new subElements in
        final InputSource src = new InputSource(new StringReader(buffer.toString()));
        final Node document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getFirstChild();

        //This gives us the first child (element), then its children (subelements)
        final NodeList childNodes = document.getFirstChild().getChildNodes();

        //Then concatenate them back into a big string.
        final StringBuilder finalElementValue = new StringBuilder();
        for( int i = 0; i < childNodes.getLength(); i++ ) {
            final Node node = childNodes.item(i);
            finalElementValue.append( node.getFirstChild().getNodeValue() );
        }

        //At this point do whatever you need to do. Decode, Deserialize, etc...
        System.out.println(finalElementValue.toString());
    }
    catch (final Exception e) {
        e.printStackTrace();
    }
}

There are some problems with this in terms of general application:

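It relies on the element you want to break up being uniquely identifiable. (Though I suspect the logic for finding the element could be improved quite a bit.)
It relies on knowing the format of the XML and hoping that it doesn't change. (This applies only to the later parsing section; once the element is broken up into sub-elements, you could parse it more robustly with XPath.)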
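
On the XPath point, here is a minimal sketch of how the sub-elements might be joined without walking the child nodes by hand. The class name `SubElementXPath`, the helper `joinedText`, and the tiny sample document are made up for this example. It relies on the XPath rule that the string-value of an element is the concatenation of all its descendant text nodes, so evaluating a path to the parent element returns the pieces already joined.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SubElementXPath {

    /** Hypothetical helper: returns the string-value of the first node matching the expression. */
    static String joinedText(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // For an element, the string-value is the concatenation of all descendant text nodes,
            // so the text of every <subelement> comes back as one string, in document order.
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><element>"
                + "<subelement>abc</subelement>"
                + "<subelement>def</subelement>"
                + "</element></root>";
        System.out.println(joinedText(xml, "/root/element")); // prints "abcdef"
    }
}
```

Note that any whitespace between the sub-elements would also end up in the result, so this works best when the pre-processed XML is written without line breaks or indentation, as in the code above.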

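Having said all that, you do end up with a parseable XML string from which you can rebuild your encoded string, so this may help you on your way to a solution.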
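
For the step after the string has been rebuilt, here is a minimal sketch of decoding and deserializing it, assuming Java 8 or later for `java.util.Base64`. The class name `DecodePayload` and both helpers are made up for this example, and an `ArrayList` stands in for the real serialized class, which would need to be on the classpath. The MIME decoder is used because it ignores line breaks in the input, which may matter if the value was wrapped along the way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;

class DecodePayload {

    /** Hypothetical helper: decodes Base64 text and deserializes the resulting bytes. */
    static Object decodeAndDeserialize(String base64) {
        // The MIME decoder skips line separators and other characters outside the Base64 alphabet.
        byte[] bytes = Base64.getMimeDecoder().decode(base64);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not deserialize payload", e);
        }
    }

    /** Hypothetical helper used only to build sample input for this sketch. */
    static String serializeAndEncode(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new IllegalStateException("Could not serialize sample value", e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    public static void main(String[] args) {
        // Stand-in payload; the real one would be the text extracted from the XML element.
        ArrayList<String> original = new ArrayList<>(Arrays.asList("a", "b"));
        String encoded = serializeAndEncode(original);
        Object restored = decodeAndDeserialize(encoded);
        System.out.println(original.equals(restored)); // prints "true"
    }
}
```

Java deserialization runs code from the serialized classes, so this should only be done with files from a source you trust.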

Is there a maximum line length for reading text files in Java?
