StreamException: invalid XML character (Unicode: 0x1a)

Date: 2011-12-14 14:07:12

Tags: java xml

I am using XStream to save user objects to a file.

private void store() {
    XStream xStream = new XStream(new DomDriver("UTF-8"));
    xStream.setMode(XStream.XPATH_ABSOLUTE_REFERENCES);

    xStream.alias("configuration", Configuration.class);
    xStream.alias("user", User.class);

    synchronized (ConfigurationDAOImpl.class) {
        // try-with-resources closes the stream even if serialization fails
        try (FileOutputStream out = new FileOutputStream(filename.getFile())) {
            xStream.toXML(configuration, out);
        } catch (IOException e) {
            throw new RuntimeException("Failed to write to " + filename, e);
        }
    }
}

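When I try to read it back with the following code, I get an exception: com.thoughtworks.xstream.io.StreamException: : An invalid XML character (Unicode: 0x1a) was found in the element content of the document.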

private void lazyLoad() {
    synchronized (ConfigurationDAOImpl.class) {
        // Has the configuration been loaded
        if (configuration == null) {
            if (filename.exists()) {
                try {
                    XStream xStream = new XStream(new DomDriver("UTF-8"));
                    xStream.setMode(XStream.XPATH_ABSOLUTE_REFERENCES);

                    xStream.alias("configuration", Configuration.class);
                    xStream.alias("user", User.class);

                    configuration = (Configuration) xStream
                            .fromXML(filename.getInputStream());

                    LOGGER.debug("Loaded configuration from {}.", filename);
                } catch (Exception e) {
                    LOGGER.error("Failed to load configuration.", e);
                }
            } else {
                LOGGER.debug("{} does not exist.", filename);
                LOGGER.debug("Creating blank configuration.");

                configuration = new Configuration();
                configuration.setUsers(new ArrayList<User>());

                // and store it
                store();
            }
        }
    }
}

Any ideas?

3 answers:

Answer 0: (score: 24)

0x1a is not a valid XML character. It cannot be represented in an XML 1.0 document.

Quoted from http://en.wikipedia.org/wiki/XML#Valid_characters

  

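  Unicode code points in the following ranges are valid in XML 1.0 documents:
  U+0009, U+000A, U+000D: these are the only C0 controls accepted in XML 1.0;
  U+0020-U+D7FF, U+E000-U+FFFD: this excludes some (not all) non-characters in the BMP (all surrogates, U+FFFE and U+FFFF are forbidden);
  U+10000-U+10FFFF: this includes all code points in the supplementary planes, including non-characters.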
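
To see this outside XStream, here is a minimal sketch that feeds three small documents to the JDK's built-in DOM parser. The class name `Xml10ParseDemo` and the sample `<user>` element are made up for illustration, and the sketch assumes the default JAXP parser that ships with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original question.
class Xml10ParseDemo {

    private static final char SUB = 0x1a;

    /** Returns true if the JDK's DOM parser accepts the text as well-formed XML. */
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Well-formedness errors, including invalid characters, land here.
            return false;
        } catch (Exception e) {
            throw new IllegalStateException("Unexpected parser failure", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<user>a\tb</user>"));           // tab, an allowed C0 control
        System.out.println(parses("<user>a" + SUB + "b</user>"));  // raw 0x1a
        System.out.println(parses("<user>a&#x1a;b</user>"));       // 0x1a as a character reference
    }
}
```

With the bundled parser this should print true, false, false (the parser also reports the fatal errors on stderr). The third case suggests that escaping 0x1a as a character reference does not help in XML 1.0, so the character has to be removed or replaced before the object is serialized.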

Answer 1: (score: 5)

I replaced 0x1a with a dash character ('-') using the following method:

/**
 * Replaces every 0x1A (SUB) control character in the input with a dash.
 * Other characters that are invalid in XML 1.0 are left untouched.
 *
 * @param in the string that may contain the invalid character
 * @return the string with each 0x1A replaced by '-', or null if the input is null or empty
 */
private String stripNonValidXMLCharacters(String in) {
    if (in == null || in.isEmpty()) return null;
    // StringBuilder is enough here; no synchronization is needed for a local buffer
    StringBuilder out = new StringBuilder(in);
    for (int i = 0; i < out.length(); i++) {
        if (out.charAt(i) == 0x1a) {
            out.setCharAt(i, '-');
        }
    }
    return out.toString();
}
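
Since the method above only targets 0x1a, it can help to first check which invalid characters the data actually contains and where. The following is a rough sketch of such a diagnostic; `InvalidXmlCharFinder`, `isValidXml10` and `findInvalid` are names invented here, and the ranges are the XML 1.0 ones quoted in the first answer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
class InvalidXmlCharFinder {

    /** True if the code point is in one of the XML 1.0 valid ranges. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Describes each invalid code point as "index N: U+XXXX". */
    static List<String> findInvalid(String text) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isValidXml10(cp)) {
                hits.add(String.format("index %d: U+%04X", i, cp));
            }
            // Advance by 1 or 2 chars depending on whether this was a surrogate pair.
            i += Character.charCount(cp);
        }
        return hits;
    }

    public static void main(String[] args) {
        String name = "Alice" + (char) 0x1a + "Smith";
        System.out.println(findInvalid(name)); // [index 5: U+001A]
    }
}
```

Running something like this over each string field before calling `store()` would show whether 0x1a is the only offender or whether other control characters are present too.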

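Answer 2: (score: 0)

As noted above, XML 1.0 accepts only a specific set of characters.

Here is a useful Java method for making sure a string conforms to XML 1.0. It replaces every invalid character (not just 0x1a) with the given replacement.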

public static String replaceInvalidXMLCharacters(String input, String replacement) {
    if (input == null || input.isEmpty()) {
        return "";
    }
    StringBuilder result = new StringBuilder(input.length());
    // Iterate by code point rather than by char: a char can never be >= 0x10000,
    // so a per-char check would replace every valid supplementary character.
    for (int i = 0; i < input.length(); ) {
        int codePoint = input.codePointAt(i);
        if (codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF)) {
            result.appendCodePoint(codePoint);
        } else {
            // Disallowed control characters, unpaired surrogates, U+FFFE and U+FFFF
            result.append(replacement);
        }
        i += Character.charCount(codePoint);
    }
    return result.toString();
}
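
For comparison, the same filtering can probably be expressed as a single regular expression. This is only a sketch of an alternative, not the method from the answer: `XmlRegexSanitizer` and `sanitize` are invented names, and the `\x{...}` syntax assumes Java 7 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical alternative, not part of the original answer.
class XmlRegexSanitizer {

    // Negated character class built from the XML 1.0 valid ranges.
    private static final Pattern INVALID_XML10 = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    /** Replaces each character that is invalid in XML 1.0 with the replacement text. */
    static String sanitize(String input, String replacement) {
        if (input == null) {
            return "";
        }
        // quoteReplacement keeps '$' and '\' in the replacement literal.
        return INVALID_XML10.matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(sanitize("a" + (char) 0x1a + "b", "-")); // a-b
    }
}
```

Java's regex engine matches by code point, so a well-formed surrogate pair is tested against the U+10000-U+10FFFF range as one unit. The explicit loop is easier to step through in a debugger; the regex version is shorter and the compiled pattern can be shared between threads.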