I want to parse the following XML structure:
<?xml version="1.0" encoding="utf-8"?>
<documents>
<document>
<element name="title">
<value><![CDATA[Personnel changes: Müller]]></value>
</element>
</document>
</documents>
To parse this element name="?????" structure, I use XPath as follows:
XPath xPath = XPathFactory.newInstance().newXPath();
String currentString = (String) xPath.evaluate("/documents/document/element[@name='title']/value",pCurrentXMLAsDOM, XPathConstants.STRING);
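The parsing itself works fine, but I have problems with German umlauts and special characters such as "Ü", "ß" and the like. When I print out currentString, the String is: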
Personnel changes: MÃ¼ller
But I want the String exactly as it appears in the XML:
Personnel changes: Müller
Addition: I cannot change the content of the XML file. I have to parse it exactly as I receive it, so I must decode every String the right way.
Answer 0 (score: 2)
Sounds like an encoding problem. The XML is UTF-8-encoded Unicode, but you seem to be printing it as ISO-8859-1. Check the encoding settings of your Java source.
Edit: for how to set file.encoding, see "Setting the default Java character encoding?".
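To make this diagnosis concrete, here is a minimal sketch (the class and method names are hypothetical) that encodes the expected text as UTF-8 and then decodes those bytes as ISO-8859-1. The two UTF-8 bytes of "ü" (0xC3 0xBC) become the two Latin-1 characters "Ã" and "¼", which is the kind of garbling described in the question. It also prints Charset.defaultCharset(), the encoding Java falls back to when none is given explicitly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encodes text as UTF-8 bytes, then (wrongly) decodes those bytes as ISO-8859-1.
    static String decodeUtf8AsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The charset used by readers/writers when no encoding is specified
        System.out.println("Default charset: " + Charset.defaultCharset());
        // \u00FC is "ü"; the escape keeps the example independent of the source file's encoding
        String original = "Personnel changes: M\u00FCller";
        // "ü" is 0xC3 0xBC in UTF-8; read as ISO-8859-1 these become "Ã" and "¼"
        System.out.println(decodeUtf8AsLatin1(original));
    }
}
```

The `\u00FC` escape sidesteps the source-encoding question raised above: a literal "ü" in a .java file is only read correctly if javac uses the same encoding the file was saved in.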
Answer 1 (score: 1)
I have now found a good, quick solution:
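import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.IOUtils; // Apache Commons IO

// Reads the whole file into a String, decoding the bytes explicitly as UTF-8
// instead of relying on the platform default encoding.
public static String convertXMLToString(File pCurrentXML) {
    // try-with-resources closes the stream and avoids continuing with a null stream
    // when the file cannot be opened (FileNotFoundException is an IOException)
    try (InputStream is = new FileInputStream(pCurrentXML)) {
        return IOUtils.toString(is, StandardCharsets.UTF_8);
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    }
}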
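If you would rather not depend on Apache Commons IO for this first step, the JDK alone can do the same thing. A minimal sketch, assuming the file is UTF-8; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8FileReader {
    // Hypothetical helper: reads all bytes of the file and decodes them explicitly as UTF-8.
    static String readUtf8(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("umlaut", ".xml");
        try {
            // Write the sample content as UTF-8 bytes, independent of the platform default
            Files.write(tmp, "<value>M\u00FCller</value>".getBytes(StandardCharsets.UTF_8));
            String contents = readUtf8(tmp);
            System.out.println(contents.contains("M\u00FCller"));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

On Java 11 and later, `Files.readString(path, StandardCharsets.UTF_8)` does the same in a single call.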
Afterwards I convert the String into a DOM object:
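import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Parses an already-decoded String into a DOM Document.
static Document convertStringToXMLDocumentObject(String string) {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    try {
        DocumentBuilder builder = factory.newDocumentBuilder();
        // A StringReader supplies characters, not bytes, so no further decoding happens here
        return builder.parse(new InputSource(new StringReader(string)));
    } catch (ParserConfigurationException | SAXException | IOException e) {
        e.printStackTrace();
        return null;
    }
}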
Then I can evaluate XPath expressions on the DOM, and all element values are now decoded correctly as UTF-8! Demonstration:
currentString = (String) xPath.evaluate("/documents/document/element[@name='title']/value",pCurrentXMLAsDOM, XPathConstants.STRING);
System.out.println(currentString);
Output:
Personnel changes: Müller
:)
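A self-contained sketch of the same String → DOM → XPath pipeline, with the XML from the question inlined as a String so no file is needed. The class and method names are hypothetical, and `\u00FC` stands for "ü" so the example does not depend on the encoding of the Java source file:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class StringToXPathDemo {
    static String extractTitle(String xml) throws Exception {
        // Parsing from a Reader: the characters are already decoded, so the
        // encoding="utf-8" declaration plays no role in decoding here.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        // The string value of <value> includes the text of its CDATA section
        return (String) xPath.evaluate(
                "/documents/document/element[@name='title']/value",
                doc, XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<documents><document><element name=\"title\">"
                + "<value><![CDATA[Personnel changes: M\u00FCller]]></value>"
                + "</element></document></documents>";
        System.out.println(extractTitle(xml));
    }
}
```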
Answer 2 (score: 0)
If you know the file is UTF-8 encoded, try something like:
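// Decode the file explicitly as UTF-8 instead of the platform default encoding
FileInputStream fis = new FileInputStream("yourfile.xml");
InputStreamReader in = new InputStreamReader(fis, "UTF-8");
// XPath.evaluate(String, InputSource, QName) accepts this InputSource directly
InputSource pCurrentXMLAsDOM = new InputSource(in);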
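A runnable sketch of this approach, assuming the file content is UTF-8. A ByteArrayInputStream stands in for the FileInputStream so the example needs no file, and the class and method names are hypothetical. The InputSource is passed straight to XPath.evaluate, which parses it itself:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class ReaderInputSourceDemo {
    // Decodes the byte stream explicitly as UTF-8 and lets XPath parse the InputSource.
    // The caller remains responsible for closing the underlying stream.
    static String extractTitle(InputStream xmlBytes) throws XPathExpressionException {
        Reader in = new InputStreamReader(xmlBytes, StandardCharsets.UTF_8);
        XPath xPath = XPathFactory.newInstance().newXPath();
        return (String) xPath.evaluate(
                "/documents/document/element[@name='title']/value",
                new InputSource(in), XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        // Simulates the file contents: the XML as UTF-8-encoded bytes
        byte[] bytes = ("<documents><document><element name=\"title\">"
                + "<value><![CDATA[Personnel changes: M\u00FCller]]></value>"
                + "</element></document></documents>").getBytes(StandardCharsets.UTF_8);
        System.out.println(extractTitle(new ByteArrayInputStream(bytes)));
    }
}
```

Passing an InputSource to XPath.evaluate is convenient for a single query; if you run several expressions against the same document, parsing it once into a DOM (as in Answer 1) avoids re-parsing for each query.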