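I am trying to parse an XML document that I get from the Google Geocode API.
My XML file is below. The file contains a series of records like this one; this is just a single node.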
<?xml version="1.0" encoding="UTF-8"?>
<GeocodeResponse>
<status>OK</status>
<result>
<formatted_address>Petroleum House, Jamshedji Tata Road, Churchgate, Mumbai, Maharashtra 400020, India</formatted_address>
<address_component>
<long_name>Petroleum House</long_name>
<short_name>Petroleum House</short_name>
</address_component>
<address_component>
<long_name>Jamshedji Tata Road</long_name>
<short_name>Jamshedji Tata Road</short_name>
<type>route</type>
</address_component>
<address_component>
<long_name>Churchgate</long_name>
<short_name>Churchgate</short_name>
<type>sublocality</type>
<type>political</type>
</address_component>
<address_component>
<long_name>Mumbai</long_name>
<short_name>मॿंबई</short_name>
<type>locality</type>
<type>political</type>
</address_component>
<address_component>
<long_name>Mumbai</long_name>
<short_name>Mumbai</short_name>
<type>administrative_area_level_2</type>
<type>political</type>
</address_component>
<address_component>
<long_name>Maharashtra</long_name>
<short_name>MH</short_name>
<type>administrative_area_level_1</type>
<type>political</type>
</address_component>
<address_component>
<long_name>India</long_name>
<short_name>IN</short_name>
<type>country</type>
<type>political</type>
</address_component>
<address_component>
<long_name>400020</long_name>
<short_name>400020</short_name>
<type>postal_code</type>
</address_component>
<geometry>
<location>
<lat>18.9291061</lat>
<lng>72.8255146</lng>
</location>
<location_type>APPROXIMATE</location_type>
<viewport>
<southwest>
<lat>18.9277189</lat>
<lng>72.8240293</lng>
</southwest>
<northeast>
<lat>18.9304168</lat>
<lng>72.8267272</lng>
</northeast>
</viewport>
<bounds>
<southwest>
<lat>18.9288559</lat>
<lng>72.8251686</lng>
</southwest>
<northeast>
<lat>18.9292798</lat>
<lng>72.8255879</lng>
</northeast>
</bounds>
</geometry>
</result>
</GeocodeResponse>
I am trying the following code, but I get an error. This is my first attempt at parsing XML.
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class parser {
    public static void main(String args[]) {
        try {
            File stocks = new File("filename.xml");
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory
                    .newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(stocks);
            doc.getDocumentElement().normalize();
            System.out.println("root of xml file"
                    + doc.getDocumentElement().getNodeName());
            NodeList nodes = doc.getElementsByTagName("address_component");
            System.out.println("==========================");
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                if (node.getNodeType() == Node.ELEMENT_NODE) {
                    Element element = (Element) node;
                    System.out.println("Name: "
                            + getValue("long_name", element));
                    System.out.println("lat: " + getValue("lat", element));
                    System.out.println("lon: " + getValue("lon", element));
                }
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
    private static String getValue(String tag, Element element) {
        NodeList nodes = element.getElementsByTagName(tag).item(0)
                .getChildNodes();
        Node node = (Node) nodes.item(0);
        return node.getNodeValue();
    }
}
The error I get:
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 3 of 3-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanContent(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at parser.main(parser.java:17)
Direct output from Google:
<?xml version="1.0" encoding="UTF-8"?>
<GeocodeResponse>
<status>OK</status>
<result>
<formatted_address>Petroleum House, Jamshedji Tata Road, Churchgate, Mumbai, Maharashtra 400020, India</formatted_address>
<address_component>
<long_name>Petroleum House</long_name>
<short_name>Petroleum House</short_name>
</address_component>
<address_component>
<long_name>Jamshedji Tata Road</long_name>
<short_name>Jamshedji Tata Road</short_name>
<type>route</type>
</address_component>
<address_component>
<long_name>Churchgate</long_name>
<short_name>Churchgate</short_name>
<type>sublocality</type>
<type>political</type>
</address_component>
<address_component>
<long_name>Mumbai</long_name>
<short_name>म�ंबई</short_name>
<type>locality</type>
<type>political</type>
</address_component>
<address_component>
<long_name>Mumbai</long_name>
<short_name>Mumbai</short_name>
<type>administrative_area_level_2</type>
<type>political</type>
</address_component>
<address_component>
<long_name>Maharashtra</long_name>
<short_name>MH</short_name>
<type>administrative_area_level_1</type>
<type>political</type>
</address_component>
<address_component>
<long_name>India</long_name>
<short_name>IN</short_name>
<type>country</type>
<type>political</type>
</address_component>
<address_component>
<long_name>400020</long_name>
<short_name>400020</short_name>
<type>postal_code</type>
</address_component>
<geometry>
<location>
<lat>18.9291061</lat>
<lng>72.8255146</lng>
</location>
<location_type>APPROXIMATE</location_type>
<viewport>
<southwest>
<lat>18.9277189</lat>
<lng>72.8240293</lng>
</southwest>
<northeast>
<lat>18.9304168</lat>
<lng>72.8267272</lng>
</northeast>
</viewport>
<bounds>
<southwest>
<lat>18.9288559</lat>
<lng>72.8251686</lng>
</southwest>
<northeast>
<lat>18.9292798</lat>
<lng>72.8255879</lng>
</northeast>
</bounds>
</geometry>
</result>
</GeocodeResponse>
The above is exactly what Google returned.
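Answer 0 (score: 3)
I suspect the file was incorrectly encoded when it was saved.
The top of your file declares UTF-8, but whatever saved it did not write it as UTF-8. You should be able to confirm this by opening it in another XML-aware tool, such as a browser, or a command-line tool such as XMLStarlet.
Can you take the input directly from the Google service, i.e. without saving it to an intermediate file? It would be worth doing, if only to confirm the problem.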
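As a rough illustration of skipping the intermediate file, the sketch below hands a raw byte stream straight to the DOM parser, which then picks up the encoding from the XML declaration itself. The class name DirectStreamParse and the helper firstText are made up for this example, and the in-memory byte array only stands in for the HTTP response body; how the asker actually fetches the response is not shown in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, not part of the asker's code.
class DirectStreamParse {
    // Parses XML from a byte stream. Because the parser sees raw bytes,
    // it applies the encoding named in the XML declaration on its own.
    static Document parse(InputStream in) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(in);
    }

    // Returns the text of the first element with the given tag name.
    static String firstText(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the service response: correctly encoded UTF-8 bytes.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<GeocodeResponse><result><address_component>"
                + "<long_name>Mumbai</long_name>"
                + "<short_name>\u092E\u0941\u0902\u092C\u0908</short_name>"
                + "</address_component></result></GeocodeResponse>";
        byte[] body = xml.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(body)) {
            Document doc = parse(in);
            System.out.println(firstText(doc, "long_name"));
        }
    }
}
```

If this parses cleanly when fed the live response stream but the saved file still fails, that points at the save step rather than at the service.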
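Answer 1 (score: 1)
I would say this is related to the file encoding. If you are on a Windows machine, the XML file may have been saved in a Windows/ISO encoding (for example windows-1252 or ISO-8859-1) rather than UTF-8.
I would try replacing
Document doc = dBuilder.parse(stocks);
with:
Document doc = dBuilder.parse(new InputSource(new InputStreamReader(new FileInputStream(stocks), "UTF-8")));
so that the input file is read as UTF-8. This needs imports for java.io.FileInputStream, java.io.InputStreamReader and org.xml.sax.InputSource.
Edit: how to check the file encoding with Notepad++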
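If an editor is not at hand, the same check can be approximated in code. The following is a minimal sketch, not a full encoding detector: it only reports whether a file's bytes are well-formed UTF-8, using a strict CharsetDecoder. The class name Utf8Check and the default file name are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class for checking a file before parsing it.
class Utf8Check {
    // Returns true only if every byte sequence decodes as UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // A strict decoder throws on the first malformed sequence.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // File name taken from the question; pass another path as args[0].
        String path = args.length > 0 ? args[0] : "filename.xml";
        byte[] bytes = Files.readAllBytes(Paths.get(path));
        System.out.println(isValidUtf8(bytes)
                ? "well-formed UTF-8"
                : "NOT well-formed UTF-8");
    }
}
```

A negative result for a file that declares encoding="UTF-8" would be consistent with the MalformedByteSequenceException in the question. A positive result does not prove the text is what was intended, only that the bytes are decodable.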
Answer 2 (score: 0)
You could try parsing your file like this:
File file = new File("filename.xml");
InputStream inputStream= new FileInputStream(file);
Reader reader = new InputStreamReader(inputStream,"UTF-8");
InputSource is = new InputSource(reader);
is.setEncoding("UTF-8");
Document doc = dBuilder.parse(is);
This is just a wild guess...
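For completeness, here is a self-contained sketch of the Reader-based approach with its imports. One caveat worth knowing: an InputStreamReader created with a charset replaces malformed bytes with U+FFFD instead of throwing, so this route avoids the exception but does not repair the damaged character. The class name ReaderBasedParse and the helper text are invented for the example. It also reads lat and lng from the location element, since in the posted XML those tags live under geometry rather than address_component, and the longitude tag is lng, not lon.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answers.
class ReaderBasedParse {
    // Decodes the bytes as UTF-8 ourselves, then parses the characters.
    // Malformed bytes become U+FFFD rather than raising an exception.
    static Document parse(InputStream in) throws Exception {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(reader));
    }

    // Text of the first descendant with this tag, or null if there is none.
    static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Small stand-in document; values copied from the question's XML.
        String xml = "<GeocodeResponse><result><geometry><location>"
                + "<lat>18.9291061</lat><lng>72.8255146</lng>"
                + "</location></geometry></result></GeocodeResponse>";
        InputStream in =
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        Document doc = parse(in);
        Element location =
                (Element) doc.getElementsByTagName("location").item(0);
        System.out.println("lat: " + text(location, "lat"));
        System.out.println("lng: " + text(location, "lng"));
    }
}
```

Returning null for a missing tag is a deliberate choice here: the getValue helper in the question calls item(0) without checking, which would throw a NullPointerException for tags such as lat that do not exist under address_component.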