Question

I have the following (very large => 5 GB) XML:

<Hotels>
  <Hotel>
    <Name>Hotel 1</Name>
    <City>City 1</City>
    <Phone>12345</Phone>
  </Hotel>
  <Hotel>
    <Name>Hotel 2</Name>
    <City>City 2</City>
    <Phone>67890</Phone>
  </Hotel>
  ...
</Hotels>

I have a file that defines the fields I want to extract, along with their paths:

$root = "/Hotels/Hotel";
$fields = array("HotelName"   => "/Name",
                "PhoneNumber" => "/Phone");

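So the path for HotelName is: /Hotels/Hotel/Name.

Now I want to get the information for each hotel. I can't create classes for them (as in the linked example), because the script has to be dynamic: it will be passed different XML files with different definition files.

How can I solve this using the paths, without classes, and with low memory usage (=> large file)?

// Edit: Everything else is already implemented. I only need a way to iterate over the Hotel elements and get their values using the paths I have.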

Answer 1

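Try reading this tutorial, which has some explanations and examples: http://viralpatel.net/blogs/java-xml-xpath-tutorial-parse-xml/

For your purpose, you should use something from the StAX family rather than DOM.

Try something like this: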

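import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class QueryXML {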
  public void query() throws ParserConfigurationException, SAXException,
      IOException, XPathExpressionException {
    // standard for reading an XML file
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    DocumentBuilder builder;
    Document doc = null;
    XPathExpression expr = null;
    builder = factory.newDocumentBuilder();
    doc = builder.parse("person.xml");

    // create an XPathFactory
    XPathFactory xFactory = XPathFactory.newInstance();

    // create an XPath object
    XPath xpath = xFactory.newXPath();

    // compile the XPath expression
    expr = xpath.compile("//person[firstname='Lars']/lastname/text()");
    // run the query and get a nodeset
    Object result = expr.evaluate(doc, XPathConstants.NODESET);

    // cast the result to a DOM NodeList
    NodeList nodes = (NodeList) result;
    for (int i=0; i<nodes.getLength();i++){
      System.out.println(nodes.item(i).getNodeValue());
    }

    // new XPath expression to get the number of people with name Lars
    expr = xpath.compile("count(//person[firstname='Lars'])");
    // run the query and get the number of nodes
    Double number = (Double) expr.evaluate(doc, XPathConstants.NUMBER);
    System.out.println("Number of objects " +number);

    // do we have more than 2 people with name Lars?
    expr = xpath.compile("count(//person[firstname='Lars']) >2");
    // run the query and get the number of nodes
    Boolean check = (Boolean) expr.evaluate(doc, XPathConstants.BOOLEAN);
    System.out.println(check);
  }
}

You can adapt the code to your needs.
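
The StAX suggestion above is not illustrated by the DOM example, which loads the whole document into memory. Below is a minimal sketch of the streaming approach, assuming simple element-only paths like the ones in the question. The class name HotelStreamExtractor, the extract method, and the callback parameter are hypothetical names chosen for illustration; only the javax.xml.stream calls are standard API. It keeps a stack of open element names, compares the current path against root + field path, and hands each finished record to a callback so that nothing accumulates in memory.

```java
import java.io.Reader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams the XML once and emits one field map per record.
final class HotelStreamExtractor {

    // fields maps an output name to a path relative to rootPath, e.g. "HotelName" -> "/Name".
    static void extract(Reader xml, String rootPath, Map<String, String> fields,
                        Consumer<Map<String, String>> sink) throws XMLStreamException {
        // Invert the definition: absolute path -> output field name.
        Map<String, String> wanted = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            wanted.put(rootPath + e.getValue(), e.getKey());
        }

        Deque<String> open = new ArrayDeque<>();   // names of the currently open elements
        StringBuilder text = new StringBuilder();  // text of the element being read
        Map<String, String> current = null;        // record under construction, if any

        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        open.addLast(reader.getLocalName());
                        text.setLength(0);
                        if (pathOf(open).equals(rootPath)) {
                            current = new LinkedHashMap<>();
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        text.append(reader.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        String path = pathOf(open);
                        String field = wanted.get(path);
                        if (field != null && current != null) {
                            current.put(field, text.toString().trim());
                        }
                        if (path.equals(rootPath) && current != null) {
                            sink.accept(current);  // record complete: hand it off and forget it
                            current = null;
                        }
                        open.removeLast();
                        text.setLength(0);
                        break;
                    default:
                        break;
                }
            }
        } finally {
            reader.close();
        }
    }

    // Builds "/Hotels/Hotel/Name" from the stack of open element names.
    private static String pathOf(Deque<String> open) {
        return "/" + String.join("/", open);
    }
}
```

A call might look like HotelStreamExtractor.extract(new FileReader("hotels.xml"), "/Hotels/Hotel", fields, row -> System.out.println(row)), where hotels.xml is a placeholder file name. Paths with predicates or attributes would need a real XPath engine; this sketch only does exact path matching. The same pull-parsing idea should carry over to PHP's XMLReader, though that is not shown here.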

Answer 2

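If you have already located the <Hotel/> node and hold it as a DOM reference, you only need to access its child nodes, using the hotel as the context. Use either

XPath: ./Name or, shorter, Name (just don't start the expression with /, which refers to the document root, and make sure the hotel node is used as the query context); or
DOM: hotel.getChildNodes(), then iterate over the result and compare element names to find the matching child nodes.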
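
Both options above can be sketched in Java as follows, assuming the hotel element is already available as an org.w3c.dom.Node (how it was obtained is left open). The class name RelativeHotelQuery and its two methods are hypothetical; XPath.evaluate(String, Object) and Node.getChildNodes() are the standard JAXP/DOM calls. The leading slash of each field path from the definition file is stripped so that the expression is resolved relative to the hotel node instead of the document root.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper illustrating context-relative lookups on one hotel node.
final class RelativeHotelQuery {

    // Option 1: relative XPath, evaluated with the hotel node as context.
    static Map<String, String> readFields(Node hotel, Map<String, String> fields)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> values = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            String path = e.getValue();
            // "/Name" -> "Name": a leading slash would address the document root.
            String relative = path.startsWith("/") ? path.substring(1) : path;
            values.put(e.getKey(), xpath.evaluate(relative, hotel));
        }
        return values;
    }

    // Option 2: plain DOM, scanning the direct children by element name.
    static String childText(Node hotel, String name) {
        NodeList children = hotel.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && name.equals(child.getNodeName())) {
                return child.getTextContent();
            }
        }
        return null; // no direct child element with that name
    }
}
```

The XPath variant handles multi-step paths such as Address/Street without extra code, while the child scan only looks one level deep but avoids the XPath engine entirely.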

Get XML nodes by path
