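Find a keyword in nodes and get the node name in the DOM

Time: 2011-11-22 07:08:14

Tags: java regex dom nodes siblings

I want to search the DOM for a specific keyword and, when it is found, I want to know which node in the tree it came from.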

// assumes a static int field `total` and imports of java.util.regex.Pattern and Matcher
static void search(String segment, String keyword) {

    if (segment == null)
        return;

    Pattern p = Pattern.compile(keyword, Pattern.CASE_INSENSITIVE);
    Matcher matcher = p.matcher(segment);

    // find() reports whether the keyword occurs anywhere in the segment
    if (matcher.find()) {
        total++;
        // what to do here to get the node?
    }
}

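public static void traverse(Node node) {
    if (node == null || node.getNodeName() == null)
        return;

    // look for the keyword in this node's own value (text nodes carry the text)
    search(node.getNodeValue(), "java");

    // depth-first: descend into the children before printing this node
    traverse(node.getFirstChild());

    // print an empty line for whitespace-only text nodes, otherwise the node itself
    System.out.println(node.getNodeValue() != null && 
                       node.getNodeValue().trim().length() == 0 ? "" : node);
    // then continue with the following sibling
    traverse(node.getNextSibling());
}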

1 Answer:

Answer 0 (score: 3)

Consider using the XPath API:

// the XML & search term
String xml = "<foo>" + "<bar>" + "xml java xpath" + "</bar>" + "</foo>";
InputSource src = new InputSource(new StringReader(xml));
final String term = "java";
// search expression and term variable resolver
String expression = "//*[contains(text(),$term)]";
final QName termVariableName = new QName("term");
class TermResolver implements XPathVariableResolver {
  @Override
  public Object resolveVariable(QName variableName) {
    return termVariableName.equals(variableName) ? term : null;
  }
}
// perform the search
XPath xpath = XPathFactory.newInstance().newXPath();
xpath.setXPathVariableResolver(new TermResolver());
Node node = (Node) xpath.evaluate(expression, src, XPathConstants.NODE);

If you want more complex matching via regular expressions, you can provide your own function resolver.
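As a rough sketch of the function-resolver idea, the class below registers a custom XPath function that tests a string against a java.util.regex pattern. The class name RegexXPathSearch, the method findFirst, the prefix ext, the function name matches and the namespace URI are all made up for this example; JAXP only requires that a custom function live in some namespace. It also assumes the XPathFactory is left in its default, non-secure-processing mode, since secure processing can block extension functions.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFunction;
import javax.xml.xpath.XPathFunctionResolver;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RegexXPathSearch {

    // Hypothetical namespace URI and prefix, chosen only for this example.
    static final String NS = "http://example.com/xpath-ext";
    static final String PREFIX = "ext";

    /** Returns the first element whose first text child matches the regex, or null. */
    static Node findFirst(String xml, String regex) throws XPathExpressionException {
        final Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Bind the "ext" prefix used in the expression to our namespace.
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return PREFIX.equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return NS.equals(uri) ? PREFIX : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                List<String> prefixes = NS.equals(uri)
                        ? Collections.singletonList(PREFIX)
                        : Collections.<String>emptyList();
                return prefixes.iterator();
            }
        });

        // Resolve ext:matches(string) to a regex test; anything else is left unresolved.
        xpath.setXPathFunctionResolver(new XPathFunctionResolver() {
            public XPathFunction resolveFunction(QName name, int arity) {
                if (!NS.equals(name.getNamespaceURI())
                        || !"matches".equals(name.getLocalPart()) || arity != 1) {
                    return null;
                }
                return new XPathFunction() {
                    @SuppressWarnings("rawtypes")
                    public Object evaluate(List args) {
                        String text = String.valueOf(args.get(0));
                        return Boolean.valueOf(pattern.matcher(text).find());
                    }
                };
            }
        });

        // string(text()) is the first text child, mirroring contains(text(), ...).
        String expression = "//*[ext:matches(string(text()))]";
        InputSource src = new InputSource(new StringReader(xml));
        return (Node) xpath.evaluate(expression, src, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        Node node = findFirst("<foo><bar>xml java xpath</bar></foo>", "j[a-z]va");
        System.out.println(node == null ? "no match" : node.getNodeName());
    }
}
```

Compiling the pattern once outside the function keeps the per-node work down to a single find() call, and Pattern.CASE_INSENSITIVE mirrors the flag used in the question.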

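A breakdown of the XPath expression //*[contains(text(),$term)]:

  • //* the asterisk selects any element; the double slash means under any parent, that is, at any depth
  • [contains(text(),$term)] is a predicate that matches on the text
  • text() is a function that gets the element's text
  • $term is a variable; the variable resolver resolves it to the term "java"; a resolver is preferred over string concatenation to prevent injection attacks (similar to SQL injection problems)
  • contains(arg1,arg2) is a function that returns true if arg1 contains arg2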

XPathConstants.NODE tells the API to select a single node; you can use NODESET to get all matches as a NodeList.
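To tie this back to the original question (which node did the keyword come from), here is a minimal sketch that requests NODESET and collects the name of every matching element. The class name KeywordNodeNames, the method namesOfMatches and the sample XML are invented for illustration; the expression and the variable name term are the ones used above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathVariableResolver;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class KeywordNodeNames {

    /** Returns the names of all elements whose first text child contains the term. */
    static List<String> namesOfMatches(String xml, final String term)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Supply the search term through $term instead of concatenating it in.
        xpath.setXPathVariableResolver(new XPathVariableResolver() {
            public Object resolveVariable(QName variableName) {
                return "term".equals(variableName.getLocalPart()) ? term : null;
            }
        });

        // NODESET returns every match, in document order, as a NodeList.
        NodeList matches = (NodeList) xpath.evaluate(
                "//*[contains(text(),$term)]",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);

        List<String> names = new ArrayList<String>();
        for (int i = 0; i < matches.getLength(); i++) {
            names.add(matches.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<foo><bar>xml java xpath</bar><baz>plain java</baz><qux>none</qux></foo>";
        System.out.println(namesOfMatches(xml, "java")); // prints [bar, baz]
    }
}
```

Note that contains() is case-sensitive, and that in XPath 1.0 contains(text(),$term) only inspects the first text child of each element, so elements with mixed content may need a different predicate.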