Question

I have a lot of XML files and I need to search them for a string (more precisely, it will be a not-too-complex regular expression).

As the result, I want to get the XPath of the node the string is in, i.e.:

pattern = /home|house/

files: file1.xml, file2.xml etc

Result:

"home" in file1.xml, xpath: //root/cars/car[2]
"house" in file2.xml, xpath: //root[1]/elemA[2][@attribute1='first']

How can I do this? I can use PHP, Python, JavaScript or a Vim plugin (since I have used those before).

Answer 1

Search for:

 //*[text()[contains(., 'home') or contains(., 'house')]]

In PHP:

Use DOMDocument & DOMXPath, then just call DOMNode::getNodePath() on the resulting matches.
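A minimal sketch of that plain-XPath route, assuming a hypothetical sample document (loaded with loadXML() so the snippet runs on its own; with real files you would call load() inside a glob('*.xml') loop, as in the code further down). The predicate text()[contains(...)] keeps elements that have at least one direct text child containing either word:

```php
<?php
// Hypothetical sample document modelled on the question's file1.xml example.
$xml = '<root><cars><car>Sedan</car><car>Parked at home</car></cars></root>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Elements with a direct text child containing "home" or "house" (plain substring test).
$query = "//*[text()[contains(., 'home') or contains(., 'house')]]";

$paths = [];
foreach ($xpath->query($query) as $node) {
    // getNodePath() builds an absolute, positional XPath for the node.
    $paths[] = $node->getNodePath();
}

echo implode(PHP_EOL, $paths), PHP_EOL;
```

Note that XPath 1.0 contains() is case-sensitive and does plain substring matching only; it has no regex support.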

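If you actually need a regular expression rather than those plain substring matches: PHP's DOMDocument only supports XPath 1.0 functions, but you can extend DOMXPath with user-defined functions registered via DOMXPath::registerPhpFunctions.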

Something put together quickly, without much error handling:

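// Called from XPath with the text() node-set (passed in as an array of DOMNode) and the regex
function xpathregexmatch($nodelist,$regex){
        foreach($nodelist as $node){
                // only test text nodes; one matching text child is enough
                if( $node instanceof DOMText && preg_match($regex,$node->nodeValue)) return true;
        }
        return false;
}

foreach(glob('*.xml') as $file){
        $d = new DOMDocument();
        $d->load($file);
        $x = new DOMXPath($d);
        // the php: prefix used in the query must be bound to this namespace URI
        $x->registerNamespace("php", "http://php.net/xpath");
        // allow only this one PHP function to be called from XPath
        $x->registerPHPFunctions('xpathregexmatch');
        $matches = $x->query('//*[php:function("xpathregexmatch",text(),"/house|home/")]');
        if($matches->length){
                foreach($matches as $node){
                        // absolute positional path, e.g. /root/cars/car[2]
                        echo $file. ':'.$node->getNodePath().PHP_EOL;
                }
        }
}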
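A variant that does not need registerPhpFunctions at all and also prints the matched word in the question's requested format: select every text node with //text(), test it with preg_match() in PHP, and report the path of the element that owns the text. The two in-memory documents stand in for file1.xml and file2.xml and are hypothetical; this is a sketch of an alternative, not the method used above.

```php
<?php
// Hypothetical in-memory stand-ins; with real files, loop over glob('*.xml') and use load().
$files = [
    'file1.xml' => '<root><cars><car>Sedan</car><car>Parked at home</car></cars></root>',
    'file2.xml' => '<root><elemA>x</elemA><elemA attribute1="first">a house</elemA></root>',
];
$pattern = '/home|house/';

$lines = [];
foreach ($files as $name => $xml) {
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $xpath = new DOMXPath($doc);
    // Visit every text node and run the regex in PHP instead of inside XPath.
    foreach ($xpath->query('//text()') as $text) {
        if (preg_match($pattern, $text->nodeValue, $m)) {
            // Report the first matched word and the path of the owning element.
            $lines[] = sprintf('"%s" in %s, xpath: %s', $m[0], $name, $text->parentNode->getNodePath());
        }
    }
}

echo implode(PHP_EOL, $lines), PHP_EOL;
```

preg_match() reports only the first match per text node, and getNodePath() yields positional paths such as /root/elemA[2] rather than attribute predicates like [@attribute1='first'].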

Answer 2

In PHP: glob the XML files, select all element nodes with XPath, run preg_match_all on their text, and when there is a match, get the node's XPath with getNodePath() and print it:

$pattern = '/home|house|guide/iu';

foreach (glob('data/*.xml') as $file)
{
    foreach (simplexml_load_file($file)->xpath('//*') as $node)
    {
        if (!preg_match_all($pattern, $node, $matches)) continue;

        printf(
            "\"%s\" in %s, xpath: %s\n", implode('", "', $matches[0]),
            basename($file), dom_import_simplexml($node)->getNodePath()
        );
    }
}

Result (demo):

"Guide" in iana-charsets-2013-03-05.xml, xpath: /*/*[7]/*[158]/*[4]
"Guide" in iana-charsets-2013-03-05.xml, xpath: /*/*[7]/*[224]/*[2]
"Guide" in iana-charsets-2013-03-05.xml, xpath: /*/*[7]/*[224]/*[4]
"guide" in rdf-dmoz.xml, xpath: /*/*[4]/d:Description
"guide" in rdf-dmoz.xml, xpath: /*/*[5]/d:Description

Nice question, by the way.
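The question's second example path involves an attribute. If attribute values should be searched as well, a sketch along the same lines (using DOMXPath rather than SimpleXML, with a hypothetical sample document) selects every attribute node with //@* and asks each DOMAttr for its path:

```php
<?php
// Hypothetical sample mirroring the question's elemA/@attribute1 example.
$xml = '<root><elemA attribute1="x"/><elemA attribute1="first house"/></root>';
$pattern = '/home|house/';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$paths = [];
// '//@*' selects every attribute node in the document.
foreach ($xpath->query('//@*') as $attr) {
    if (preg_match($pattern, $attr->value)) {
        // For a DOMAttr, getNodePath() ends in /@attributeName.
        $paths[] = $attr->getNodePath();
    }
}

echo implode(PHP_EOL, $paths), PHP_EOL;
```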

Answer 3

PHP SimpleXML:

$xml = simplexml_load_file("file1.xml");
// SimpleXML indexes are 0-based, so car[1] here is the XPath //root/cars/car[2]
$car = $xml->cars->car[1];
// do sth with $car

For more, please make your question more specific.
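Going the other way, an absolute path as printed by getNodePath() in the other answers can be passed to SimpleXMLElement::xpath() to retrieve the node. A short sketch with a hypothetical inline document, which also shows that SimpleXML's property indexing is 0-based while XPath positions are 1-based:

```php
<?php
// Hypothetical inline document; in practice use simplexml_load_file('file1.xml').
$xml = simplexml_load_string('<root><cars><car>Sedan</car><car>Parked at home</car></cars></root>');

// xpath() returns an array of SimpleXMLElement objects (empty if nothing matched).
$found = $xml->xpath('/root/cars/car[2]');
$text = isset($found[0]) ? (string) $found[0] : null;

// The same element via property access: index 1 because SimpleXML counts from 0.
$same = (string) $xml->cars->car[1];

echo $text, PHP_EOL;
```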

Get the XPath from search results of a specific regex pattern in a bunch of XML files
