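Parsing HTML with PHP DOM - class contains text

Asked: 2013-04-09 12:13:05

Tags: php html dom xpath domxpath

I have a set of HTML items to parse. I need to extract the contents of every div whose class name ends with 'uid-g-uid'. Here are some sample divs: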

<div class="uid-g-uid">1121</div>

<div class="yskisghuid-g-uid">14234</div>

<div class="kif893jduid-g-uid">114235</div>

I tried the following, but it did not work:

$doc = new DOMDocument();
$bdy = 'HTML Content goes here...';
@$doc->loadHTML($bdy);
$xpath = new DomXpath($doc);
$div = $xpath->query('//*[@class=ends-with(., "uid-g-uid")]');

I also tried:

$doc = new DOMDocument();
$bdy = 'HTML Content goes here...';
@$doc->loadHTML($bdy);
$xpath = new DomXpath($doc);
$div = $xpath->query('//*[@class="*uid-g-uid"]');

Please help!

4 answers:

Answer 0 (score: 3)

ends-with() requires XPath 2.0, so it does not work with DOMXPath, which only supports XPath 1.0. Something like this should work:

$xpath->query('//*["uid-g-uid" = substring(@class, string-length(@class) - 8)]');
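As a minimal sketch of how this expression might be applied, the snippet below runs it against the three sample divs from the question plus one non-matching div added for contrast. The `uid-g-uid-other` class and the `$values` variable are assumptions for illustration, not part of the original post. The offset 8 is the length of "uid-g-uid" (9 characters) minus one, because XPath string positions start at 1.

```php
<?php
// Sample markup from the question, plus one hypothetical non-matching div.
$html = '<div class="uid-g-uid">1121</div>'
      . '<div class="yskisghuid-g-uid">14234</div>'
      . '<div class="kif893jduid-g-uid">114235</div>'
      . '<div class="uid-g-uid-other">0</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// "uid-g-uid" is 9 characters long, so the suffix starts at string-length - 8.
$nodes = $xpath->query('//*["uid-g-uid" = substring(@class, string-length(@class) - 8)]');

// Collect the text content of every matched element.
$values = [];
foreach ($nodes as $node) {
    $values[] = $node->nodeValue;
}

print_r($values);
```

Elements without a class attribute (such as the html and body wrappers that loadHTML adds) yield an empty string from substring(), so they are not matched.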

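Answer 1 (score: 2)

You want to run an XPath 1.0 query that checks whether a string ends with a specific string. The ends-with() string function is not available in that version.

I can see several ways to do this. If, in your case, the substring only ever appears once and always at the end, you can use contains():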

//*[contains(@class, "uid-g-uid")]

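If the substring can also appear somewhere else in the value and you do not want those matches, additionally check that nothing follows it: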

//*[contains(@class, "uid-g-uid") and substring-after(@class, "uid-g-uid") = ""]

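If it can appear more than once, this will not work either, because substring-after() only looks at the first occurrence. In that case, you can check whether the string ends with it: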

//@class[substring(., string-length(.) - 8, 9) = "uid-g-uid"]/..

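This is probably the most straightforward variant. Alternatively, because the third argument of substring() is optional, you can omit it and compare through to the end of the string: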

//@class[substring(., string-length(.) - 8) = "uid-g-uid"]/..
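To make the difference between these variants concrete, here is a small sketch that runs three of them against hypothetical markup in which the substring appears once at the end (a), only at the start (b), and twice (c). The class values, the text labels, and the `matchedTexts` helper are assumptions made up for this illustration.

```php
<?php
// Hypothetical test markup: suffix once at the end (a),
// only at the start (b), and twice (c).
$html = '<div class="uid-g-uid">a</div>'
      . '<div class="uid-g-uid-tail">b</div>'
      . '<div class="uid-g-uid-x-uid-g-uid">c</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Illustrative helper: run a query and return the text of each matched node.
function matchedTexts(DOMXPath $xpath, string $expression): array
{
    $texts = [];
    foreach ($xpath->query($expression) as $node) {
        $texts[] = $node->nodeValue;
    }
    return $texts;
}

// Variant 1: plain contains() matches the substring anywhere.
$byContains = matchedTexts($xpath, '//*[contains(@class, "uid-g-uid")]');

// Variant 2: contains() plus an empty substring-after(); only the first occurrence is checked.
$bySubstringAfter = matchedTexts(
    $xpath,
    '//*[contains(@class, "uid-g-uid") and substring-after(@class, "uid-g-uid") = ""]'
);

// Variant 3: compare the last 9 characters of the attribute value.
$bySubstring = matchedTexts(
    $xpath,
    '//@class[substring(., string-length(.) - 8) = "uid-g-uid"]/..'
);

echo implode(',', $byContains), "\n";       // a,b,c
echo implode(',', $bySubstringAfter), "\n"; // a
echo implode(',', $bySubstring), "\n";      // a,c
```

Only the substring() comparison accepts (c), which ends with the target string but also contains it earlier in the value.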

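Answer 2 (score: 2)

Since you are looking for an XPath function that is not available in XPath 1.0, I think you can use PHP's DOMXPath::registerPhpFunctions feature to call any PHP function from your XPath query. With it, you can even call the preg_match function like this: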

$html = <<< EOF
<div class="uid-g-uid">1121</div>
<div class="yskisghuid-g-uid">14234</div>
<div class="kif893jduid-g-uid">114235</div>
EOF;
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);

// Register the php: namespace (required)
$xpath->registerNamespace("php", "http://php.net/xpath");

// Register PHP preg_match function
$xpath->registerPHPFunctions('preg_match');

// call PHP preg_match function on your xpath to make sure class ends
// with the string "uid-g-uid" using regex "/uid-g-uid$/"
$nlist = $xpath->evaluate('//div[php:functionString("preg_match",
                           "/uid-g-uid$/", @class) = 1]/text()');

$numnodes = $nlist->length; // no of divs matched
for($i=0; $i < $numnodes; $i++) { // run the loop on matched divs
   $node = $nlist->item($i);
   echo "val: " . $node->nodeValue . "\n";
}
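The same mechanism may also be used with a user-defined function instead of preg_match. The sketch below assumes a hypothetical helper named `class_ends_with` (not part of PHP or of the original answer) and registers only that function, so the XPath expression cannot call anything else. It relies on php:functionString passing the attribute as a plain string.

```php
<?php
// Hypothetical helper: true when $value ends with $suffix.
function class_ends_with(string $value, string $suffix): bool
{
    return $suffix === '' || substr($value, -strlen($suffix)) === $suffix;
}

// Sample markup from the question, plus one hypothetical non-matching div.
$html = '<div class="uid-g-uid">1121</div>'
      . '<div class="yskisghuid-g-uid">14234</div>'
      . '<div class="kif893jduid-g-uid">114235</div>'
      . '<div class="uid-g-uid-other">0</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Register the php: namespace and allow only our helper to be called.
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('class_ends_with');

// The boolean returned by the helper is used directly as the predicate.
$nodes = $xpath->query(
    '//div[php:functionString("class_ends_with", @class, "uid-g-uid")]'
);

$values = [];
foreach ($nodes as $node) {
    $values[] = $node->nodeValue;
}

print_r($values);
```

Passing a function name to registerPhpFunctions() restricts the callable set, which seems a safer default than registering every PHP function.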

Answer 3 (score: 1)

Try this:

#/ First, use a regex to replace the matching class with a findable flag.
#/ [^"]* keeps the match inside a single attribute value.
$bdy = preg_replace('/class="[^"]*uid-g-uid"/i', 'class="__FINDME__"', $bdy);

#/ Now find the new flag name instead
$dom = new DOMDocument();
@$dom->loadHTML($bdy);
$xpath = new DOMXPath($dom);

$divs = $xpath->evaluate("//div[@class = '__FINDME__']");
var_dump($divs->length); // should be >= 1; otherwise nothing matched

for($j=0; $j<$divs->length; $j++)
{
    $div = $divs->item($j);
    $div_value = $div->nodeValue;
    // ...
}