我是PHP的新手。我想编写代码来查找下面html代码中指定的id
,即1123
。任何人都能给我一些想法吗?
<span class="miniprofile-container /companies/1123?miniprofile="
data-tracking="NUS_CMPY_FOL-nhre"
data-li-getjs="http://s.c.lnkd.licdn.com/scds/concat/common/js?h=dyt8o4nwtaujeutlgncuqe0dn&fc=2">
<strong>
<a href="http://www.linkedin.com/nus-trk?trkact=viewCompanyProfile&pk=biz-overview-public&pp=1&poster=&uid=5674666402166894592&ut=NUS_UNIU_FOLLOW_CMPY&r=&f=0&url=http%3A%2F%2Fwww%2Elinkedin%2Ecom%2Fcompany%2F1123%3Ftrk%3DNUS_CMPY_FOL-nhre&urlhash=7qbc">
Bank of America
</a>
</strong>
</span> has a new Project Manager
注意:我不需要span类中的内容。我需要span类名中的id
。
我尝试了以下内容:
$dom = new DOMDocument('1.0', 'UTF-8');
@$dom->loadHTML($html);
$xmlElements = simplexml_import_dom($dom);
$id = $xmlElements->xpath("//span [@class='miniprofile-container /companies/$data_id?miniprofile=']");
......但我不知道如何继续前进。
答案 0 :(得分:1)
$matches = array();
preg_match('|<span class="miniprofile-container /companies/(\d+)\?miniprofile|', $html, $matches);
print_r($matches);
这是一个非常微不足道的正则表达式,但可以作为第一个建议。如果你想通过DomDocument或simplexml,你不能像你在你的例子中那样混合。 你最喜欢的方式是什么,我们可以缩小它的范围。
//编辑:几乎是@fireeyedboy所说的,但这就是我刚刚摆弄的东西:
<?php
$html = <<<EOD
<html><head></head>
<body>
<span class="miniprofile-container /companies/1123?miniprofile="
data-tracking="NUS_CMPY_FOL-nhre"
data-li-getjs="http://s.c.lnkd.licdn.com/scds/concat/common/js?h=dyt8o4nwtaujeutlgncuqe0dn&fc=2">
<strong>
<a href="#">
Bank of America
</a>
</strong>
</span> has a new Project Manager
</body>
</html>
EOD;
$domDocument = new DOMDocument('1.0', 'UTF-8');
$domDocument->recover = TRUE;
$domDocument->loadHTML($html);
$xPath = new DOMXPath($domDocument);
$relevantElements = $xPath->query('//span[contains(@class, "miniprofile-container")]');
$foundId = NULL;
foreach($relevantElements as $match) {
$pregMatches = array();
if (preg_match('|/companies/(\d+)\?miniprofile|', $match->getAttribute('class'), $pregMatches)) {
if (isset($pregMatches[1])) {
$foundId = $pregMatches[1];
break;
}
};
}
echo $foundId;
?>
答案 1 :(得分:1)
这应该做你想做的事情:
$dom = new DOMDocument('1.0', 'UTF-8');
@$dom->loadHTML( $html );
$xpath = new DOMXPath( $dom );
/*
* the following xpath query will find all class attributes of span elements
* whose class attribute contain the strings " miniprofile-container " and " /companies/"
*/
$nodes = $xpath->query( "//span[contains(concat(' ', @class, ' '), ' miniprofile-container ') and contains(concat(' ', @class, ' '), ' /companies/')]/@class" );
foreach( $nodes as $node )
{
// extract the number found between "/companies/" and "?miniprofile" in the node's nodeValue
preg_match( '#/companies/(\d+)\?miniprofile#', $node->nodeValue, $matches );
var_dump( $matches[ 1 ] );
}