What is wrong with preg_match on XPath output? Undefined offset: 1

Asked: 2015-11-12 14:01:57

Tags: php xpath preg-match

I am trying to extract the ID that follows "Property ID :" using the following code:

<?php
$getURL = file_get_contents('http://realestate.com.kh/residential-for-rent-in-phnom-penh-daun-penh-phsar-chas-2-beds-apartment-1001192296/');
$dom = new DOMDocument();
@$dom->loadHTML($getURL);
$xpath = new DOMXPath($dom);

/*echo $xpath->evaluate("normalize-space(substring-before(substring-after(//p[contains(text(),'Property ID:')][1], 'Property ID:'), '–'))");*/

$id = $xpath->evaluate('//div[contains(@class,"property-table")]')->item(0)->nodeValue;
preg_match("/Property ID :(.*)/", $id, $matches);

echo $matches[1];

But it does not work:

Notice: Undefined offset: 1 in W:\Xampp\htdocs\X\index.php on line 12

What is the problem? If I create the string myself, like this

$id ="Property Details Property Type : Apartment Price $ 350 pm Building Size 72 Sqms Property ID : 1001192296";

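and substitute it into my code, it works. So what is the difference between the data I create myself and the data fetched through XPath? Thanks in advance for your help.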

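2 Answers:

Answer 0: (score: 1)

You need to check whether preg_match() actually found anything.

If there is no match, $matches[1] will not exist. You should use if (count($matches) > 1) { ... } to guard against the notice you are seeing.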
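A minimal sketch of that guard follows. It checks the return value of preg_match() (1 on a match, 0 on no match) instead of counting $matches, which amounts to the same test. The helper name extractPropertyId is hypothetical, the nullable return type assumes PHP 7.1 or later, and the multi-line sample is an assumed stand-in for the XPath output rather than the live page.

```php
<?php
// Hypothetical helper (not from the question): returns the captured ID,
// or null when the pattern does not match.
function extractPropertyId(string $text): ?string
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match('/Property ID :(.*)/', $text, $matches) === 1) {
        return trim($matches[1]);
    }
    return null;
}

// Sample copied from the question: single spaces, so the pattern matches.
$flat = 'Property Details Property Type : Apartment Price $ 350 pm Building Size 72 Sqms Property ID : 1001192296';

// Assumed stand-in for the XPath output: a line break sits between "ID" and ":".
$multiline = "Property ID \n : \n 1001192296";

var_dump(extractPropertyId($flat));      // string(10) "1001192296"
var_dump(extractPropertyId($multiline)); // NULL
```

The guard removes the notice but does not explain why the match fails on the real data; it only makes the failure visible as a null result.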

Answer 1: (score: 1)

Your preg_match() fails because the nodeValue you get from XPath contains line breaks and indentation, like this:

Property Details

                            Property Type : 
                         Apartment 


                    Price
                    $ 350 pm


                Building Size
                72 Sqms


                Property ID 
                 : 
                1001192296

So you need to normalize the whitespace first, like this:

$getURL = file_get_contents('http://realestate.com.kh/residential-for-rent-in-phnom-penh-daun-penh-phsar-chas-2-beds-apartment-1001192296/');
$dom = new DOMDocument();
@$dom->loadHTML($getURL);
$xpath = new DOMXPath($dom);

/*echo $xpath->evaluate("normalize-space(substring-before(substring-after(//p[contains(text(),'Property ID:')][1], 'Property ID:'), '–'))");*/

$id = $xpath->evaluate('//div[contains(@class,"property-table")]')->item(0)->nodeValue;

$id = preg_replace('!\s+!', ' ', $id);

preg_match("/Property ID :(.*)/", $id, $matches);

echo $matches[1];

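The line $id = preg_replace('!\s+!', ' ', $id); collapses every run of tabs, line breaks, and spaces between words into a single space.

Update: Following the comment below, I now use $xpath->evaluate() to get the full text of the HTML page and try to match all property IDs (for example, digits only, or a "P-" prefix followed by digits).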
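To see the effect in isolation, here is a small sketch that runs the same kind of replacement on a hard-coded string. The sample is an assumption that mimics the tail of the nodeValue shown above; it is not fetched from the page.

```php
<?php
// Assumed sample mimicking the nodeValue above (line breaks plus indentation).
$raw = "Property ID \n                 : \n                1001192296\n";

// \s matches spaces, tabs and line breaks; + turns each run into one space.
// trim() then drops the leftover space at the end.
$normalized = trim(preg_replace('/\s+/', ' ', $raw));

var_dump($normalized); // string(24) "Property ID : 1001192296"

// The original pattern now matches, so offset 1 exists.
if (preg_match('/Property ID :(.*)/', $normalized, $matches) === 1) {
    echo trim($matches[1]), "\n"; // 1001192296
}
```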

$getURL = file_get_contents('http://realestate.com.kh/residential-for-rent-in-phnom-penh-daun-penh-phsar-chas-2-beds-apartment-1001192296/');

$dom = new DOMDocument();
@$dom->loadHTML($getURL);

$xpath = new DOMXPath($dom);

// this only returns the text of the whole page without html tags
$id = $xpath->evaluate( "//html" )->item(0)->nodeValue;
$id = preg_replace('!\s+!', ' ', $id);

// not a good regex, but matches the property IDs
preg_match_all("/Property ID( |):[ |]((\w{0,1}[-]|)\d*)/", $id, $matches);

// with this pattern the IDs are in the second capture group, i.e. $matches[2]
foreach( $matches[2] as $property_id ) {
    echo $property_id."<br>";
}
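As an alternative sketch, not part of the answer above, the pattern itself can tolerate arbitrary whitespace with \s*, which makes the normalization step optional. The sample text is made up, and the "P-12345" form is only a guess at the letter-hyphen-digits IDs mentioned in the update.

```php
<?php
// Assumed sample text; "P-12345" is a guessed example of a prefixed ID.
$text = "Property ID \n : \n 1001192296 ... Property ID: P-12345";

// \s* tolerates any whitespace (including line breaks) around the colon.
// (?:[A-Za-z]-)? is an optional one-letter prefix followed by a hyphen.
preg_match_all('/Property ID\s*:\s*((?:[A-Za-z]-)?\d+)/', $text, $matches);

// With a single capture group, the IDs are in $matches[1].
foreach ($matches[1] as $property_id) {
    echo $property_id, "\n";
}
```

Because the prefix is wrapped in a non-capturing group, there is only one capture group and the result index is 1 rather than 2.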