Question

I want to use file_get_contents to grab the content between <p> tags.

I tried using DOMDocument, but when I call getElementsByTagName and loop over the result, all of the HTML inside the <p> tag is stripped out.

<?php
function search(){
    $city = $_POST['city'];
    $scity = trim($city);
    $dom = new DOMDocument;
    libxml_use_internal_errors(true);

    $dom->loadHTMLFile('http://'.$scity.'.craigslist.org/search/cto?query=Toyota');
    libxml_use_internal_errors(false);
    $xpath = new DOMXPath($dom);
    $contents = $xpath->query('//p[@class="row"]');

    foreach($contents as $val){
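        // getElementsByTagName() returns a DOMNodeList, so take the first <a> before reading href
        $link = $val->getElementsByTagName('a')->item(0);
        if ($link === null) continue;
        echo '<a href="' . $link->getAttribute('href') . '">' . utf8_decode(trim($val->nodeValue, " -,")) . "</a><br />\n";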
    }
}
?>

I want it displayed the way it appears on craigslist: the title linking to craigslist, along with the price, image, etc.

Answer 1

Use preg_match_all():

$text = file_get_contents("....");

preg_match_all('|<p\b[^>]*>(.*?)</p>|s', $text, $matches); // Match everything inside each <p>; [^>]* allows attributes such as class="row", and the s modifier lets . span newlines

var_dump($matches); //View the matches
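
If you would rather stay with DOMDocument, the inner HTML can be kept by serializing each child node of the <p> with saveHTML() instead of reading nodeValue, which returns text only. The following is a minimal sketch, not a tested craigslist scraper: innerHtml() is a hypothetical helper name, and $sample is made-up markup standing in for the fetched page.

```php
<?php
// Hypothetical helper (not part of the original post): returns a node's inner HTML.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML($child) serializes only that child, keeping its tags
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Made-up sample markup (assumption); real craigslist markup may differ.
$sample = '<div><p class="row"><a href="/cto/1.html">Toyota Camry</a> '
        . '<span class="price">$4500</span></p></div>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // suppress warnings from malformed HTML
$dom->loadHTML($sample);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$rows = [];
foreach ($xpath->query('//p[@class="row"]') as $p) {
    $rows[] = innerHtml($p);        // links, spans, images stay as HTML
}

print_r($rows);
```

In the question's search() function, the same helper could be applied to each $val in place of $val->nodeValue; relative href values would still need the craigslist host prepended.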

Retrieving posts from craigslist in PHP
