Question

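I'm trying to create an RSS feed for the League of Legends news, since they don't have one... I'm trying to parse the HTML and find all elements whose class attribute contains a certain class.

This is what I have, but it doesn't find anything.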

<?php
    $page = file_get_contents("http://na.leagueoflegends.com/en/news/");
    $dom = new DomDocument();
    $dom->load($page);
    $finder = new DomXPath($dom);
    $classname="node-article";
    $nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
    echo "<pre>" . print_r($nodes, true) . "</pre>";
?>

Edit: working code...

<?php
$page = file_get_contents("http://na.leagueoflegends.com/en/news/");
$dom = new DomDocument();
@$dom->loadHTML($page);
$finder = new DomXPath($dom);
$classname = "node-article";
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");

$articles = array();
foreach ($nodes as $node) {
    $h4 = $node->getElementsByTagName('h4')->item(0);
    $articles[] = array(
        'title' => htmlentities($h4->firstChild->nodeValue),
        'content' => htmlentities($h4->nextSibling->nodeValue),
        'link' => 'http://na.leagueoflegends.com/en/news' . $h4->firstChild->getAttribute('href')
    );
}

echo "<pre>" . print_r($articles, true) . "</pre>";
?>

Answer 1

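You actually need loadHTML (which parses a string containing the HTML source) rather than load (which expects the path to an XML document). You are already using file_get_contents to read the entire page into a string, so what you have is a string of HTML source, not a path.

Try this: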

$page = file_get_contents("http://na.leagueoflegends.com/en/news/");
$dom = new DomDocument();
$dom->loadHTML($page);
$finder = new DomXPath($dom);
$classname = "node-article";
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
echo "<pre>" . print_r($nodes, true) . "</pre>";

// get title and content of article
$arr = array();

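foreach ($nodes as $node) {
    $h4 = $node->getElementsByTagName('h4')->item(0);
    if ($h4 === null) {
        continue; // skip matches that have no <h4> heading
    }
    $arr[] = array(
        'title' => $h4->nodeValue,
        // nextSibling may be missing, so fall back to an empty string
        'content' => $h4->nextSibling ? $h4->nextSibling->nodeValue : '',
    );
}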

var_dump($arr); // your title & body content
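
If you want to check the XPath without hitting the network, here is a minimal sketch that runs the same query against an inline HTML string. The sample markup, including the node-article-teaser class, is made up for illustration and is not taken from the real page. It also uses libxml_use_internal_errors(true) as one way to keep loadHTML quiet on messy markup instead of the @ operator.

```php
<?php
// Hypothetical sample markup standing in for the real news page.
$html = <<<HTML
<html><body>
  <div class="node node-article promoted"><h4><a href="/a">First</a></h4><p>Body A</p></div>
  <div class="node-article-teaser"><h4><a href="/b">Teaser</a></h4></div>
  <div class="  node-article  "><h4><a href="/c">Second</a></h4><p>Body C</p></div>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html); // loadHTML parses a string; load would treat it as a file path
libxml_clear_errors();

$finder = new DOMXPath($dom);
$classname = 'node-article';

// Pad the class list with spaces so only whole class names match.
$exact = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");

// A plain substring test also matches "node-article-teaser".
$loose = $finder->query("//*[contains(@class, '$classname')]");

$titles = array();
foreach ($exact as $node) {
    $h4 = $node->getElementsByTagName('h4')->item(0);
    if ($h4 !== null) {
        $titles[] = trim($h4->textContent);
    }
}

echo $exact->length, " exact, ", $loose->length, " loose\n";
print_r($titles);
```

With this sample the padded query matches the first and third div only, while the plain contains(@class, ...) version also picks up the teaser div. That is the reason for the concat/normalize-space wrapper in the query.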
