How do I get the text inside elements of a given class as a JSON array?

Asked: 2014-05-08 12:42:29

Tags: php json

HTML snippet from the page (www.foo.com/index.html):

...
<th class="name" align="left" scope="col">
<a class="foo" href="foo.html">foo</a>
</th>
...
<th class="name" align="left" scope="col">
<a class="bar" href="bar.html">bar</a>
</th>
...
<th class="name" align="left" scope="col">
<a class="ba" href="baz.html">baz</a>
</th>
......

I want to use PHP to get the text of every element with the class name and convert it to JSON.

So the final result should look like this:

{"names":["foo","bar","baz"]}

This is what I have tried:

function linkExtractor($html){
    $nameArr = array();
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $names = //how do i get the elements?
    foreach($names as $name) {
        array_push($nameArr, $name);
    }
    return $imageArr;
}

echo json_encode(array("names" => linkExtractor($html)));

2 Answers:

Answer 0 (score: 2)

Try this...

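// loadHTML() expects markup, not a URL, so fetch the page contents first
$html = file_get_contents("http://www.foo.com/index.html");
function linkExtractor($html, $classname){
    $nameArr = array();
    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // DOMDocument has no xpath() method; queries go through DOMXPath
    $xpath = new DOMXPath($doc);
    $names = $xpath->query("//*[@class='" . $classname . "']");

    foreach($names as $name) {
        // each $name is a DOM node; keep only its trimmed text
        array_push($nameArr, trim($name->textContent));
    }
    return $nameArr;
}

// pass the bare class name; ".name" is CSS selector syntax, not an attribute value
echo json_encode(array("names" => linkExtractor($html, "name")));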
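To check this kind of extraction without a network request, here is a minimal sketch that runs against an inline copy of the markup from the question. The function name extractNamesByClass and the surrounding table/tr wrapper are assumptions made for illustration, and the sketch assumes $classname contains no quote characters.

```php
<?php
// Hypothetical helper name; not part of the original answer.
function extractNamesByClass($html, $classname) {
    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $names = array();
    // Exact match on the whole class attribute value.
    foreach ($xpath->query("//*[@class='" . $classname . "']") as $node) {
        $names[] = trim($node->textContent);
    }
    return $names;
}

// Inline copy of the markup from the question (wrapper elements are assumed).
$html = '<table><tr>'
      . '<th class="name" align="left" scope="col"><a class="foo" href="foo.html">foo</a></th>'
      . '<th class="name" align="left" scope="col"><a class="bar" href="bar.html">bar</a></th>'
      . '<th class="name" align="left" scope="col"><a class="ba" href="baz.html">baz</a></th>'
      . '</tr></table>';

echo json_encode(array("names" => extractNamesByClass($html, "name")));
// prints {"names":["foo","bar","baz"]}
```

The exact comparison @class='name' is enough for this snippet because every th carries exactly one class.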

Answer 1 (score: 0)

This is what I ended up with:

$names = function($html) {
    $doc  = new DOMDocument();
    $last = libxml_use_internal_errors(TRUE);
    $doc->loadHTML($html);
    libxml_use_internal_errors($last);
    $xp     = new DOMXPath($doc);
    $result = array();
    foreach ($xp->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' name ')]") as $node)
        $result[trim($node->textContent)] = 1;
    return array_keys($result);
};

echo json_encode(array("names" => $names($html)));

Output:

{"names":["foo","bar","baz"]}

Required PHP version: 5.3+
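The long XPath expression in this answer matters once an element carries more than one class. As a rough illustration, the sketch below compares three ways of testing the class attribute. The helper textsMatching and the sample markup are made up for this comparison and do not come from the question.

```php
<?php
// Hypothetical helper for illustration: runs one XPath query and returns trimmed text.
function textsMatching($html, $query) {
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $texts = array();
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Made-up markup: one plain class, one multi-class attribute, one look-alike class.
$sample = '<div>'
        . '<p class="name">foo</p>'
        . '<p class="name highlight">bar</p>'
        . '<p class="nickname">baz</p>'
        . '</div>';

$exact    = "//*[@class='name']";
$contains = "//*[contains(@class, 'name')]";
$token    = "//*[contains(concat(' ', normalize-space(@class), ' '), ' name ')]";

echo json_encode(textsMatching($sample, $exact)), "\n";    // ["foo"]
echo json_encode(textsMatching($sample, $contains)), "\n"; // ["foo","bar","baz"]
echo json_encode(textsMatching($sample, $token)), "\n";    // ["foo","bar"]
```

The exact comparison misses the multi-class element, and a plain contains() also picks up nickname. Padding the normalized attribute with spaces and searching for ' name ' matches whole class tokens only, which is closest to how a CSS .name selector behaves. Separately, collecting the texts as array keys and returning array_keys(), as the answer does, drops duplicate names.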