Passing multiple paths to an XPath query

Date: 2019-03-14 00:48:14

Tags: php

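My links have different paths, and I am trying to retrieve data from them. I don't want to handle each one separately, so I made a list of queries and loop over that list with foreach.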

function passPath($list){
    $list = [
        "//li[@class='out']/a[1]",
        "//ul[@class='ul right_ul clearfix']/li[2]/a",
        "//ul[@class='ul right_ul clearfix']/li[2]/a"
    ];
    foreach($list as $val){
        return $val;
    }
}

Then I use that function in the DOMXPath query:

function getPath($urls){
    foreach($urls as $k => $val){
            $url = $urls;
            $html = content($val);
            $path = new \DOMXPath($html);
            $xPath = passPath($val);
            $route = $path->query($xPath);
            foreach($route as $value){
                if ($value->nodeValue != false) {
                    $urls [] = trim($value->getAttribute('href'));
                    unset($urls[$k]);
                }
            }
    }
    return array_unique($urls);
}

It runs, but there is a problem with the foreach: it only retrieves data for one element and does not continue to the others. What am I missing here?

$data = getPath($urls);
var_dump($data);

By the way: content() is my helper that calls file_get_contents/loadHTML.
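The real content() helper is not shown, so here is a hypothetical sketch of what it might look like. It assumes only what the code above requires: it takes a URL (or file path), reads it with file_get_contents(), and returns a DOMDocument, since that is what new \DOMXPath() expects.

```php
<?php
// Hypothetical sketch of the question's content() helper (not the asker's code).
// Assumption: it reads a URL or file path and returns a parsed DOMDocument.
function content(string $url): \DOMDocument
{
    $html = file_get_contents($url);
    if ($html === false || $html === '') {
        // loadHTML() rejects empty input, so fail early with a clear message.
        throw new \RuntimeException("Could not read HTML from $url");
    }

    $doc = new \DOMDocument();
    // Real-world HTML is rarely valid; buffer libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}
```

Suppressing libxml warnings is a design choice for scraping messy pages; drop those lines if you want the parser errors reported.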

1 answer:

Answer 0 (score: 1)

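The problem is the `return` inside the foreach in passPath(): `return` exits the function on the first iteration, so only the first XPath is ever used. I changed your code to loop over the XPath list inside getPath() and return the collected list of hrefs.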
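A minimal sketch of why a `return` inside a foreach yields only one value, using hypothetical function names. It also shows a generator (`yield`) alternative, which is not what this answer does, but keeps a separate helper function while still producing every path.

```php
<?php
// Hypothetical names, for illustration only.

// Mirrors the question's passPath(): `return` leaves the whole function
// on the first iteration, so the remaining paths are never reached.
function firstPathOnly(array $list): string
{
    foreach ($list as $val) {
        return $val;
    }
    return '';
}

// Generator variant: `yield` hands back one path, then resumes the loop
// on the next iteration, so every path is produced.
function allPaths(array $list): \Generator
{
    foreach ($list as $val) {
        yield $val;
    }
}

$paths = [
    "//li[@class='out']/a[1]",
    "//ul[@class='ul right_ul clearfix']/li[2]/a",
];

var_dump(firstPathOnly($paths));               // only the first path
var_dump(iterator_to_array(allPaths($paths))); // both paths
```

With the generator, getPath() could write `foreach (allPaths($list) as $xPath)`; the answer's approach of looping over the array directly is simpler and does the same job.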

# You want to parse every page in the URL list, so you created a function named `getPath($urls)`.
function getPath($urls) {
    # I suggest declaring a separate $ret array to store the values to return.
    $ret = [];

    # Use foreach to parse every URL.
    foreach ($urls as $url) { # Renamed $val to $url, since each value of $urls is a URL.

        # content() is your file_get_contents/loadHTML helper; it returns a DOMDocument.
        $html = content($url);

        # Create new DOMXPath object using $html.
        $path = new \DOMXPath($html);

        # passPath() is not needed; loop over the XPath list directly instead.
        # By the way, the second and third elements of $xPathList are identical, so the third one is probably unnecessary.
        // $xPath = passPath($url);
        $xPathList = [
            "//li[@class='out']/a[1]",
            "//ul[@class='ul right_ul clearfix']/li[2]/a",
            "//ul[@class='ul right_ul clearfix']/li[2]/a"
        ];

        foreach ($xPathList as $xPath) {
            $nodes = $path->query($xPath);
            foreach ($nodes as $node) {
                if ($node->nodeValue != false) {
                    $ret[] = trim($node->getAttribute('href'));
                }
            }
        }
    }

    return array_unique($ret);
}

$data = getPath($urls);
var_dump($data);
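To try the inner loop without fetching pages, here is a hypothetical, network-free sketch. The function name extractHrefs() and the sample HTML are made up for illustration; it runs every XPath in the list against one DOMDocument and collects trimmed hrefs.

```php
<?php
// Hypothetical helper: run every XPath in the list against one document
// and collect the trimmed href of each matching link.
function extractHrefs(\DOMDocument $doc, array $xPathList): array
{
    $xpath = new \DOMXPath($doc);
    $ret = [];
    foreach ($xPathList as $query) {
        foreach ($xpath->query($query) as $node) {
            // Skip links whose text is empty or whitespace
            // (a stricter variant of the answer's nodeValue != false check).
            if (trim($node->nodeValue) !== '') {
                $ret[] = trim($node->getAttribute('href'));
            }
        }
    }
    // array_unique() keeps the original keys; array_values() reindexes from 0.
    return array_values(array_unique($ret));
}

// Made-up sample markup matching the question's XPaths.
$html = <<<HTML
<ul><li class="out"><a href="/first">First</a><a href="/second">Second</a></li></ul>
<ul class="ul right_ul clearfix"><li><a href="/x">X</a></li><li><a href=" /third ">Third</a></li></ul>
HTML;

$doc = new \DOMDocument();
$doc->loadHTML($html);

$xPathList = [
    "//li[@class='out']/a[1]",
    "//ul[@class='ul right_ul clearfix']/li[2]/a",
    "//ul[@class='ul right_ul clearfix']/li[2]/a", // duplicate; removed by array_unique()
];

print_r(extractHrefs($doc, $xPathList)); // /first and /third
```

Note that the answer's `array_unique($ret)` leaves gaps in the keys when duplicates are removed; wrapping it in array_values(), as here, is optional but gives a clean 0-based list.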