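My links have different paths, and I am trying to retrieve data from all of them. I don't want to handle each one separately, so I made a list of queries and loop over that list with foreach.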
function passPath($list){
$list = [
"//li[@class='out']/a[1]",
"//ul[@class='ul right_ul clearfix']/li[2]/a",
"//ul[@class='ul right_ul clearfix']/li[2]/a"
];
foreach($list as $val){
return $val;
}
}
Then I use that function in the DOMXPath query.
function getPath($urls){
foreach($urls as $k => $val){
$url = $urls;
$html = content($val);
$path = new \DOMXPath($html);
$xPath = passPath($val);
$route = $path->query($xPath);
foreach($route as $value){
if ($value->nodeValue != false) {
$urls [] = trim($value->getAttribute('href'));
unset($urls[$k]);
}
}
}
return array_unique($urls);
}
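It runs without errors, but there is a problem with the foreach: it only retrieves data for one element and does not continue with the others. What am I missing here?
$data = getPath($urls);
var_dump($data);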
By the way: content() is a file_get_contents/loadHTML function.
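Since content() is not shown in the post, here is a minimal sketch of what such a helper might look like. The body is an assumption: it supposes that content() reads the markup with file_get_contents() and returns a DOMDocument, which is what `new \DOMXPath($html)` expects.

```php
<?php
// Hypothetical body for content(); the original post does not show it.
function content(string $source): DOMDocument {
    // $source may be a URL or a local path; both work with file_get_contents().
    $markup = file_get_contents($source);
    if ($markup === false) {
        throw new RuntimeException("Could not read $source");
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}
```
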
Answer 0 (score: 1)
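I changed your code so that it collects the list of href values. The `return` inside the foreach in passPath() exits the function on the first iteration, so only the first XPath query was ever used.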
# getPath($urls) parses every page in the URL list.
function getPath($urls) {
# Collect results in a separate $ret array instead of modifying $urls while iterating over it.
$ret = [];
# Loop over every URL. ($val was renamed to $url.)
foreach ($urls as $k => $url) {
# content() is the file_get_contents/loadHTML wrapper from the question.
$html = content($url);
# Create a DOMXPath object for the loaded document.
$path = new \DOMXPath($html);
# passPath() is not needed; loop over the query list directly.
# Note: the second and third entries of $xPathList are identical, so the third is probably redundant.
// $xPath = passPath($url);
$xPathList = [
"//li[@class='out']/a[1]",
"//ul[@class='ul right_ul clearfix']/li[2]/a",
"//ul[@class='ul right_ul clearfix']/li[2]/a"
];
foreach ($xPathList as $xPath) {
$nodes = $path->query($xPath);
foreach ($nodes as $node) {
if ($node->nodeValue != false) {
$ret[] = trim($node->getAttribute('href'));
}
}
}
}
return array_unique($ret);
}
$data = getPath($urls);
var_dump($data);
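To see the difference without fetching any pages, here is a small self-contained sketch. The inline HTML and the helper names firstPathOnly() and collectHrefs() are made up for illustration and are not from the original post; the first helper mirrors the `return` inside the loop, and the second runs every query.

```php
<?php
// Hypothetical inline HTML standing in for a fetched page.
$html = <<<HTML
<html><body>
<ul><li class="out"><a href=" /first ">First</a><a href="/ignored">x</a></li></ul>
<ul class="ul right_ul clearfix">
  <li><a href="/skip">Skip</a></li>
  <li><a href="/second">Second</a></li>
</ul>
</body></html>
HTML;

// Mirrors the question's passPath(): return exits on the first iteration,
// so the caller only ever receives the first query.
function firstPathOnly(array $list): string {
    foreach ($list as $val) {
        return $val;
    }
    return '';
}

// Runs every query in the list and collects the non-empty href values.
function collectHrefs(DOMXPath $xpath, array $queries): array {
    $hrefs = [];
    foreach ($queries as $query) {
        foreach ($xpath->query($query) as $node) {
            $href = trim($node->getAttribute('href'));
            if ($href !== '') {
                $hrefs[] = $href;
            }
        }
    }
    // array_values() reindexes after array_unique() removes duplicates.
    return array_values(array_unique($hrefs));
}

$queries = [
    "//li[@class='out']/a[1]",
    "//ul[@class='ul right_ul clearfix']/li[2]/a",
];

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings quiet
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$single = firstPathOnly($queries); // only the first query string
$all = collectHrefs($xpath, $queries); // hrefs matched by both queries
var_dump($single, $all);
```

Another option, if the nested loop feels heavy, is to join the expressions with the XPath union operator `|` and run a single query, although the results then come back in document order rather than query order.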