I want the $allUrls array to contain no duplicates.
This is what I tried:
for( $i = 0; $i <= 2; $i++ ) {
    $html = file_get_contents("http://www.keurmerk.info/Leden_Partners?s=&c=0&Page=".$i."");
    $pattern = "/(http(s)?:\/\/)?(w{3}\.)(\w+\.)([a-zA-Z]{2,6})(\/\w*)?/";
    preg_match_all( $pattern, $html, $urls );
    $allUrls[0][] = array_unique($urls[0]);
}
foreach ( $allUrls[0] as $url ) {
    var_dump($url);
}
However, this does not seem to work: the $allUrls array still contains duplicates.
Can anyone tell me what I am doing wrong here?
var_dump output:
array(22) {
[0]=> string(29) "http://www.bootaccessoires.nl"
[1]=> string(26) "http://www.cookingforme.nl"
[2]=> string(24) "http://www.hoorbellen.nl"
[3]=> string(33) "http://www.100procentsportief.nl/"
[4]=> string(27) "http://www.1000en1smaken.nl"
[5]=> string(33) "http://www.1001kerstpakketten.com"
[6]=> string(35) "http://www.1001wellnesspakketten.nl"
[7]=> string(25) "http://www.100parfums.nl/"
[8]=> string(30) "http://www.101brandblussers.nl"
[9]=> string(20) "http://www.10sign.nl"
[10]=> string(25) "http://www.123envelop.com"
[11]=> string(31) "http://www.123bloeddrukmeter.nl"
[12]=> string(21) "http://www.123Body.nl"
[13]=> string(29) "http://www.123damesfietsen.nl"
[14]=> string(28) "http://www.123drogisterij.nl"
[15]=> string(26) "http://www.123drukwerk.com"
[16]=> string(31) "http://www.123erotiekwinkel.com"
[17]=> string(30) "http://www.123feestpruiken.nl/"
[18]=> string(29) "http://www.123herenfietsen.nl"
[19]=> string(21) "http://www.123hout.nl"
[20]=> string(21) "http://www.123inkt.nl"
[21]=> string(31) "https://www.extremetracking.com"
}
array(22) {
[0]=> string(31) "http://www.schoonheidswinkel.nl"
[1]=> string(34) "http://www.winkelvandenostalgie.nl"
[2]=> string(25) "http://www.misteragri.com"
[3]=> string(30) "http://www.123kinderfietsen.nl"
[4]=> string(25) "http://www.123ledspots.nl"
[5]=> string(28) "http://www.123mijngordijn.nl"
[6]=> string(24) "http://www.123soatest.nl"
[7]=> string(29) "http://www.123sportfietsen.nl"
[8]=> string(27) "http://www.123superfoods.nl"
[9]=> string(25) "http://www.123telefoon.nl"
[10]=> string(25) "http://www.123tuinleds.nl"
[11]=> string(28) "http://www.123voetmassage.nl"
[12]=> string(21) "http://www.12cook.com"
[13]=> string(23) "http://www.1gameshop.be"
[14]=> string(23) "http://www.24parfums.nl"
[15]=> string(27) "http://www.2wielerwinkel.nl"
[16]=> string(25) "http://www.4activekidz.nl"
[17]=> string(25) "http://www.4kidsathome.nl"
[18]=> string(28) "http://www.4kidsnederland.nl"
[19]=> string(24) "http://www.4moregames.nl"
[20]=> string(23) "http://www.4sporters.nl"
[21]=> string(31) "https://www.extremetracking.com"
}
array(19) {
[0]=> string(27) "http://www.springtouwen.nl/"
[1]=> string(29) "http://www.vibiemmewebshop.nl"
[2]=> string(24) "http://www.slimestore.nl"
[3]=> string(24) "http://www.4yoursport.nl"
[4]=> string(22) "http://www.4youwear.nl"
[5]=> string(18) "http://www.6566.eu"
[6]=> string(23) "http://www.aadenwijn.nl"
[7]=> string(21) "http://www.aagifts.nl"
[8]=> string(27) "http://www.aanhangershop.nl"
[9]=> string(32) "http://www.aanhangwagendirect.nl"
[10]=> string(30) "http://www.aannemerskorting.nl"
[11]=> string(23) "http://www.abcoparts.nl"
[12]=> string(24) "http://www.aboutshoes.nl"
[13]=> string(25) "http://www.accudienst.nl/"
[14]=> string(25) "http://www.acculaptop.com"
[15]=> string(32) "http://www.accuserviceholland.nl"
[16]=> string(22) "http://www.accushop.nl"
[17]=> string(21) "http://www.accuweb.nl"
[18]=> string(31) "https://www.extremetracking.com"
}
https://www.extremetracking.com appears three times in the var_dump output.
Answer 0 (score: 1)
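Your code calls array_unique on each page's matches separately, so a URL that occurs on more than one page is kept once per page. If each page has to stay separate, you can do something like this, which removes from page 2 anything already present on page 1, from page 3 anything present on pages 1 and 2, and so on: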
$allUrls = array(0 => array());
for( $i = 0; $i <= 2; $i++ ) {
    $html = file_get_contents("http://www.keurmerk.info/Leden_Partners?s=&c=0&Page=".$i."");
    $pattern = "/(http(s)?:\/\/)?(w{3}\.)(\w+\.)([a-zA-Z]{2,6})(\/\w*)?/";
    preg_match_all( $pattern, $html, $urls );
    // The closure receives a copy of $allUrls as it stands in this iteration,
    // i.e. the lists collected from all earlier pages.
    $allUrls[0][] = array_unique(array_filter($urls[0], function($url) use($allUrls) {
        foreach ($allUrls[0] as $all) {
            // Drop the URL if any earlier page already contains it.
            if (array_search($url, $all) !== false) {
                return false;
            }
        }
        return true;
    }));
}
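The same per-page filtering can also be written with a lookup table of URLs already seen, which avoids rescanning every earlier page for each match. This is only a minimal sketch: the $pages array is hypothetical sample data standing in for the preg_match_all results, and dedupeAcrossPages is an illustrative helper name, not part of the code above.

```php
<?php
// Hypothetical sample data: one list of matched URLs per scraped page.
$pages = array(
    array('http://www.123inkt.nl', 'http://www.123hout.nl', 'https://www.extremetracking.com'),
    array('http://www.12cook.com', 'http://www.123inkt.nl', 'https://www.extremetracking.com'),
    array('http://www.accuweb.nl', 'https://www.extremetracking.com'),
);

// Keep one list per page, dropping any URL already kept on an earlier page
// (or earlier on the same page).
function dedupeAcrossPages(array $pages)
{
    $seen = array();    // URL => true for every URL kept so far
    $result = array();
    foreach ($pages as $urls) {
        $kept = array();
        foreach ($urls as $url) {
            if (!isset($seen[$url])) {
                $seen[$url] = true;
                $kept[] = $url;
            }
        }
        $result[] = $kept;
    }
    return $result;
}

$perPage = dedupeAcrossPages($pages);
print_r($perPage);
```

With this sample data, https://www.extremetracking.com is kept only in the first page's list, and the second and third lists each keep just the one URL that is new on that page.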
If all pages can be merged into a single list, you can do this:
$allUrls = array(0 => array());
for( $i = 0; $i <= 2; $i++ ) {
    $html = file_get_contents("http://www.keurmerk.info/Leden_Partners?s=&c=0&Page=".$i."");
    $pattern = "/(http(s)?:\/\/)?(w{3}\.)(\w+\.)([a-zA-Z]{2,6})(\/\w*)?/";
    preg_match_all( $pattern, $html, $urls );
    // Append this page's matches to one flat list.
    $allUrls[0] = array_merge($allUrls[0], $urls[0]);
}
// Remove duplicates once, after all pages have been collected.
$allUrls[0] = array_unique($allUrls[0]);
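One detail to keep in mind with the merged version: array_unique keeps the first occurrence of each value and preserves its original key, so the result can have gaps in its numeric keys. The sketch below shows this on hypothetical sample data (the $pages array is made up for illustration) and reindexes the result with array_values.

```php
<?php
// Hypothetical sample data: one list of matched URLs per scraped page.
$pages = array(
    array('http://www.123inkt.nl', 'http://www.123hout.nl', 'https://www.extremetracking.com'),
    array('http://www.12cook.com', 'http://www.123inkt.nl', 'https://www.extremetracking.com'),
    array('http://www.accuweb.nl', 'https://www.extremetracking.com'),
);

// Merge every page into one flat list, as in the loop above.
$merged = array();
foreach ($pages as $urls) {
    $merged = array_merge($merged, $urls);
}

// array_unique keeps the first occurrence and its original key.
$unique = array_unique($merged);

// array_values renumbers the keys from 0 without gaps.
$reindexed = array_values($unique);

print_r(array_keys($unique));
print_r($reindexed);
```

Reindexing only matters if later code relies on consecutive keys, for example a for loop that counts from 0 to count($allUrls[0]) - 1; a foreach loop works either way.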