Compare two URL arrays and remove URLs from one array depending on the other

Asked: 2017-09-10 17:15:22

Tags: php arrays url compare

I need some help working out how to approach this. I have two arrays of URLs:

$urls = ['https://test.com/', 'http://example.com/', 'https://google.com/'];

$urlsFromOtherSource = ['https://test.com/', 'https://example.com/', 'https://facebook.com/'];

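I need to build three URL arrays from these. The first holds the URLs common to both arrays. The other two are the original arrays without those common URLs, with one extra rule: if the same URL appears in both original arrays and differs only in http versus https, it must be kept in only one of them.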

So from my example I need to end up with these arrays:

$commonUrls = ['https://test.com/']; // because only this URL appears in both arrays

$urls = ['http://example.com/', 'https://google.com/']; // 'http://example.com/' stays here and is removed from the second array, because the second array holds the same URL and only the scheme (https) differs

$urlsFromOtherSource = ['https://facebook.com/']; // 'https://example.com/' is removed because the same URL is in the first array and only the scheme (http) differs

I have tried to work out how to compare these arrays and catch the http/https difference, but it is not easy for me. My code looks like this:

$urls = ['https://test.com/', 'http://example.com/', 'https://google.com/'];

$urlsFromOtherSource = ['https://test.com/', 'https://example.com/', 'https://facebook.com/'];

$commonUrls = array_intersect($urls, $urlsFromOtherSource); // URLs present in both arrays
$urls = array_diff($urls, $commonUrls); // remove the common URLs from this array
$urlsFromOtherSource = array_diff($urlsFromOtherSource, $commonUrls); // remove the common URLs from this array

$landingPageArray = [];
foreach ($urlsFromOtherSource as $url) {
    // strip the scheme and a leading "www."
    $landingPageArray[] = preg_replace(["#^http(s)?://#", "#^www\.#"], ["", ""], $url);
}

$httpDifference = [];
foreach ($urls as $url) {
    $landingPage = preg_replace(["#^http(s)?://#", "#^www\.#"], ["", ""], $url);
    if (in_array($landingPage, $landingPageArray)) {
        $httpDifference[] = $url;
    }
}
// I have no idea how to remove from $urlsFromOtherSource the URLs that are in $urls and differ only by http/https
$urlsFromOtherSource = array_diff($urlsFromOtherSource, $httpDifference);

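So what I need is to compare the arrays and remove from the second array every URL that is also in the first array when the two differ only by http versus https. Maybe someone can help me find an algorithm.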

UPDATE: I also need to remove a URL from urlsFromOtherSource when commonUrls contains the same URL, as in this case:

commonUrls: array(1) {
  [0]=>
  string(20) "http://www.test.com/"
}



urlsFromOtherSource: array(1) {
  [2]=>
  string(16) "http://test.com/"
}

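So this URL has to be removed from urlsFromOtherSource too. The code should automatically compare only the landing page: whether a URL starts with http://www, www, or just http:// should not matter when comparing my arrays.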
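To make "compare only the landing page" concrete, here is a minimal sketch of the normalization this implies. The function name `landingPage` and the sample URLs are assumptions for illustration, not part of the original code, and the pattern only covers an optional http/https scheme followed by an optional leading "www.".

```php
<?php
// Hypothetical helper (not from the original post): reduce a URL to its
// "landing page" form by dropping an optional http:// or https:// scheme
// and an optional leading "www.".
function landingPage(string $url): string
{
    return preg_replace('#^(?:https?://)?(?:www\.)?#i', '', $url);
}

// Sample spellings of the same landing page (illustrative values).
$samples = ['http://www.test.com/', 'www.test.com/', 'http://test.com/', 'https://test.com/'];

// After normalization every entry is 'test.com/'.
$normalized = array_map('landingPage', $samples);
```

Two URLs can then be treated as equal whenever their normalized forms match, while the original spelling is kept in the arrays.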

1 Answer:

Answer 0 (score: 2)

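You can write your own comparison function and pass it to the "u" variants of the array functions, such as array_udiff and array_uintersect. When comparing the URLs, use preg_replace so that the http/https difference (and an optional www. prefix) is ignored.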

$commonUrls = array_intersect($urls, $urlsFromOtherSource); // URLs present in both arrays

$urls = array_diff($urls, $commonUrls);

$urlsFromOtherSource = array_udiff(array_diff($urlsFromOtherSource, $commonUrls), $urls, function ($a, $b) {
  return strcmp(preg_replace('|^https?://(www\\.)?|', '', $a), preg_replace('|^https?://(www\\.)?|', '', $b));
});

This produces:

commonUrls: array(1) {
  [0]=>
  string(17) "https://test.com/"
}

urls: array(2) {
  [1]=>
  string(19) "http://example.com/"
  [2]=>
  string(19) "https://google.com/"
}

urlsFromOtherSource: array(1) {
  [2]=>
  string(21) "https://facebook.com/"
}
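The case from the UPDATE, where commonUrls already holds one spelling of a URL and urlsFromOtherSource holds another, could be covered by also passing $commonUrls to array_udiff, which accepts several arrays before the callback. The following is a sketch under assumptions: the helper name `landingPage`, the closure name `$sameLandingPage`, and the sample data are illustrative, and the pattern additionally tolerates URLs without a scheme.

```php
<?php
// Hypothetical helper: strips an optional http/https scheme and a leading
// "www." so that only the landing page takes part in the comparison.
function landingPage(string $url): string
{
    return preg_replace('#^(?:https?://)?(?:www\.)?#i', '', $url);
}

// Callback for the array_u* functions: returns 0 when both URLs share a landing page.
$sameLandingPage = function (string $a, string $b): int {
    return strcmp(landingPage($a), landingPage($b));
};

// Illustrative data modelled on the UPDATE in the question.
$urls = ['http://www.test.com/', 'http://example.com/', 'https://google.com/'];
$urlsFromOtherSource = ['http://www.test.com/', 'http://test.com/', 'https://example.com/', 'https://facebook.com/'];

// Exact matches only, as in the code above.
$commonUrls = array_intersect($urls, $urlsFromOtherSource);
$urls = array_diff($urls, $commonUrls);

// Drop every entry whose landing page already appears in $commonUrls or $urls.
$urlsFromOtherSource = array_udiff($urlsFromOtherSource, $commonUrls, $urls, $sameLandingPage);

// Remaining entry: 'https://facebook.com/' (original keys are preserved).
```

Keys are preserved by array_diff and array_udiff, so wrap the results in array_values() if sequential indexes are needed.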