Filter only duplicate URLs from a PHP array

Date: 2017-12-08 12:58:41

Tags: php arrays url duplicates filtering

Here is an array:

Array ( 
   [EM Debt] => http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0616502026&culture=en-GB 
   [EM Local Debt] => Will be launched shortly 
   [EM Blended Debt] => Will be launched shortly 
   [Frontier Markets] => http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0501220262 
   [Absolute Return Debt and FX] => Will be launched shortly 
   [Em Debt] => http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0501220262 
) 

If I use array_unique(), it also filters the repeated "Will be launched shortly" values out of the array.

I only want to filter out duplicate URLs, not text.

Update:

I need to keep the array order unchanged and only filter out the duplicates.

5 answers:

Answer 0 (score: 7)

Well, you can use array_filter:

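$filtered = array_filter($urls, function ($url) {
    static $used = [];   // URLs already kept, shared across calls of this callback

    if (filter_var($url, FILTER_VALIDATE_URL)) {
        // keep the first occurrence of a URL, drop later ones
        return isset($used[$url]) ? false : $used[$url] = true;
    }

    return true;         // not a URL: always keep
});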

Here is a demo.
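A minimal sketch of the same array_filter idea wrapped in a reusable function. Instead of a `static` variable it keeps the seen URLs in a local array captured by reference; the function name `filter_duplicate_urls` is hypothetical. Because array_filter() keeps the original keys and order, only the later duplicate URL ('Em Debt') should be dropped.

```php
<?php
// Hypothetical helper: drop repeated URLs, keep every non-URL value.
function filter_duplicate_urls(array $items): array
{
    $seen = []; // URLs already kept, stored as keys for isset() lookups

    return array_filter($items, function ($value) use (&$seen) {
        if (filter_var($value, FILTER_VALIDATE_URL) === false) {
            return true;          // not a URL (e.g. "Will be launched shortly"): keep
        }
        if (isset($seen[$value])) {
            return false;         // URL seen before: drop this occurrence
        }
        $seen[$value] = true;     // first occurrence: remember it and keep it
        return true;
    });
}

$array = [
    'EM Debt' => 'http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0616502026&culture=en-GB',
    'EM Local Debt' => 'Will be launched shortly',
    'EM Blended Debt' => 'Will be launched shortly',
    'Frontier Markets' => 'http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0501220262',
    'Absolute Return Debt and FX' => 'Will be launched shortly',
    'Em Debt' => 'http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0501220262',
];

$filtered = filter_duplicate_urls($array);
print_r($filtered);
```

Capturing `$seen` by reference keeps the state local to one call, so calling the helper on a second array starts from an empty set.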

Answer 1 (score: 5)

Here is your answer:

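<?php
// example data only; replace `$array` with yours
$array = ['http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0616502026&culture=en-GB', 'abc', 'abc', 'http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0616502026&culture=en-GB'];
$url_array = [];
$string_array = [];  // initialize too, otherwise array_merge() fails when there are no strings
foreach($array as $ele) {
    if(strpos($ele, 'http://') !== false) {  // treat anything containing 'http://' as a URL
        $url_array[] = $ele;
    } else {
        $string_array[] = $ele;
    }
}

$url_array = array_unique($url_array);   // deduplicate the URLs only
// Note: strings come first and URLs last, so the original order and keys are not preserved
print_r(array_merge($string_array, $url_array));
?>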

Answer 2 (score: 5)

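You can loop through the array once to build the result. Along the way, use an extra array to record which URLs you have already saved in the result.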

$saved_urls = [];
$result = [];
foreach($array as $k => $v)
{
    if('http://' == substr(trim($v), 0, 7) || 'https://' == substr(trim($v), 0, 8))
    {
        if(!isset($saved_urls[$v]))    // only keep a URL that has not been saved yet
        {
            $result[$k] = $v;
            $saved_urls[$v] = 1;       // remember this URL
        }
    }
    else
    {
        $result[$k] = $v;              // not a URL: always keep
    }
}
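On PHP 8.0 or later (an assumption: str_starts_with() does not exist in earlier versions), the same single-pass loop can test the scheme with str_starts_with() instead of comparing substr() slices. A minimal sketch run on the question's data; unlike the loop above, it keys the lookup on the trimmed value, so " http://a" and "http://a" count as the same URL:

```php
<?php
// Requires PHP >= 8.0 for str_starts_with().
$array = [
    'EM Debt' => 'http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0616502026&culture=en-GB',
    'EM Local Debt' => 'Will be launched shortly',
    'EM Blended Debt' => 'Will be launched shortly',
    'Frontier Markets' => 'http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0501220262',
    'Absolute Return Debt and FX' => 'Will be launched shortly',
    'Em Debt' => 'http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0501220262',
];

$saved_urls = []; // URLs already copied into $result, stored as keys
$result = [];
foreach ($array as $k => $v) {
    $trimmed = trim($v);
    $is_url = str_starts_with($trimmed, 'http://') || str_starts_with($trimmed, 'https://');

    if (!$is_url) {
        $result[$k] = $v;               // plain text is always kept
    } elseif (!isset($saved_urls[$trimmed])) {
        $result[$k] = $v;               // first time this URL appears
        $saved_urls[$trimmed] = true;
    }
    // otherwise: a repeated URL, skipped
}
print_r($result);
```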

Answer 3 (score: 3)

OK, I found the answer.


Here is the code, which restores the original order with an array sub-sort step:

$urls = [
    'EM Debt' => 'http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0616502026&culture=en-GB',
    'EM Local Debt' => 'Will be launched shortly',
    'EM Blended Debt' => 'Will be launched shortly',
    'Frontier Markets' => 'http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0501220262',
    'Absolute Return Debt and FX' => 'Will be launched shortly',
    'Em Debt' => 'http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0501220262',
];

$url_array = [];
$string_array = [];
$keys = [];                                   // original key order, used to restore it at the end
foreach($urls as $title => $url) {
    if(strpos($url, 'http://') !== false) {
        $url_array[$title] = $url;
    } else {
        $string_array[$title] = $url;
    }
    $keys[] = $title;
}

$url_array = array_unique($url_array);        // keeps the first occurrence of each URL
$urls = array_merge($url_array, $string_array);
$urls = array_sub_sort($urls, $keys);         // answerer's own helper, not a PHP built-in; its definition is not shown

Answer 4 (score: 3)

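If you want to modify the input array rather than generate a new filtered array, you can use strpos() to identify URLs, a lookup array to identify duplicate URLs, and unset() to modify the array.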

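  • strpos($v,'http')===0 requires not only that http be in the string, but that it be the first four characters of the string. To be clear, this also works for https. When you are only checking for the existence or position of a substring, strstr() and substr() are less efficient than strpos(). (The second note on the PHP Manual's strstr() page touts the benefit of using strpos() when only checking for the existence of a substring.)
  • Checking the $lookup array with repeated in_array() calls is less efficient than storing the URLs already seen as keys of the lookup array; isset() beats in_array() every time. (Reference Link)
  • The OP's sample input gives no indication of troublesome values that start with http but are not URLs. For that reason, strpos() is a suitable, lightweight function call. If such values are possible, sevavietl's URL validation (filter_var() with FILTER_VALIDATE_URL) is a more reliable function call. (PHP Manual Link)
  • In my online performance tests, my answer was the fastest of the posted methods that produce the desired output array.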
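The first bullet's point about anchoring can be checked with a small sketch. It compares the containment test strpos($v, 'http://') !== false, which Answers 1 and 3 use, with the anchored test strpos($v, 'http') === 0; both sample strings are made up for illustration:

```php
<?php
$text = 'Details: http://example.com';   // hypothetical text that only mentions a URL
$url  = 'https://example.com/fund';      // hypothetical https URL

// Containment check: matches 'http://' anywhere in the string.
$text_contains = strpos($text, 'http://') !== false;  // true, although $text is not a URL
$url_contains  = strpos($url, 'http://') !== false;   // false: 'https://' does not contain 'http://'

// Anchored check: the string must start with 'http', which also covers 'https'.
$text_anchored = strpos($text, 'http') === 0;         // false
$url_anchored  = strpos($url, 'http') === 0;          // true

var_dump($text_contains, $url_contains, $text_anchored, $url_anchored);
```

The containment test both misclassifies the text and misses the https URL, while the anchored test gets both right for these two inputs.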

Code: (Demo)

$array=[
    'EM Debt'=>'http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0616502026&culture=en-GB',
    'EM Local Debt'=>'Will be launched shortly',
    'EM Blended Debt'=>'Will be launched shortly',
    'Frontier Markets'=>'http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0501220262',
    'Absolute Return Debt and FX'=>'Will be launched shortly',
    'Em Debt'=>'http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0501220262'
];

$lookup=[];                          // URLs seen so far, stored as keys
foreach($array as $k=>$v){
    if(isset($lookup[$v])){          // $v is a duplicate
        unset($array[$k]);           // remove it from $array
    }elseif(strpos($v,'http')===0){  // $v is a url (because starts with http or https)
        $lookup[$v]='';              // store $v in $lookup as a key to an empty string
    }
}
var_export($array);

Output:

array (
  'EM Debt' => 'http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0616502026&culture=en-GB',
  'EM Local Debt' => 'Will be launched shortly',
  'EM Blended Debt' => 'Will be launched shortly',
  'Frontier Markets' => 'http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0501220262',
  'Absolute Return Debt and FX' => 'Will be launched shortly',
)
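Because this answer edits $array in place, the loop can be wrapped in a function that takes the array by reference. A minimal sketch; the function name `dedupe_urls_in_place` is hypothetical, and it keeps the answer's assumption that every URL starts with http:

```php
<?php
// Hypothetical wrapper around the unset() loop: modifies the caller's array in place.
function dedupe_urls_in_place(array &$array): void
{
    $lookup = []; // URLs seen so far, stored as keys
    foreach ($array as $k => $v) {       // by-value foreach iterates over a copy, so unset() is safe
        if (isset($lookup[$v])) {
            unset($array[$k]);           // repeated URL: remove it
        } elseif (strpos($v, 'http') === 0) {
            $lookup[$v] = '';            // first occurrence of a URL: remember it
        }
    }
}

$array = [
    'EM Debt' => 'http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0616502026&culture=en-GB',
    'EM Local Debt' => 'Will be launched shortly',
    'EM Blended Debt' => 'Will be launched shortly',
    'Frontier Markets' => 'http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0501220262',
    'Absolute Return Debt and FX' => 'Will be launched shortly',
    'Em Debt' => 'http://globalevolution.gws.fcnws.com/fs_Overview.html?isin=LU0501220262',
];

dedupe_urls_in_place($array);
var_export($array);
```

After the call, `$array` itself no longer contains the 'Em Debt' entry, and the remaining entries keep their original order.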

Just for fun, a functional/unorthodox/convoluted approach might look like this (not recommended, purely a demonstration):

var_export(
    array_intersect_key(
        $array,                                    // use $array to preserve order
        array_merge(                               // combine filtered urls and unfiltered non-urls
            array_unique(                          // remove duplicates
                array_filter($array,function($v){  // generate array of urls
                    return strpos($v,'http')===0;
                })
            ),
            array_filter($array,function($v){  // generate array of non-urls
                return strpos($v,'http')!==0;
            })
        )
    )
);