PHP array filtering with regular expressions

Date: 2010-09-25 19:10:54

Tags: php regex arrays filter

Hi all, I have an array that looks like this:

Array
(
    [0] => http://api.tweetmeme.com/imagebutton.gif?url=http://mashable.com/2010/09/25/trailmeme/ 
    [1] => http://cdn.mashable.com/wp-content/plugins/wp-digg-this/i/gbuzz-feed.png 
    [2] => http://mashable.com/wp-content/plugins/wp-digg-this/i/fb.jpg 
    [3] => http://mashable.com/wp-content/plugins/wp-digg-this/i/diggme.png 
    [4] => http://ec.mashable.com/wp-content/uploads/2009/01/bizspark2.gif 
    [5] => http://cdn.mashable.com/wp-content/uploads/2010/09/web.png 
    [6] => http://mashable.com/wp-content/uploads/2010/09/Screen-shot-2010-09-24-at-10.51.26-PM.png 
    [7] => http://cdn.mashable.com/wp-content/uploads/2009/02/bizspark.jpg 
    [8] => http://feedads.g.doubleclick.net/~at/lxx00QTjYBaYojpnpnTa6MXUmh4/0/di 
    [9] => 
    [10] => http://feedads.g.doubleclick.net/~at/lxx00QTjYBaYojpnpnTa6MXUmh4/1/di 
    [11] => 
    [12] => http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:D7DqB2pKExk 
    [13] => 
    [14] => http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:V_sGLiPBpWU 
    [15] => 
    [16] => http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:F7zBnMyn0Lo 
    [17] => 
    [18] => http://feeds.feedburner.com/~ff/Mashable?d=qj6IDK7rITs 
    [19] => 
    [20] => http://feeds.feedburner.com/~ff/Mashable?d=_e0tkf89iUM 
    [21] => 
    [22] => http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:gIN9vFwOqvQ 
    [23] => 
    [24] => http://feeds.feedburner.com/~ff/Mashable?d=yIl2AUoC8zA 
    [25] => 
    [26] => http://feeds.feedburner.com/~ff/Mashable?d=P0ZAIrC63Ok 
    [27] => 
    [28] => http://feeds.feedburner.com/~ff/Mashable?d=I9og5sOYxJI 
    [29] => 
    [30] => http://feeds.feedburner.com/~ff/Mashable?d=CC-BsrAYo0A 
    [31] => 
    [32] => http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:_cyp7NeR2Rw 
    [33] => 
    [34] => http://feeds.feedburner.com/~r/Mashable/~4/0N_mvMwPHYk
)
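Basically, I want to:

  1. Remove every empty array element.
  2. Remove every array item whose name contains one of the extensions ".jpg", ".png", ".gif".
  3. Finally, remove array items that contain keywords such as "digg", "fb", "tweet", "bizspark".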

    Hi, I have tried the code above. It returns an array such as:

    Array
    (
        [5] => http://feedads.g.doubleclick.net/~at/W-z_kHMi30EtE1mpxK8NvMmNmeg/0/di
        [7] => http://feedads.g.doubleclick.net/~at/W-z_kHMi30EtE1mpxK8NvMmNmeg/1/di
        [9] => http://feeds.feedburner.com/~ff/Mashable?i=mEedXAp78pg:339cIishd6A:D7DqB2pKExk
        [11] => http://feeds.feedburner.com/~ff/Mashable?i=mEedXAp78pg:339cIishd6A:V_sGLiPBpWU
        [13] => http://feeds.feedburner.com/~ff/Mashable?i=mEedXAp78pg:339cIishd6A:F7zBnMyn0Lo
        [15] => http://feeds.feedburner.com/~ff/Mashable?d=qj6IDK7rITs
        [17] => http://feeds.feedburner.com/~ff/Mashable?d=_e0tkf89iUM
        [19] => http://feeds.feedburner.com/~ff/Mashable?i=mEedXAp78pg:339cIishd6A:gIN9vFwOqvQ
        [21] => http://feeds.feedburner.com/~ff/Mashable?d=yIl2AUoC8zA
        [23] => http://feeds.feedburner.com/~ff/Mashable?d=P0ZAIrC63Ok
        [25] => http://feeds.feedburner.com/~ff/Mashable?d=I9og5sOYxJI
        [27] => http://feeds.feedburner.com/~ff/Mashable?d=CC-BsrAYo0A
        [29] => http://feeds.feedburner.com/~ff/Mashable?i=mEedXAp78pg:339cIishd6A:_cyp7NeR2Rw
        [31] => http://feeds.feedburner.com/~r/Mashable/~4/mEedXAp78pg
    )
    

    I would like it to return, e.g. from the first example:

    [5] => http://cdn.mashable.com/wp-content/uploads/2010/09/web.png
    [6] => http://mashable.com/wp-content/uploads/2010/09/Screen-shot-2010-09-24-at-10.51.26-PM.png

    Any ideas?


    Hi GZip, I have modified the code and I am getting better results:

    function url_array_filter($url)
    {
        static $words = array('digg', 'fb', 'tweet', 'bizspark','feedburner','feedads','CountImage');
        static $extens = array('.jpg', '.png', '.gif');
        $ret = true;
        if (!$url) {
            $ret = false;
        } elseif (str_replace($words, '', $url) != $url) {
            $ret = false;
        } else {
            $path = parse_url($url, PHP_URL_PATH);
            if (in_array(substr($path, -4), $extens)) {
                $ret = false;
            }
        }
        return $ret;
    } 
    

    My problem now is with the output. For example:

    Array ( [0] => http://cdn.dzone.com/images/thumbs/120x90/491551.jpg' style='width:120;height:90;float:left;vertical-align:top;border:1px solid ) 
    
    Array ( [0] => http://cdn.dzone.com/images/thumbs/120x90/490913.jpg' style='width:120;height:90;float:left;vertical-align:top;border:1px solid ) 
    

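    I only want the URL. I think I have a problem extracting the URLs from the original content. Let me post a link to the original question and what I am doing.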

    RSS Feeds and image extraction indepth

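    I just want the URL. Judging from that link, I think getImagesUrl() may be what is messing things up. I will try to use parse_url to recover the correct URL. Let me know if I am on the right track. I am very close to extracting image URLs from an RSS feed parsed with magpie.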
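
One way to approach this, as a minimal sketch rather than a definitive fix: cut each extracted string at the first quote, whitespace character, or angle bracket, then check that what remains still parses as a URL with a host. The function name `extract_clean_url` is hypothetical and is not part of the code in this thread. The sketch assumes the stray text always starts with a quote or whitespace character, as in the output above.

```php
<?php
// Hypothetical helper (not from the original thread): trims markup residue
// such as "' style='width:120;..." that trails an extracted image URL.
function extract_clean_url($raw)
{
    // Keep the leading run of characters that are not quotes,
    // whitespace, or angle brackets.
    if (!preg_match('#^https?://[^\s\'"<>]+#i', trim($raw), $m)) {
        return null; // not an http(s) URL at all
    }
    // Reject the candidate if parse_url() cannot find a host in it.
    $host = parse_url($m[0], PHP_URL_HOST);
    return $host ? $m[0] : null;
}

$raw = "http://cdn.dzone.com/images/thumbs/120x90/491551.jpg' style='width:120;height:90;float:left;vertical-align:top;border:1px solid";
echo extract_clean_url($raw), "\n";
```

Doing the cut before calling parse_url() avoids relying on how parse_url() treats characters that are not valid in a URL.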


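    OK GZip, here is the modification and addition to your code... it works 95% of the time! Great. I do get some odd results, though, which I have posted below.
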
    function url_array_filter($url)
    {
        static $words = array('digg', 'fb', 'tweet', 'bizspark','feedburner','feedads','CountImage','fuelbrand');
        static $extens = array('.jpg', '.png', '.gif');
        $ret = true;
        if (!$url) {
            $ret = false;
        } elseif (str_replace($words, '', $url) != $url) {
            $ret = false;
        } else {
            $path = parse_url($url, PHP_URL_PATH);
            if (in_array(substr($path, -4), $extens)) {
                $ret = false;
            }
        }
        return $ret;
    } 
    
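    function cleanURL($a_url)
    {
        $ret = array();
        foreach ($a_url as $c) {
            // Rebuild the URL from scheme, host and path only (drops any query string).
            $a = parse_url($c, PHP_URL_SCHEME) . '://' . parse_url($c, PHP_URL_HOST) . parse_url($c, PHP_URL_PATH);
            // Keep only the part before the first single quote.
            $a = explode("'", $a);
            $ret[] = $a[0];
        }
        return $ret;
    }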
    

    Example usage. $this->getImagesUrl($c) below returns the result shown in the first question.

    foreach ($content as $c) {
        // get the images in content
        $arr = $this->getImagesUrl($c);
        $arr = array_filter($arr, 'url_array_filter');
    }
    $ret = cleanURL($arr);
    if (count($ret) > 0) {
        print_r($ret);
        echo "<br/><br/>";
    }
    
    So far almost everything works well, but I keep getting some bad results, such as:

    Array ( [0] => http://cdn.mashable.com/wp-content/uploads/2010/02/ipad-side- )
    Array ( [0] => http://mrg.bz/FZtr2k [1] => http://mrg.bz/IDkx4w ) 
    

    We are almost there, folks... any ideas?

2 answers:

Answer 0 (score: 6)

Using, for example, array_filter() gives you flexibility and ease of maintenance (changing requirements, debugging, etc.):

function url_array_filter($url)
{
    static $words = array('digg', 'fb', 'tweet', 'bizspark');
    static $extens = array('.jpg', '.png', '.gif');
    $ret = true;
    if (!$url) {
        $ret = false;
    } elseif (str_replace($words, '', $url) != $url) {
        $ret = false;
    } else {
        $path = parse_url($url, PHP_URL_PATH);
        if (in_array(substr($path, -4), $extens)) {
            $ret = false;
        }
    }
    return $ret;
}

$arr = array_filter($arr, 'url_array_filter');
print_r($arr);

(This works for the given array, but may need changes; it is demo code.)
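
If the goal is the reverse, that is, keeping only image URLs as in the expected output given in the question, the same array_filter() pattern can be inverted. The following is a minimal sketch under that assumption and is not the answerer's code: the function name `is_wanted_image_url` is hypothetical, and the use of stripos() and pathinfo() is an illustrative choice. array_values() is applied because array_filter() preserves the original keys.

```php
<?php
// Hypothetical variant (an assumption, not from the thread): returns true only
// for non-empty image URLs that contain none of the unwanted keywords.
function is_wanted_image_url($url)
{
    static $words = array('digg', 'fb', 'tweet', 'bizspark');
    static $extens = array('jpg', 'png', 'gif');

    $url = trim((string) $url);
    if ($url === '') {
        return false; // drop empty elements
    }
    foreach ($words as $word) {
        if (stripos($url, $word) !== false) {
            return false; // drop URLs containing an unwanted keyword
        }
    }
    // Look at the extension of the path only, so query strings are ignored.
    $path = (string) parse_url($url, PHP_URL_PATH);
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, $extens, true);
}

$arr = array(
    'http://mashable.com/wp-content/plugins/wp-digg-this/i/fb.jpg',
    'http://cdn.mashable.com/wp-content/uploads/2010/09/web.png',
    '',
    'http://feeds.feedburner.com/~ff/Mashable?d=qj6IDK7rITs',
);
// array_filter() keeps the original keys; array_values() reindexes from 0.
$images = array_values(array_filter($arr, 'is_wanted_image_url'));
print_r($images);
```

Unlike substr($path, -4), the pathinfo() check does not depend on the extension being exactly three characters long, so adding an entry such as 'jpeg' to $extens would work without further changes.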

Answer 1 (score: 3)

foreach ($array as $key => $value) {
    if (
        empty($value)||
        (preg_match('#^http:\/\/(.*)\.(gif|png|jpg)$#i', $value) == 0)||
        (preg_match('#(tweet|bizspark)#i', $value) > 0)
    ) {
        unset($array[$key]);
    }
}