Filtering with a regular expression and returning the matched numbers

Date: 2014-05-25 06:34:19

Tags: php regex

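Here I'm trying to use a regular expression to filter specific phone numbers out of text. Users can try to slip a phone number past the filter with a loophole like this: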

4023one345233 should be treated as 40231345233 and then filtered.
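A minimal sketch of that normalization step, assuming the only obfuscation is the English digit words zero to nine (the map name $wordToDigit is illustrative, not from the question):

```php
<?php
// Assumption: obfuscation only uses the English words zero..nine.
$wordToDigit = [
    'zero' => '0', 'one' => '1', 'two' => '2', 'three' => '3', 'four' => '4',
    'five' => '5', 'six' => '6', 'seven' => '7', 'eight' => '8', 'nine' => '9',
];

// strtr() with an array tries the longest key first at each position and
// never re-scans text it has already replaced.
$normalized = strtr(strtolower('4023one345233'), $wordToDigit);

echo $normalized, "\n"; // 40231345233
```

Using strtr() with a map replaces every digit word in one pass, so the normalized string can then be handed to the phone-number regex unchanged.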

This code works correctly when the loophole isn't used:

Code 1:

$arrwords = array(0=>'zero',1=>'one',2=>'two',3=>'three',4=>'four',5=>'five',6=>'six',7=>'seven',8=>'eight',9=>'nine');
// Collect every run of ASCII letters in the text
preg_match_all('/[A-Za-z]+/', $text, $matches);
$arr=$matches[0];
foreach($arr as $v)
{
    $v = strtolower($v);
    if(in_array($v,$arrwords))
    {
        // Replace the digit word with its numeral; case-insensitive because $v was lowercased
        $text= str_ireplace($v,array_search($v,$arrwords),$text);
    }
}
foreach ($words as $word){

    $pattern = '/^(?=.{8,14})b$\(?(?:(?:0(?:0|11)\)?[\s-]?\(?|\+)44\)?[\s-]?\(?(?:0\)?[\s-]?\(?)?|0)(?:\d{2}\)?[\s-]?\d{4}[\s-]?\d{4}|\d{3}\)?[\s-]?\d{3}[\s-]?\d{3,4}|\d{4}\)?[\s-]?(?:\d{5}|\d{3}[\s-]?\d{3})|\d{5}\)?[\s-]?\d{4,5}|8(?:00[\s-]?11[\s-]?11|45[\s-]?46[\s-]?4\d))(?:(?:[\s-]?(?:x|ext\.?\s?|\#)\d+)?)$^|^2(?:0[01378]|3[0189]|4[017]|8[0-46-9]|9[012])\d{7}|1(?:(?:1(?:3[0-48]|[46][0-4]|5[012789]|7[0-49]|8[01349])|21[0-7]|31[0-8]|[459]1\d|61[0-46-9]))\d{6}|1(?:2(?:0[024-9]|2[3-9]|3[3-79]|4[1-689]|[58][02-9]|6[0-4789]|7[013-9]|9\d)|3(?:0\d|[25][02-9]|3[02-579]|[468][0-46-9]|7[1235679]|9[24578])|4(?:0[03-9]|2[02-5789]|[37]\d|4[02-69]|5[0-8]|[69][0-79]|8[0-5789])|5(?:0[1235-9]|2[024-9]|3[0145689]|4[02-9]|5[03-9]|6\d|7[0-35-9]|8[0-468]|9[0-5789])|6(?:0[034689]|2[0-689]|[38][013-9]|4[1-467]|5[0-69]|6[13-9]|7[0-8]|9[0124578])|7(?:0[0246-9]|2\d|3[023678]|4[03-9]|5[0-46-9]|6[013-9]|7[0-35-9]|8[024-9]|9[02-9])|8(?:0[35-9]|2[1-5789]|3[02-578]|4[0-578]|5[124-9]|6[2-69]|7\d|8[02-9]|9[02569])|9(?:0[02-589]|2[02-689]|3[1-5789]|4[2-9]|5[0-579]|6[234789]|7[0124578]|8\d|9[2-57]))\d{6}|1(?:2(?:0(?:46[1-4]|87[2-9])|545[1-79]|76(?:2\d|3[1-8]|6[1-6])|9(?:7(?:2[0-4]|3[2-5])|8(?:2[2-8]|7[0-4789]|8[345])))|3(?:638[2-5]|647[23]|8(?:47[04-9]|64[015789]))|4(?:044[1-7]|20(?:2[23]|8\d)|6(?:0(?:30|5[2-57]|6[1-8]|7[2-8])|140)|8(?:052|87[123]))|5(?:24(?:3[2-79]|6\d)|276\d|6(?:26[06-9]|686))|6(?:06(?:4\d|7[4-79])|295[567]|35[34]\d|47(?:24|61)|59(?:5[08]|6[67]|74)|955[0-4])|7(?:26(?:6[13-9]|7[0-7])|442\d|50(?:2[0-3]|[3-68]2|76))|8(?:27[56]\d|37(?:5[2-5]|8[239])|84(?:3[2-58]))|9(?:0(?:0(?:6[1-8]|85)|52\d)|3583|4(?:66[1-8]|9(?:2[01]|81))|63(?:23|3[1-4])|9561))\d{3}|176888[234678]\d{2}|16977[23]\d{3}|7(?:[1-4]\d\d|5(?:0[0-8]|[13-9]\d|2[0-35-9])|624|7(?:0[1-9]|[1-7]\d|8[02-9]|9[0-689])|8(?:[014-9]\d|[23][0-8])|9(?:[04-9]\d|1[02-9]|2[0-35-9]|3[0-689]))\d{6}|76(?:0[012]|2[356]|4[0134]|5[49]|6[0-369]|77|81|9[39])\d{6}|80(?:0\d{6,7}|8\d{7})|500\d{6}|(?:87[123]|9(?:[01]\d|8[0-3]))\d{7}|8(?:4[2-5]|70)\d{7}|70\d{8}|56\d{8}|(?:3[0347]|55)\d{8}|8(?:001111|45464\d)$|(?:\((\+?\d+)?\)|(\+\d{0,3}))? ?\d{2,3}([-\.]?\d{2,3} ?){3,4}/';
    preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE );            
    $this->pushToResultSet($matches);
}
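When collecting letter runs, note that a character class written as [A-za-z] (an easy typo for [A-Za-z]) contains the range A to z, which in ASCII also covers [ \ ] ^ _ and the backtick. A quick check, using an arbitrary sample word:

```php
<?php
// 'A'..'z' spans ASCII 65..122, which includes [ \ ] ^ _ ` (91..96).
$loose  = preg_match('/^[A-za-z]+$/', 'my_word'); // 1: '_' slips through
$strict = preg_match('/^[A-Za-z]+$/', 'my_word'); // 0: letters only

var_dump($loose, $strict);
```

With the typo, underscores and brackets count as "letters", which can glue unrelated characters onto a digit word and stop it from being recognized.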

With help from SO, I can filter numbers that use the loophole described above with the following code:

http://ideone.com/8UW22U - Link to test

Code 2:

$arrwords = array_flip(array(0=>'zero',1=>'one',2=>'two',3=>'three',4=>'four',5=>'five',6=>'six',7=>'seven',8=>'eight',9=>'nine'));

$s = "my long STRING with some Numbers 402three1345233 4023one345233";

$sanitised = array();    
foreach (explode(" ", $s) as $word) {
    $num = strtr(strtolower($word), $arrwords);
    $sanitised[] = is_numeric($num) ? str_repeat("*", strlen($word)) : $word;        
}

echo implode(" ", $sanitised);

But in my first piece of code, I only want to run the pattern once a number has been found, and then return the matched pattern.

Here is my attempt to port Code 2 into Code 1:

foreach (explode(" ", $s) as $word) {
    $num = strtr(strtolower($word), $arrwords);
    if(is_numeric($num)){ 
         $pattern = 'regex_above';
        preg_match_all($pattern, <$text?????>, $matches, PREG_OFFSET_CAPTURE );            
        $this->pushToResultSet($matches);

    }
}

Can anyone help me with this?

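Note: the masked output must have the same length as the original word. That is, 4023three345233 should become *************** (15 characters, one per character of the original word), not *********** (11 characters, the length of the converted number 40233345233).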
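Put differently, the mask length has to come from the original word, not from its normalized form. A small sketch of the difference, assuming the same zero to nine word map as in Code 2 (written here with string values):

```php
<?php
// Assumption: same zero..nine word map as Code 2, with string values.
$map = ['zero' => '0', 'one' => '1', 'two' => '2', 'three' => '3', 'four' => '4',
        'five' => '5', 'six' => '6', 'seven' => '7', 'eight' => '8', 'nine' => '9'];

$word = '4023three345233';
$num  = strtr(strtolower($word), $map); // '40233345233'

$wanted = str_repeat('*', strlen($word)); // one asterisk per original character (15)
$wrong  = str_repeat('*', strlen($num));  // one asterisk per normalized digit (11)

echo $wanted, "\n", $wrong, "\n";
```

So the regex should be tested against the normalized string, while the mask should be built from strlen() of the untouched word.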

1 answer:

Answer 0 (score: 0)

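If I understand your question correctly, you want to replace a string of digits (which may contain spelled-out numbers) with asterisks, and the number of asterisks must equal the number of characters in the string.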

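In the code below, the regular expression matches words that contain 3 to 7 consecutive numbers, each written either as a digit or as a word.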

$s = "123 onetwothree 1two3 one dog";
$new_words = array();
$numbers = array();
$pattern = "#(\d|zero|one|two|three|four|five|six|seven|eight|nine){3,7}#i";
foreach(explode(" ", $s) as $word) {
    if(preg_match($pattern, $word, $matches)) {
        $new_words[] = str_repeat("*", strlen($word));
        $numbers[] = $matches[0];
    } else {
        $new_words[] = $word;
    }
}

$new_s = implode(" ", $new_words);
print $new_s . "\n";
print implode(" ", $numbers) . "\n";

Output:

*** *********** ***** one dog
123 onetwothree 1two3
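One caveat about the pattern above: it is not anchored, so preg_match() succeeds on any word that merely contains 3 to 7 consecutive number tokens, including a longer run of digits or a word with letters around the number. If the whole word must consist of 3 to 7 number tokens, a sketch with ^ and $ anchors (an adjustment, not part of the original answer) could look like this:

```php
<?php
// Same token alternation as the answer's pattern, in a non-capturing group.
$token      = '(?:\d|zero|one|two|three|four|five|six|seven|eight|nine)';
$unanchored = '#' . $token . '{3,7}#i';
$anchored   = '#^' . $token . '{3,7}$#i';

foreach (['1two3', 'abc123', '1234567890'] as $word) {
    printf("%-10s unanchored=%d anchored=%d\n",
        $word, preg_match($unanchored, $word), preg_match($anchored, $word));
}
```

Here abc123 and 1234567890 pass the unanchored check but fail the anchored one, while 1two3 passes both.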

The regular expression in your code is very long, and adding 'zero|one|...' to it may not work for you. An alternative solution could be:

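  • Get the number of characters in each word of the string: $word_lengths
  • Replace the written-out numbers with their numeric values, e.g. 'one' becomes '1'
  • Match against your long regular expression
  • If it matches, create a string of asterisks based on $word_lengths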
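The steps above could be sketched roughly as follows. This is an illustration under assumptions: $phonePattern is a short hypothetical stand-in for the long phone-number regex from the question (it only requires 10 to 14 digits), words are split on single spaces as in Code 2, and the digit-word map covers English zero to nine only.

```php
<?php
// Hypothetical stand-in for the question's long UK phone-number regex.
$phonePattern = '/^\d{10,14}$/';

$map = ['zero' => '0', 'one' => '1', 'two' => '2', 'three' => '3', 'four' => '4',
        'five' => '5', 'six' => '6', 'seven' => '7', 'eight' => '8', 'nine' => '9'];

$s = "my long STRING with some Numbers 402three1345233 4023one345233";

$masked  = [];
$matched = [];
foreach (explode(' ', $s) as $word) {
    // Step 1: remember the original length of the word.
    $wordLength = strlen($word);
    // Step 2: replace written-out digits with their numeric values.
    $num = strtr(strtolower($word), $map);
    // Step 3: run the phone regex against the normalized word.
    if (preg_match($phonePattern, $num, $m)) {
        // Step 4: mask with as many asterisks as the ORIGINAL word had characters.
        $masked[]  = str_repeat('*', $wordLength);
        $matched[] = $m[0];
    } else {
        $masked[] = $word;
    }
}

echo implode(' ', $masked), "\n";
print_r($matched);
```

Because the regex only ever sees the normalized digits, the long pattern does not need a "zero|one|..." alternation, and the matched numbers are collected separately from the masked text.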