On one of my PHP sites, I use this regular expression to automatically strip phone numbers from strings:
$text = preg_replace('/\+?[0-9][0-9()-\s+]{4,20}[0-9]/', '[removed]', $text);
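However, when a user posts a long URL that contains a run of digits as part of its text, the URL is also hit by the preg_replace, which breaks the URL.
How can I make sure the above preg_replace does not alter URLs contained in $text?
EDIT:
As requested, here is an example of a URL that the above preg_replace breaks: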
$text = 'Please help me with my question here: https://stackoverflow.com/questions/20589314/ Thanks!';
$text = preg_replace('/\+?[0-9][0-9()-\s+]{4,20}[0-9]/', '[removed]', $text);
echo $text;
//echoes: Please help me with my question here: https://stackoverflow.com/questions/[removed]/ Thanks!
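Answer 0 (score: 2)
I think you have to parse both URLs and phone numbers, something like /(?: url \K | phone number)/
- sln
@sln: How would I do that? If it helps, here is a URL regex: stackoverflow.com/a/8234912/869849
- ProgrammerGirl
Here is a PHP test case using the provided regexes for URLs and phone numbers: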
$text = 'Please help me with my +44-83848-1234 question here: http://stackoverflow.com/+44-83848-1234questions/20589314/ phone #:+44-83848-1234-Thanks!';
$str = preg_replace_callback('~((?:(?:[a-zA-Z]{3,9}:(?://)?)(?:[;:&=+$,\w-]+@)?[a-zA-Z0-9.-]+|(?:www\.|[;:&=+$,\w-]+@)[a-zA-Z0-9.-]+)(?:(?:/[+\~%/.\w-]*)?\??[+=&;%@.\w-]*\#?\w*)?)|(\+?[0-9][0-9()\s+-]{4,20}[0-9])~',
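    function( $matches ){
        // Group 1 is the URL alternative: hand the URL back unchanged
        if ( $matches[1] != "" ) {
            return $matches[1];
        }
        // Otherwise group 2 (the phone number) matched
        return '[removed]';
    },
    $text);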
print $str;
Output:
Please help me with my [removed] question here: http://stackoverflow.com/+44-83848-1234questions/20589314/ phone #:[removed]-Thanks!
The regex, expanded with RegexFormat:
# '~((?:(?:[a-zA-Z]{3,9}:(?://)?)(?:[;:&=+$,\w-]+@)?[a-zA-Z0-9.-]+|(?:www\.|[;:&=+$,\w-]+@)[a-zA-Z0-9.-]+)(?:(?:/[+\~%/.\w-]*)?\??[+=&;%@.\w-]*\#?\w*)?)|(\+?[0-9][0-9()\s+-]{4,20}[0-9])~'
(                                      # (1 start), URL
    (?:
        (?:
            [a-zA-Z]{3,9} :
            (?: // )?
        )
        (?: [;:&=+$,\w-]+ @ )?
        [a-zA-Z0-9.-]+
      |
        (?: www \. | [;:&=+$,\w-]+ @ )
        [a-zA-Z0-9.-]+
    )
    (?:
        (?: / [+~%/.\w-]* )?
        \??
        [+=&;%@.\w-]*
        \#?
        \w*
    )?
)                                      # (1 end)
|
(                                      # (2 start), Phone Num
    \+?
    [0-9]
    [0-9()\s+-]{4,20}
    [0-9]
)                                      # (2 end)
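As a minimal sketch, the same callback idea can be written with the `x` (extended) flag so the two alternatives stay readable in the source itself. The helper name `remove_phone_numbers` and the simplified `https?://\S+` URL pattern are assumptions made for illustration; they are not the full URL regex used above.

```php
<?php
// Hypothetical helper: replaces phone-like digit runs but leaves URLs untouched.
function remove_phone_numbers(string $text): string
{
    // With the x flag, whitespace outside character classes is ignored
    // and "#" starts a comment that runs to the end of the line.
    $pattern = '~
        ( https?://\S+ )                        # (1) URL, simplified assumption
      |
        ( \+? [0-9] [0-9()\s+-]{4,20} [0-9] )   # (2) phone number
    ~x';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Group 1 is non-empty only when the URL alternative matched.
            return $m[1] !== '' ? $m[1] : '[removed]';
        },
        $text
    );
}

echo remove_phone_numbers('Call +44-83848-1234 or see https://stackoverflow.com/questions/20589314/ Thanks!');
// prints: Call [removed] or see https://stackoverflow.com/questions/20589314/ Thanks!
```

Because the URL alternative comes first, any digits inside a URL are consumed as part of the URL before the phone alternative gets a chance to see them.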
Answer 1 (score: 1)
With a bit more code, instead of scratching your head you get to stroke your ego!
<?php
$text = "This is my number20558789yes with no spaces
and this is yours 254785961
But this 20558474 is within http://stackoverflow.com/questions/20558474/
So I don't remove it
and this is another url http://stackoverflow.com/questions/20589314/
Thanks!";
$up = "(https?://[-.a-zA-Z0-9]+\.[a-zA-Z]{2,3}/\S*)"; // to catch urls
$np = "(\+?[0-9][0-9()\s+-]{4,20}[0-9])"; // the phone pattern you already know, with the hyphen moved last so it is unambiguously literal
preg_match_all("#{$up}|{$np}#", $text, $matches); // match both patterns together ($matches[1] contains urls, $matches[2] contains numbers)
preg_match_all("#{$np}#", implode("\n", array_filter($matches[1])), $urls_numbers); // extract the numbers that occur inside the matched urls, if any
$diff = array_diff(array_filter($matches[2]), $urls_numbers[0]); // numbers that do not appear in any url, i.e. the ones to replace
$text = str_replace($diff, "[removed]", $text); // replace every occurrence of those numbers
echo $text; // here you are
This outputs:
This is my number[removed]yes with no spaces
and this is yours [removed]
But this 20558474 is within http://stackoverflow.com/questions/20558474/
So I don't remove it
and this is another url http://stackoverflow.com/questions/20589314/
Thanks!
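A different technique from the one in this answer, shown only as a hedged sketch: PCRE's backtracking verbs `(*SKIP)(*FAIL)` let a single `preg_replace` consume each URL and then discard that match, so only digit runs outside URLs are replaced. Unlike the `array_diff` approach, a standalone number would be removed even when the same digits also occur inside a URL. The `https?://\S+` URL pattern is a simplifying assumption.

```php
<?php
// Left alternative: match a URL, then force a failure and skip past it.
// Right alternative: the phone pattern, with the hyphen placed last in the class.
$pattern = '~https?://\S+(*SKIP)(*FAIL)|\+?[0-9][0-9()\s+-]{4,20}[0-9]~';

$text  = 'My number is 254785961, see http://stackoverflow.com/questions/20589314/ Thanks!';
$clean = preg_replace($pattern, '[removed]', $text);

echo $clean;
// prints: My number is [removed], see http://stackoverflow.com/questions/20589314/ Thanks!
```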
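Answer 2 (score: 0)
Is it fair to assume that a phone number is usually preceded by whitespace or sits at the start of a line? If so, this would stop you from accidentally changing URLs, since a URL has no whitespace or newline in the middle of it: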
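// (?<!\S) is a lookbehind: it requires whitespace or the start of the text before the number without consuming it
$text = preg_replace('/(?<!\S)\+?[0-9][0-9()\s+-]{4,20}[0-9]/', '[removed]', $text);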
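To see what the whitespace assumption buys and what it costs, here is a small sketch. It uses a `(?<!\S)` lookbehind so the whitespace in front of the number is checked but not replaced; the sample strings are made up for illustration.

```php
<?php
// Only digit runs that start the text or follow whitespace are candidates.
$pattern = '/(?<!\S)\+?[0-9][0-9()\s+-]{4,20}[0-9]/';

// A number after a space is removed; the digits in the URL follow "/" and are kept.
$clean = preg_replace(
    $pattern,
    '[removed]',
    'Call 254785961 or visit https://stackoverflow.com/questions/20589314/'
);

// The cost of the assumption: a number glued to a word is no longer caught.
$glued = preg_replace($pattern, '[removed]', 'This is my number20558789yes');

echo $clean, "\n", $glued, "\n";
```

The first string comes back with the number replaced and the URL intact, while the second is returned unchanged, so this approach only fits if users reliably separate phone numbers from the surrounding text.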