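I can't find the best solution to this problem. The idea is to use preg_replace_callback() to change every URL in a text that belongs to a specific domain into a base64-encoded form.
The URLs look like this: http://www.domain.com/?fsdf76sf8sf6fds
and should end up like this: http://www.otherdomain.com/file/CA60D10F8ACF7CAA
Any ideas for the regular expression?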
Answer 0: (score: 1)
What you're looking for is:
$s = preg_replace_callback('#(([a-z]+://)|([a-z]+://)?[a-z0-9.-]+\.|\b)domain\.com[^\s]+#i', function($match) {
    // $match[0] is the whole matched URL
    return base64_encode($match[0]);
}, $string);
This regex can be a bit confusing, so let's break it down:
(                            -- domain.com must be preceded by either
  ([a-z]+://)                -- a protocol such as http://
  |
  ([a-z]+://)?[a-z0-9.-]+\.  -- possibly a protocol and definitely a subdomain
  |
  \b                         -- a word boundary (prevents otherdomain.com from matching!)
)
domain\.com                  -- the actual domain you're looking for (dot escaped so it only matches a literal ".")
[^\s]+                       -- everything up to the next whitespace (to include path, query string, fragment)
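The breakdown above can also live inside the pattern itself. The following is only a sketch, assuming PCRE's extended mode (the `x` modifier) and a hypothetical helper name `encodeDomainUrls()` that is not part of the original answer. It uses `~` as the delimiter so that `#` is free for in-pattern comments, and passes the domain through preg_quote() so its dots are escaped.

```php
<?php
// Hypothetical helper (illustration only): base64-encodes every URL in $text
// that points at $domain or one of its subdomains.
function encodeDomainUrls(string $text, string $domain): string
{
    // Extended mode (x) ignores pattern whitespace and allows # comments.
    $pattern = '~
        (?:
            [a-z]+://                      # a protocol directly before the domain
          | (?:[a-z]+://)?[a-z0-9.-]+\.    # optional protocol plus a subdomain
          | \b                             # or just a word boundary
        )
        ' . preg_quote($domain, '~') . '   # the domain, with its dots escaped
        [^\s]+                             # everything up to the next whitespace
    ~ix';

    return preg_replace_callback($pattern, function (array $match): string {
        return base64_encode($match[0]);
    }, $text);
}

// Usage: only the domain.com URL is replaced, otherdomain.com is left alone.
echo encodeDomainUrls(
    'see http://www.domain.com/?fsdf76sf8sf6fds and http://www.otherdomain.com/file/x',
    'domain.com'
), "\n";
```

The non-capturing groups `(?:...)` are a stylistic choice here, since the callback only uses `$match[0]`.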
A very simple setup for testing something like this:
<?php
// Map of input string => expected output (null means "must stay unchanged").
$strings = array(
    // positives
    'a http://www.domain.com/?fsdf76sf8sf6fds z' => 'a xxx z',
    'a www.domain.com/?fsdf76sf8sf6fds z' => 'a xxx z',
    'a http://domain.com/?fsdf76sf8sf6fds z' => 'a xxx z',
    'a domain.com/?fsdf76sf8sf6fds z' => 'a xxx z',
    // negatives
    'a http://www.otherdomain.com/file/CA60D10F8ACF7CAA z' => null,
    'a www.otherdomain.com/file/CA60D10F8ACF7CAA z' => null,
    'a http://otherdomain.com/file/CA60D10F8ACF7CAA z' => null,
    'a otherdomain.com/file/CA60D10F8ACF7CAA z' => null,
);
foreach ($strings as $string => $result) {
    // Replace every match with a fixed marker so results are easy to compare.
    $s = preg_replace_callback('#(([a-z]+://)|([a-z]+://)?[a-z0-9.-]+\.|\b)domain\.com[^\s]+#i', function($match) {
        return 'xxx';
    }, $string);
    // Negative cases expect the input to come back untouched.
    if ($result === null) {
        $result = $string;
    }
    if ($s !== $result) {
        echo "FAILED: '$string' got '$s'\n";
    } else {
        echo "OK: '$string'\n";
    }
}
(If you already have unit tests in place, use those, obviously...)
Answer 1: (score: 1)
This answer only works for URLs that start with "http://www.domain.com/" or "https://www.domain.com/", but it is more concise:
$in = 'before http://www.domain.com/?fsdf76sf8sf6fds after';
$domain = 'www.domain.com';
// Pass the delimiter to preg_quote() so a "/" in $domain would be escaped as well.
echo preg_replace_callback('/\b(https?:\/\/'.preg_quote($domain, '/').'\/)\?(\w+)/i', function($m) {
    // $m[2] is the query string after the "?"
    return 'http://www.otherdomain.com/file/'.base64_encode($m[2]);
}, $in);
// outputs "before http://www.otherdomain.com/file/ZnNkZjc2c2Y4c2Y2ZmRz after"
One issue remains: the sample output "CA60D10F8ACF7CAA" shows that the encoding you expect is not what PHP's base64_encode() returns:
echo base64_encode('fsdf76sf8sf6fds'); // outputs ZnNkZjc2c2Y4c2Y2ZmRz
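A quick way to inspect that mismatch, as a rough sketch: the sample value consists only of hexadecimal digits, so it may be a hex-encoded ID or hash rather than base64. That is a guess, since the question doesn't say how the value was produced. The variable names below are made up for illustration.

```php
<?php
// The sample value from the question; how it was generated is unknown.
$sample = 'CA60D10F8ACF7CAA';

// True when the value consists only of hexadecimal digits.
$looksHex = preg_match('/^[0-9A-Fa-f]+$/', $sample) === 1;

// If it is hex, it decodes to far fewer bytes than the 15-character query string.
$byteLength = $looksHex ? strlen(hex2bin($sample)) : null;

// Standard base64 of the original query string, for comparison.
$encoded = base64_encode('fsdf76sf8sf6fds');

var_dump($looksHex, $byteLength, $encoded === $sample);
```

If the target format really is something other than base64, only the body of the callback needs to change; the regex stays the same.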