Question

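I have to replace FQDNs inside a SQL dump for a website migration. I wrote a Perl filter that is supposed to read STDIN, find the serialized strings that contain the domain name to be replaced, substitute whatever argument is passed to the script, and write the result to STDOUT.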

This is what I have so far:

my $search   = $ARGV[0];
my $replace  = $ARGV[1];
my $offset_s = length($search);
my $offset_r = length($replace);
my $regex    = eval { "s\:([0-9]+)\:\\\"(https?\://.*)($search.*)\\\"" };

while (<STDIN>) {
    my @fs = split(';', $_);
    foreach (@fs) {
        chomp;
        if (m#$regex#g) {
            my ( $len, $extra, $str ) = ( $1, $2, $3 );
            my $new_len = $len - $offset_s + $offset_r;
            $str =~ eval { s/$search/$replace/ };
            print 's:' . $new_len . ':' . $extra . $str . '\"'."\n";
        }
    }
}

The data passed through the filter looks something like this (taken from a WordPress dump, but it should also work for Drupal dumps):

INSERT INTO `wp_2_options` VALUES (1,'siteurl','http://to.be.replaced.com/wordpress/','yes'),(125,'dashboard_widget_options','
a:2:{
s:25:\"dashboard_recent_comments\";a:1:{
s:5:\"items\";i:5;
}
s:24:\"dashboard_incoming_links\";a:2:{
s:4:\"home\";s:31:\"http://to.be.replaced.com/wordpress\";
s:4:\"link\";s:107:\"http://blogsearch.google.com/blogsearch?scoring=d&partner=wordpress&q=link:http://to.be.replaced.com/wordpress/\";
}
}
','yes'),(148,'theme_175','
a:1:{
s:13:\"courses_image\";s:37:\"http://to.be.replaced.com/files/image.png\";
}
','yes')

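The regex works as long as my $search does not contain any periods. I tried escaping the periods, i.e. domain\.to\.be\.replaced, but that did not work. I may be going about this in a very roundabout way, or missing something obvious. Any help would be greatly appreciated.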

Answer 1

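There is no need to eval a regex just because it contains variables. Also, to keep metacharacters in variables such as $search from taking on their special meaning, escape them with the quotemeta() function, or place the variable between \Q and \E inside the regex. So instead of: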

my $regex = eval { "s\:([0-9]+)\:\\\"(https?\://.*)($search.*)\\\"" };

use:

my $regex = qr{s\:([0-9]+)\:\\\"(https?\://.*)(\Q$search\E.*)\\\"};

or

my $quoted_search = quotemeta $search;
my $regex = qr{s\:([0-9]+)\:\\\"(https?\://.*)($quoted_search.*)\\\"};

The same advice applies to this line:

$str =~ eval { s/$search/$replace/ };
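
As a minimal sketch of the difference quoting makes (the domain values and the look-alike string are hypothetical, chosen only for illustration): an unquoted $search treats every "." as a wildcard, while \Q...\E matches the domain literally. The last statement shows the substitution written without eval.

```perl
use strict;
use warnings;

my $search  = 'to.be.replaced.com';   # hypothetical old domain
my $replace = 'new.example.org';      # hypothetical new domain

# Unquoted: each "." is a wildcard, so a look-alike string matches too.
my $loose   = qr/$search/;
# Quoted: \Q...\E makes every character of $search literal.
my $literal = qr/\Q$search\E/;

my $lookalike = 'toXbeXreplacedXcom';
print "loose pattern matches the look-alike\n"   if $lookalike =~ $loose;
print "literal pattern matches the look-alike\n" if $lookalike =~ $literal;

# The substitution from the question, without eval.
(my $str = 'http://to.be.replaced.com/wordpress') =~ s/\Q$search\E/$replace/;
print "$str\n";
```

quotemeta($search) produces the same escaped text that \Q...\E applies inline, so either form can be used.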

Answer 2

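You have to double the \ escape characters in the $search value so that the interpolated string still contains the escaped periods.

i.e. domain\.to\.be\.replaced -> domain.to.be.replaced (not wanted)

while domain\\.to\\.be\\.replaced -> domain\.to\.be\.replaced (correct).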
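
The collapsing described above can be reproduced with Perl string literals. This is only a sketch, and it assumes the pattern text passes through a double-quoted literal; a value that arrives via @ARGV is not re-processed by Perl, although an unquoted argument in a POSIX shell loses its backslashes in a similar way. The variable names are hypothetical.

```perl
use strict;
use warnings;

my $one_backslash  = "domain\.to\.be\.replaced";    # "\." collapses to "."
my $two_backslash  = "domain\\.to\\.be\\.replaced"; # "\\." yields \.
my $single_quoted  = 'domain\.to\.be\.replaced';    # single quotes keep the backslash

print "$one_backslash\n";
print "$two_backslash\n";
print "$single_quoted\n";
```

Using quotemeta() or \Q...\E on the plain domain name sidesteps the question of how many backslashes survive.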

Answer 3

I am not sure your Perl regex will handle a string in which the old DNS name appears several times (within the same serialized string).

I solved the same problem with a script that uses bash, sed and one big Perl regex. You can give it a try.

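The regex I use looks like this (split across lines for readability, with -7 being the known difference in length between the two domain names):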

perl -n -p -i -e '1 while s#
  ([;|{]s:)
  ([0-9]+)
  :\\"
  (((?!\\";).)*?)
  (domain\.to\.be\.replaced)
  (.*?)
  \\";#"$1".($2-7).":\\\"$3new.domain.tld$6\\\";"#ge;' file

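It may not be the best solution, but at least it seems to do the job. The g option handles lines that contain several serialized strings to clean up, and the while loop repeats the whole job until no more replacements happen inside the serialized strings (for strings that contain several occurrences of the DNS name). I am not good enough with regexes to attempt a recursive one.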
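
A variation on the same idea, as a hedged sketch rather than the script referred to above: instead of hard-coding the length difference, it counts the occurrences inside each serialized string and adjusts the declared length by that count times the difference between the two domain lengths. The function name rewrite_serialized and the domain new.example.org are hypothetical. It assumes each serialized string sits on one line, ends at the first \"; and contains no \"; inside its payload.

```perl
use strict;
use warnings;

# Rewrite every s:<len>:\"...\"; chunk on a line, replacing $search with
# $replace inside the payload and adjusting <len> accordingly.
sub rewrite_serialized {
    my ($line, $search, $replace) = @_;
    my $delta = length($replace) - length($search);

    $line =~ s{
        s:(\d+):\\"
        (.*?)
        \\";
    }{
        my ($len, $body) = ($1, $2);
        # Count occurrences first, then replace all of them in one pass.
        my $count = () = $body =~ /\Q$search\E/g;
        (my $new = $body) =~ s/\Q$search\E/$replace/g;
        's:' . ($len + $count * $delta) . ':\\"' . $new . '\\";';
    }gex;

    return $line;
}

# Small demonstration on one line of serialized data.
my $sample = 's:4:\"home\";s:35:\"http://to.be.replaced.com/wordpress\";';
print rewrite_serialized($sample, 'to.be.replaced.com', 'new.example.org'), "\n";
```

Adjusting the declared length by a delta, rather than measuring the rewritten payload, leaves the original count untouched for anything the dump has escaped, and strings without the domain pass through unchanged. To use it as a filter, call rewrite_serialized on each line read from STDIN with the two command-line arguments.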

Perl: replacing serialized strings in a SQL dump
