Regex not matching empty fields and dates

Date: 2011-11-07 17:27:09

Tags: regex perl

I have an SQL SELECT dump with many lines that look like this:

07/11/2011 16:48:08,07/11/2011 16:48:08,'YD','MANUAL',0,1,'text','text','text','text',,,,'text',,,0,0,

I want to do two things to each line:

  1. Replace all dates with Oracle's sysdate function. A date can also appear without a time (e.g. 07/11/2011).
  2. Replace every empty value with the string null.

Here is my attempt:

    $_ =~ s/,(,|\n)/,null$1/g;                  # Replace no data by "null"
    $_ =~ s/\d{2}\/\d{2}\/d{4}.*?,/sysdate,/g;  # Replace dates by "sysdate"

But this turns the line into:

    07/11/2011 16:48:08,07/11/2011 16:48:08,'YD','MANUAL',0,1,'text','text','text','text',null,,null,'text',null,,0,0,null

whereas I want it to be:

    sysdate,sysdate,'YD','MANUAL',0,1,'text','text','text','text',null,null,null,'text',null,null,0,0,null

I don't understand why the dates are not matched, or why some ,, pairs are not replaced with null.

Any insight is welcome. Thanks in advance.
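A minimal sketch that isolates the two problems in the attempt above, using a made-up row (an assumption, not one of the dump lines): without a backslash, d{4} matches the letters dddd rather than a four-digit year, and ,(,|\n) consumes the second comma of a pair, so /g resumes after it and the next empty field no longer has a comma in front of it to match.

```perl
use strict;
use warnings;

# A made-up row: a date, two empty fields, then a quoted value.
my $row = "07/11/2011 16:48:08,,,'x'\n";

# Problem 1: without a backslash, d{4} means "the letter d, four times",
# so the pattern can never match a real year.
my $buggy_date = ( $row =~ m!\d{2}/\d{2}/d{4}! )          ? 1 : 0;
my $fixed_date = ( $row =~ m!\d{2}/\d{2}/\d{4}! )         ? 1 : 0;
my $dddd_match = ( "01/02/dddd" =~ m!\d{2}/\d{2}/d{4}! ) ? 1 : 0;

# Problem 2: ,(,|\n) consumes the second comma, so /g resumes after it and
# the next empty field has no comma left in front of it to match.
( my $consumed = $row ) =~ s/,(,|\n)/,null$1/g;

print "buggy: $buggy_date, fixed: $fixed_date, dddd: $dddd_match\n";
print $consumed;    # 07/11/2011 16:48:08,null,,'x'
```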

4 answers:

Answer 0 (score: 1)

You could do it like this:

$ cat perlregex.pl
use warnings;
use strict;

my $row = "07/11/2011 16:48:08,07/11/2011 16:48:08,'YD','MANUAL',0,1,'text','text','text','text',,,,'text',,,0,0,\n";

print( "$row\n" );
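while ( $row =~ /,([,\n])/ ) { $row =~ s/,([,\n])/,null$1/; }   # fill one empty field per pass until none is left
print( "$row\n" );
$row =~ s/\d{2}\/\d{2}\/\d{4}.*?,/sysdate,/g;                     # a date (plus optional time) up to the next comma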
print( "$row\n" );

The result:

$ ./perlregex.pl
07/11/2011 16:48:08,07/11/2011 16:48:08,'YD','MANUAL',0,1,'text','text','text','text',,,,'text',,,0,0,

07/11/2011 16:48:08,07/11/2011 16:48:08,'YD','MANUAL',0,1,'text','text','text','text',null,null,null,'text',null,null,0,0,null

sysdate,sysdate,'YD','MANUAL',0,1,'text','text','text','text',null,null,null,'text',null,null,0,0,null

This could certainly be optimized, but it does the job.
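One common way to tighten the loop, offered as a sketch rather than what the answerer had in mind: Perl's 1 while s/// idiom uses the substitution's return value as the loop condition, so the pattern is not matched twice per iteration. The row below is the one from the question.

```perl
use strict;
use warnings;

my $row = "07/11/2011 16:48:08,07/11/2011 16:48:08,'YD','MANUAL',0,1,'text','text','text','text',,,,'text',,,0,0,\n";

# s/// returns the number of substitutions (false when nothing matched), so it
# can serve as the loop condition itself. Each /g pass fills every other empty
# field in a run of commas; the loop repeats until none is left.
1 while $row =~ s/,([,\n])/,null$1/g;

# Dates, with or without a time part, up to the next comma become sysdate.
$row =~ s/\d{2}\/\d{2}\/\d{4}.*?,/sysdate,/g;

print $row;
```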

Answer 1 (score: 1)

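\d{2}\/\d{2}\/d{4}.*?, doesn't work because the last d is not escaped: d{4} matches the literal text dddd, not four digits. If a field can sit between two commas or at the beginning/end of the line, you can do it in two steps: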

Step 1
s/(?:^|(?<=,))(?=,|\n)/null/g
Expanded (the /x modifier allows the whitespace and comments):

s/
  (?:  ^           # Beginning of line, i.e. nothing behind us
     | (?<=,)      # Or, a comma behind us
  )
     # We are HERE: this is the position between characters
  (?=  ,           # A comma in front of us
     | \n          # Or, a newline in front of us
  )
/null/xg
# The above regex does not consume, it just inserts 'null', leaving the
# same search position (after the insertion, but before the comma).

# If you want to consume a comma, it would be done this way:
s/(?:^|(?<=,))(,|\n)/null$1/xg
# Now the search position is after the 'null,'

Step 2
s/(?:^|(?<=,))\d{2}\/\d{2}\/\d{4}.*?(?=,|\n)/sysdate/g
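A sketch that runs Step 1 and Step 2 in order on a made-up row (assumed for illustration) that also contains a date without a time part:

```perl
use strict;
use warnings;

my $row = "07/11/2011 16:48:08,07/11/2011,'YD',,,0,\n";

# Step 1: insert "null" at every empty position (start of line or right after
# a comma, followed by a comma or the newline). Nothing is consumed.
$row =~ s/(?:^|(?<=,))(?=,|\n)/null/g;

# Step 2: replace a whole date field (date plus optional time) with sysdate.
$row =~ s/(?:^|(?<=,))\d{2}\/\d{2}\/\d{4}.*?(?=,|\n)/sysdate/g;

print $row;
```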

Alternatively, you can combine both into a single regex with the /e (eval) modifier:
$row =~ s/(?:^|(?<=,))(\d{2}\/\d{2}\/\d{4}.*?|)(?=,|\n)/ length $1 ? 'sysdate' : 'null'/eg;

Broken down (the /x modifier allows the whitespace and comments), it looks like this:

s{
   (?: ^ | (?<=,) )             # beginning of line or a comma behind us
   (                            # capture group $1
       \d{2}/\d{2}/\d{4}.*?     #   a date, plus optional non-newline chars
     |                          #   or nothing at all
   )                            # end of capture group $1
   (?= , | \n )                 # a comma or newline in front of us
}{
   length $1 ? 'sysdate' : 'null'
}xeg
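A runnable check of the combined one-line version, on a made-up row (assumed for illustration) that starts with an empty field and contains a date with no time part:

```perl
use strict;
use warnings;

# Made-up row: empty first field, a date without a time, two empty fields at the end.
my $row = ",07/11/2011,'a',,\n";

# Group 1 is either a date (plus anything up to the next comma) or empty.
# With /e the replacement is Perl code: a non-empty $1 means a date was
# matched, an empty $1 means the field itself is empty.
$row =~ s/(?:^|(?<=,))(\d{2}\/\d{2}\/\d{4}.*?|)(?=,|\n)/ length $1 ? 'sysdate' : 'null'/eg;

print $row;    # null,sysdate,'a',null,null
```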

If the fields may be padded with whitespace (other than newlines), it can be written as:

$row =~ s/(?:^|(?<=,))(?:([^\S\n]*\d{2}\/\d{2}\/\d{4}.*?)|[^\S\n]*)(?=,|\n)/ defined $1 ? 'sysdate' : 'null'/eg;
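A quick check of the padded variant, again on a made-up row: spaces around a date, a field holding only a space, and a tab in the last field.

```perl
use strict;
use warnings;

# Made-up row with whitespace padding: " date ", a blank " " field, and a
# tab as the last field before the newline.
my $row = " 07/11/2011 , ,'a',\t\n";

# [^\S\n] is "whitespace other than a newline". Group 1 is only set when a
# date was matched, so defined $1 tells the two alternatives apart.
$row =~ s/(?:^|(?<=,))(?:([^\S\n]*\d{2}\/\d{2}\/\d{4}.*?)|[^\S\n]*)(?=,|\n)/ defined $1 ? 'sysdate' : 'null'/eg;

print $row;    # sysdate,null,'a',null
```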

Answer 2 (score: 1)

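You want to insert something at a position between characters, so lookarounds, which match a position without consuming any characters, are usually the better choice: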

$subject =~ s/(?<=,)(?=,|$)/null/g;

Explanation:

(?<=       # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
   ,       # Match the character “,” literally
)
(?=        # Assert that the regex below can be matched, starting at this position (positive lookahead)
           # Match either the regular expression below (attempting the next alternative only if this one fails)
   ,       # Match the character “,” literally
 |         # Or match regular expression number 2 below (the entire group fails if this one fails to match)
   $       # Assert position at the end of the string (or before the line break at the end of the string, if any)
)

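A small sketch, using made-up fragments, showing that the $ alternative fills an empty last field whether or not the line still ends in a newline:

```perl
use strict;
use warnings;

# Made-up fragment, once with and once without its trailing newline.
my $with_nl    = "'YD',,,0,\n";
my $without_nl = "'YD',,,0,";

# (?<=,)   a comma right behind this position
# (?=,|$)  a comma, or the end of the string (also just before a final "\n"), ahead of it
( my $r1 = $with_nl )    =~ s/(?<=,)(?=,|$)/null/g;
( my $r2 = $without_nl ) =~ s/(?<=,)(?=,|$)/null/g;

print $r1;        # 'YD',null,null,0,null
print "$r2\n";    # 'YD',null,null,0,null
```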
To replace the dates:

$subject =~ s!\d{2}/\d{2}/\d{4}.*?(?=,)!sysdate!g;

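This is almost the same as your original regex, except that d{4} is now \d{4} and the final comma is replaced by a lookahead. (If you don't want to replace something, don't match it.)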

# \d{2}/\d{2}/\d{4}.*?(?=,)
# 
# Match a single digit 0..9 «\d{2}»
#    Exactly 2 times «{2}»
# Match the character “/” literally «/»
# Match a single digit 0..9 «\d{2}»
#    Exactly 2 times «{2}»
# Match the character “/” literally «/»
# Match a single digit 0..9 «\d{4}»
#    Exactly 4 times «{4}»
# Match any single character that is not a line break character «.*?»
#    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=,)»
#    Match the character “,” literally «,»
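Putting both substitutions from this answer together on the row from the question, as a sketch:

```perl
use strict;
use warnings;

my $subject = "07/11/2011 16:48:08,07/11/2011 16:48:08,'YD','MANUAL',0,1,'text','text','text','text',,,,'text',,,0,0,\n";

# Insert "null" between two commas, or between a comma and the end of the line.
$subject =~ s/(?<=,)(?=,|$)/null/g;

# Replace each date (and anything after it in the field) up to, but not
# including, the next comma.
$subject =~ s!\d{2}/\d{2}/\d{4}.*?(?=,)!sysdate!g;

print $subject;
```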

Answer 3 (score: 0)

Maybe .*? is the problem; try a negated character class instead:

$_ =~ s/\d{2}\/\d{2}\/\d{4}[^,]*,/sysdate,/g;
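A runnable comparison, with \d{4} escaped, of [^,]+ and [^,]* on a made-up row: since [^,]+ needs at least one character after the year, only [^,]* also replaces a date without a time such as 07/11/2011, which the question says can occur.

```perl
use strict;
use warnings;

# Made-up row: one date with a time, one date without.
my $row = "07/11/2011 16:48:08,07/11/2011,'YD',0,\n";

# [^,]+ needs at least one character after the year, so a bare date is skipped.
( my $with_plus = $row ) =~ s/\d{2}\/\d{2}\/\d{4}[^,]+,/sysdate,/g;

# [^,]* also accepts nothing after the year, so both dates are replaced.
( my $with_star = $row ) =~ s/\d{2}\/\d{2}\/\d{4}[^,]*,/sysdate,/g;

print $with_plus;    # sysdate,07/11/2011,'YD',0,
print $with_star;    # sysdate,sysdate,'YD',0,
```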