Question

I have an input file with the following syntax:

00000 INFO  [IVS ] reset receiver  
00000 INFO  [IVS ] reset transmitter  
00331 INFO  [IVS ] sync detected

The data I need, broken into fields:

frame=00000  
info=INFO  
TYPE=[IVS ]  
message=reset receiver  

($frame,$info,$type,$message)=split(what would be the argument?);

Note: there is a space after IVS, before the closing bracket, so a plain space cannot be used as the delimiter.

Answer 1

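Wrong question. You don't want to use split. The rule of thumb is: use a regex match when you know what your data looks like; use split when you know what your delimiters look like.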

my ($frame, $info, $type, $message) = 
    $data =~ /(\d+) (\S+)\s+\[(\S+)\s*\] (.*)/;

would be a very good start.
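
As a rough sketch, here is that match applied to the three sample lines from the question. The helper name `parse_line` and the choice to skip non-matching lines are assumptions made for illustration, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the four captured fields, or an empty
# list when the line does not look like the sample data.
sub parse_line {
    my ($data) = @_;
    return $data =~ /(\d+) (\S+)\s+\[(\S+)\s*\] (.*)/;
}

my @lines = (
    '00000 INFO  [IVS ] reset receiver',
    '00000 INFO  [IVS ] reset transmitter',
    '00331 INFO  [IVS ] sync detected',
);

for my $line (@lines) {
    # A failed match assigns zero elements, so the line is skipped.
    my ($frame, $info, $type, $message) = parse_line($line)
        or next;
    print "frame=$frame\ninfo=$info\nTYPE=$type\nmessage=$message\n\n";
}
```

Note that this pattern keeps the brackets outside the capture group, so `$type` holds `IVS` rather than `[IVS ]`.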

Answer 2

I like regexes too, but... TIMTOWTDI. :)

while (<DATA>) {
  printf "frame=%s\ninfo=%s\nTYPE=%s\nmessage=%s\n", 
    unpack("A6 A6 A7 A*", $_);
}

__DATA__
00000 INFO  [IVS ] reset receiver
00000 INFO  [IVS ] reset transmitter
00331 INFO  [IVS ] sync detected

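Seriously, the point is that splitting a data string with a simple unpack may be a better choice than some twisted regex (yes, unpack is simple, it just takes practice), provided, of course, that all the data columns have fixed widths. Sometimes that is the case. :)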
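
To make the template easier to read, here is a minimal sketch that unpacks one sample line into a hash. The hash name `%record` and its key names are illustrative assumptions; the template is the one from the answer above.

```perl
use strict;
use warnings;

# Column widths: 6 + 6 + 7 characters, then the rest of the line.
# The "A" format strips trailing whitespace from each field.
my $template = 'A6 A6 A7 A*';

my $line = '00331 INFO  [IVS ] sync detected';

# Hash slice: assign the four unpacked fields to named keys.
my %record;
@record{qw(frame info type message)} = unpack($template, $line);

print "$_=$record{$_}\n" for qw(frame info type message);
```

Unlike the regex approach, the bracketed field comes back with its brackets, as `[IVS ]`, because unpack only counts characters and knows nothing about the delimiters.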

Answer 3

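You want to split on whitespace, as long as that whitespace is not followed by ]. That means you want a negative lookahead in your regex. Don't forget that split() can take a regex as its first argument. It can also take a limit on the number of fields it returns, so if you do this: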

my ($frame, $info, $type, $message) = split(/\s+(?!])/, $line, 4);

...then you will get what you want.

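This split() splits on one or more whitespace characters that are not followed by ]. It also returns at most four fields, so the $message field is not split (everything after the third split ends up in $message).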
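
A small sketch of this split on one of the sample lines, with the fields joined by `|` so the boundaries are visible. The variable name `@fields` is an assumption, and the `]` is escaped here purely for readability.

```perl
use strict;
use warnings;

my $line = '00000 INFO  [IVS ] reset transmitter';

# Split on whitespace that is not followed by "]". The limit of 4
# keeps the message in one piece even though it contains a space.
my @fields = split /\s+(?!\])/, $line, 4;

print join('|', @fields), "\n";   # 00000|INFO|[IVS ]|reset transmitter
```

One caveat: this relies on there being exactly one space before the `]`. With two or more spaces there, `\s+` can backtrack to a shorter match that is followed by another space rather than by `]`, and the split would land inside the brackets.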

Answer 4

I agree with @hobbs, but you should use the extended format for complex regexes:

while( my $line = <DATA> ){
  chomp $line;

  my ( $frame, $info, $type, $message ) = 
    $line =~ m{
      \A        # start at the beginning of the string
      (\d+)     # capture a string of digits        --> $frame
      \s+       # skip the white space
      (\S+)     # capture a string of non-spaces    --> $info
      \s+       # skip the white space
      (         # start a capture                   --> $type
        \[      #   capture an opening bracket
        [^\]]*  #   capture everything that's not a closing bracket
        \]      #   capture the closing bracket
      )         # end the capture
      \s+       # skip the white space
      (.*)      # capture the remainder of the line --> $message
    }msx;

  print "\$frame   = $frame\n";
  print "\$info    = $info\n";
  print "\$type    = $type\n";
  print "\$message = $message\n";
  print "\n";
}

__DATA__
00000 INFO  [IVS ] reset receiver
00000 INFO  [IVS ] reset transmitter
00331 INFO  [IVS ] sync detected
