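I'm trying to write a regular expression, but I can't get it to handle spaces inside a field name.
I have a data file like this (generated by another utility):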
* field : 100
blahbla : <Set>
scree : <what>
.Cont.asasd :
Othreaol : Value, Other value
Point->IP : 0.0.0.0 Port 5060
The pattern must match and capture the data like this:
"field" "100"
"blahbla" "<Set>"
"scree" "<what>"
".Cont.asasd" ""
"Othreaol" "Value, Other value"
My earlier solution was:
/^([\s\*]+)([\w]+[\s\.\-\>]{0,2}[\w]+)(\s*\:\s)(.*)/
But I run into problems with strings like
Z.15 example : No
The space inside the field name makes the pattern fail to match.
H.25 miss here : No
The same thing happens here.
Answer 0: (score: 5)
There are some complicated answers here. I think I'd just use a simple split:
while( <DATA> ) {
chomp;
my( $field, $value ) = split /\s*:\s*/, $_, 2;
print "Field [$field] value [$value]\n";
}
__DATA__
* field : 100
blahbla : <Set>
scree : <what>
.Cont.asasd :
Othreaol : Value, Other value
Point->IP : 0.0.0.0 Port 5060
This gives:
Field [* field] value [100]
Field [blahbla] value [<Set>]
Field [scree] value [<what>]
Field [.Cont.asasd] value []
Field [Othreaol] value [Value, Other value]
Field [Point->IP] value [0.0.0.0 Port 5060]
From there, I'd filter the names and values as needed, rather than trying to do everything in a single regex:
my @pairs =
grep { $_->[0] !~ /->/ } # filter keys
map { $_->[0] =~ s/\A\*\s+//; $_ } # transform keys
map { chomp; [ split /\s*:\s*/, $_, 2 ] } # parse line
<DATA>;
use Data::Printer;
p @pairs;
__DATA__
* field : 100
blahbla : <Set>
scree : <what>
.Cont.asasd :
Othreaol : Value, Other value
Point->IP : 0.0.0.0 Port 5060
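As a rough sketch of how the same split-then-filter idea could produce the quoted output format from the question, the snippet below wraps the parsing in a helper. The name parse_line and the inline sample lines are assumptions made for illustration, not part of the original answer. Passing a limit of 2 to split keeps the empty value for lines such as ".Cont.asasd :", and spaces inside the field name need no special handling.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): parse one line into
# a [key, value] pair and strip a leading "* " marker from the key.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    # LIMIT of 2: split only on the first colon and keep an empty value.
    my ( $key, $value ) = split /\s*:\s*/, $line, 2;
    return unless defined $value;    # skip lines that have no colon
    $key =~ s/\A\*\s+//;
    return [ $key, $value ];
}

# Sample lines taken from the question.
my @lines = (
    "* field : 100\n",
    ".Cont.asasd :\n",
    "Z.15 example : No\n",
    "Point->IP : 0.0.0.0 Port 5060\n",
);

# Drop keys containing "->", then print in the question's quoted format.
for my $pair ( grep { $_->[0] !~ /->/ } map { parse_line($_) } @lines ) {
    printf qq("%s" "%s"\n), @$pair;
}
```

The grep step is optional; it only reproduces the question's sample output, which leaves out the Point->IP line.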
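Answer 1: (score: 1)
Since you want to separate the values at the colon, use the complement of that character (a negated character class) in the regex to match everything that comes before the separator.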
my $regex
= qr{
( # v- no worry, this matches the first non-space, non-colon
[^\s:]
(?> [^:\n]* # this matches all non-colon chars on the line
[^\s:] # match the last non-space, non-colon, if there
)? # but possibly not there
) # end group
\s* # match any number of whitespace
: # match the colon
\s* # followed by any number of whitespace
( \S # Start second capture with any non space
(?> .* # anything on the same line
\S # ending in a non-space
)? # But, possibly not there at all
| # OR
) # nothing - this gives the second capture as an
# empty string instead of an undef
}x;
while ( <$in> ) {
$hash{ $1 } = $2 if m/$regex/;
}
%hash then looks like this:
{ '* field' => '100'
, '.Cont.asasd' => ''
, 'H.25 miss here' => 'No'
, Othreaol => 'Value, Other value'
, 'Point->IP' => '0.0.0.0 Port 5060'
, 'Z.15 example' => 'No'
, blahbla => '<Set>'
, scree => '<what>'
}
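Of course, now that I think about it, if you can rely on the separator matching /\s+:\s+/, or at least /\s{2,}:\s{2,}/, then it might be simpler to just split the line, like this: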
while ( <$in> ) {
if ( my ( $k, @v )
= grep {; length } split /\A\s+|\s+\z|(\s+:\s+)/
) {
shift @v; # the first one will be the separator
$hash{ $k } = join( '', @v );
}
}
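It does the same thing, without having to do as much backtracking to trim the results. It also ignores escaped colons without any extra syntax, because the separator has to be a bare colon surrounded by whitespace. To unescape them in the key, you would only have to add the following to the if block: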
$k =~ s/(?<!\\)((?:\\\\)*)\\:/$1:/g; # capture every backslash pair so $1 is always defined
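Here is a small worked example of that unescaping step, offered as a sketch rather than a definitive implementation. The helper name unescape_colons and the sample line are assumptions made for illustration. The capture group collects every pair of backslashes, so none are dropped and $1 is never undefined under warnings.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an escaped colon (\:) in a key into a plain colon.
# An even run of backslashes before the colon means the colon itself is not
# escaped, so that case is left untouched.
sub unescape_colons {
    my ($k) = @_;
    $k =~ s/(?<!\\)((?:\\\\)*)\\:/${1}:/g;
    return $k;
}

# Single quotes keep the backslash literal: the key contains "\:".
my $line = 'Time \: zone : UTC';

# Same split as above: only a bare colon surrounded by whitespace separates.
if ( my ( $k, @v ) = grep { ; length } split /\A\s+|\s+\z|(\s+:\s+)/, $line ) {
    shift @v;    # the first element is the captured separator
    print unescape_colons($k), ' => ', join( '', @v ), "\n";
    # prints: Time : zone => UTC
}
```

The escaped colon is not treated as a separator because the character right before it is a backslash, not whitespace.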
Answer 2: (score: 0)
I don't understand why the Point->IP line is omitted from your sample output, but the code below should work for you.
use strict;
use warnings;
while (<DATA>) {
next unless /([^\s*].+?)\s*:\s*(.*?)\s*$/;
printf qq("%s" "%s"\n), $1, $2;
}
__DATA__
* field : 100
blahbla : <Set>
scree : <what>
.Cont.asasd :
Othreaol : Value, Other value
Point->IP : 0.0.0.0 Port 5060
Z.15 example : No
H.25 miss here : No
Output
"field" "100"
"blahbla" "<Set>"
"scree" "<what>"
".Cont.asasd" ""
"Othreaol" "Value, Other value"
"Point->IP" "0.0.0.0 Port 5060"
"Z.15 example" "No"
"H.25 miss here" "No"