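I'm new to Perl and want to build the name of my output file from the column names in my input file. Suppose the header of my input file looks like this: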
#identifier (%)composition
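I want my output file to be named identifier_composition. The identifiers and compositions can be any sequence of alphanumeric characters, for example #E2FAR4 for the identifier or (%)MhDE4 for the composition. For this example, the output file name should be E2FAR4_MhDE4. So far I can get the identifier but not the composition. This is the code I tried: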
if ($line =~ /^#\s*(\S+)\t\(%)s*(\S+)/){
my $ID = $1;
my $comp = $2;
my $out_file = "${ID}_${comp}"
}
But I get identifier as the second argument as well. Any help would be appreciated.
Answer 0: (score: 2)
Use the following regular expression:
^#\s*(\S+)\t\(%\)(\S+)
Example code:
#!/usr/bin/perl
use strict;
use warnings;
while(<DATA>){
my $line = $_;
chomp $line;
if ($line =~ /^#\s*(\S+)\t\(%\)(\S+)/){
my $ID = $1;
my $comp = $2;
my $out_file = "${ID}_${comp}";
print "Filename: $out_file";
}
}
__DATA__
#identifier (%)composition
Output:
Filename: identifier_composition
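The same pattern can be written with the /x modifier so that each piece carries its own comment. This is only a sketch: header_to_name is a hypothetical helper name, and the pattern uses \s+ instead of \t between the two columns on the assumption that the separator may be a space as well as a tab.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns "ID_comp" for a header line, or nothing if
# the line does not look like a header.
sub header_to_name {
    my ($line) = @_;
    if ($line =~ m{
            ^\#\s*      # leading hash sign (escaped because of /x), optional whitespace
            (\S+)       # capture 1: the identifier
            \s+         # assumption: any whitespace, not only a tab
            \(%\)       # the literal percent sign in escaped parentheses
            (\S+)       # capture 2: the composition
        }x) {
        return "${1}_${2}";
    }
    return;
}

print "Filename: ", header_to_name("#E2FAR4\t(%)MhDE4"), "\n";
```

With the example header from the question, this should print Filename: E2FAR4_MhDE4.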
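Answer 1: (score: 1)
It looks like you're overthinking your regex. You are looking for two sequences of word characters separated by some non-word characters.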
if ($line =~ /(\w+)\W+(\w+)/) {
say "$1 / $2";
}
An even simpler approach is to match all sequences of word characters:
if (my @words = $line =~ /(\w+)/g) {
say join ' / ', @words;
}
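To turn those matches into the file name the question asks for, the words can be joined with an underscore. A minimal sketch, assuming the header contains exactly two word sequences; an identifier containing a non-word character such as a hyphen would be split into more pieces, so the count is checked first. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = "#E2FAR4\t(%)MhDE4";

# Collect every run of word characters on the line.
my @words = $line =~ /(\w+)/g;

# Build the name only when exactly two words were found.
my $out_file = @words == 2 ? join('_', @words) : undef;

print "Filename: $out_file\n" if defined $out_file;
```

Note that the say calls in the snippets above need use feature 'say' (or use v5.10 or later); print is used here to avoid that dependency.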
Update: I put your regex into a regex explainer. This is what came out:
NODE EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
# '#'
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\S+ non-whitespace (all but \n, \r, \t, \f,
and " ") (1 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\t '\t' (tab)
--------------------------------------------------------------------------------
\^ '^'
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
% '%'
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
s* 's' (0 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
\S+ non-whitespace (all but \n, \r, \t, \f,
and " ") (1 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
) end of \3
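I think your biggest problem is the literal ^ you are trying to match in the middle of the regex, but the unescaped parentheses around % are also a problem. The s* is pointless and confusing :-)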