Question

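I am new to Perl, and I want to build the name of an output file from the column names in my input file. Suppose my input file header looks like this: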

#identifier    (%)composition

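I want my output file to be named identifier_composition. The identifiers and compositions can be any sequence of alphanumeric characters, for example #E2FAR4 for the identifier or (%)MhDE4 for the composition. For this example, the output file name should be E2FAR4_MhDE4. So far I can get the identifier but not the composition. Here is the code I have tried: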

if ($line =~ /^#\s*(\S+)\t\(%)s*(\S+)/){
    my $ID = $1;
    my $comp = $2;
    my $out_file = "${ID}_${comp}"
}

But I get the identifier again as the second capture. Any help would be appreciated.

Answer 1

Use the following regular expression:

^#\s*(\S+)\t\(%\)(\S+)

Sample code:

#!/usr/bin/perl
use strict;
use warnings;
while(<DATA>){
    my $line = $_;
    chomp $line;
    if ($line =~ /^#\s*(\S+)\t\(%\)(\S+)/){
        my $ID = $1;
        my $comp = $2;
        my $out_file = "${ID}_${comp}";
        print "Filename: $out_file";
    }
}

__DATA__
#identifier	(%)composition

Output:

Filename: identifier_composition

Answer 2

It looks like you are overthinking your regex. You are looking for two sequences of word characters separated by some non-word characters.

if ($line =~ /(\w+)\W+(\w+)/) {
  say "$1 / $2";
}

An even simpler approach is to match every sequence of word characters:

if (my @words = $line =~ /(\w+)/g) {
  say join ' / ', @words;
}
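To tie this back to the file name in the question, here is a minimal sketch that joins the first two word sequences with an underscore. The helper name out_file_name and the two sample header strings are assumptions made for illustration; they are not part of the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build an output file name from the first two
# runs of word characters found in a header line.
sub out_file_name {
    my ($line) = @_;
    my @words = $line =~ /(\w+)/g;   # every run of [A-Za-z0-9_]
    return unless @words >= 2;       # not a two-column header
    return join '_', @words[0, 1];
}

# Sample headers (assumed): tab-separated, as in the question.
for my $header ("#identifier\t(%)composition", "#E2FAR4\t(%)MhDE4") {
    my $name = out_file_name($header);
    print "$name\n" if defined $name;
}
```

Because \w also matches the underscore, an identifier that itself contains underscores stays in one piece, but any other punctuation inside a column name would split it into several words.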

Update: I put your regex through a regex explainer. This is what came out:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  #                        '#'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \t                       '\t' (tab)
--------------------------------------------------------------------------------
  \^                       '^'
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    %                        '%'
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  s*                       's' (0 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
  (                        group and capture to \3:
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of \3

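I think your biggest problem is the literal ^ that you are trying to match in the middle of the regex, but the unescaped parentheses around % are also a problem, because they create an extra capture group instead of matching literal parentheses. And the s* is pointless and confusing :-)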
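For comparison, here is a sketch of a pattern with those three problems removed, written with the /x flag so that each piece can carry a comment. The variable name $header_re and the choice to accept any whitespace between the columns (rather than exactly one tab) are assumptions, not something the question specifies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical annotated pattern. Under /x a bare '#' starts a comment,
# so the literal '#' in the header has to be escaped.
my $header_re = qr{
    ^ \# \s*     # leading '#', then optional whitespace
    (\S+)        # capture 1: the identifier
    \s+          # column separator (assumed: any whitespace)
    \( % \)      # the literal text "(%)", parentheses escaped
    \s*          # optional whitespace (this is what the stray s* was meant to be)
    (\S+)        # capture 2: the composition
}x;

my $line = "#E2FAR4\t(%)MhDE4";
if ($line =~ $header_re) {
    my $out_file = "${1}_${2}";
    print "Filename: $out_file\n";
}
```

With the parentheses escaped there are only two capture groups, so $2 holds the composition rather than the % sign.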
