Question

I am trying to read in a template config file (template.config), shown below:

;;path to speedseq package binary directory
$;SPEEDSEQ_BIN_DIR$; = /usr/local/packages/
;;Sequence file 1
$;Seq1File$; =
;;Sequence file 2
$;Seq2File$; =
;;Read Group
$;Read_Group$;='@RG\tID:NA12878\tSM:NA12878\tPL:ILLUMINA\tLB:NA12878\tPU:NA12878'
;;Reference
$;Reference$; =

;;Output Chromosome
$;Chromosome$; = 
;;use --v for verbose summary
$;OTHER_ARGS$; = --v

Its fields are to be filled in from user input on the command line, for example:

perl script.pl template.config USER_INPUT.txt USER_INPT2.txt USER_INPUT_REF.txt USER_INPUT_CHR.txt

This should output a new, filled-in config file, as follows:

;;path to speedseq package binary directory
$;SPEEDSEQ_BIN_DIR$; = /usr/local/packages/
;;Sequence file 1
$;Seq1File$; = "USER_INPUT.txt"
;;Sequence file 2
$;Seq2File$; = "USER_INPT2.txt"
;;Read Group
$;Read_Group$;='@RG\tID:NA12878\tSM:NA12878\tPL:ILLUMINA\tLB:NA12878\tPU:NA12878'
;;Reference
$;Reference$; = "USER_INPUT_REF.txt"

;;Output Chromosome
$;Chromosome$; = "USER_INPUT_CHR.txt"
;;use --v for verbose summary
$;OTHER_ARGS$; = --v

I am not sure how to read in the template and mark the fields that I need to fill from the array. How can I do this?

So far, I am only reading the config file in my script:

open(my $fpCFG, '<', $ARGV[0]) or die "Error! Cannot open $ARGV[0] for reading: $!";

    $sComponent = $sParam = $sValue = $sDesc = "";
    while (<$fpCFG>) {
        $_ =~ s/\s+$//;
        next if ($_ =~ /^#/);
        next if ($_ =~ /^$/);

        if ($_ =~ m/^\[(\S+)\]$/) {
            $sComponent = $1;
            next;
        }
        elsif ($_ =~ m/^;;\s*(.*)/) {
            $sDesc .= "$1.";
            next;
        }
        elsif ($_ =~ m/\$;(\S+)\$;\s*=\s*(.*)/) {
            $sParam = $1;
            $sValue = $2;

            if ((defined $sValue) && ($sValue !~ m/^\s*$/)) {
                $phConfig->{$sComponent}{$sParam} = ["$sValue", "$sDesc"];
            }

            $sParam = $sValue = $sDesc = "";
            next;
        }
    }

    close($fpCFG);

Answer 1

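This answer responds more directly to a related question, which was marked as a duplicate of this one. The existing answers here cannot handle that case, hence this answer. See the linked question for the requirements and the input file format. In short: there are multiple sections (like the one here), and both the order of lines within a section and the order of the sections must be preserved.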

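There are a few components to this.

User input must be organized somehow. We can offer a set of named options and translate each option name into the corresponding line in the input file. This is done with a particular Getopt option together with a translation hash. (It could be organized in another way.)
A pair of lines identifies where user input is applied, and the order of lines must be maintained. For this we can store the lines in an array-ref, as the value for the section-name key in a hash. Arrays keep order. The hash is not strictly necessary (everything could go in one big array), but it gives flexibility for the future. The order of the sections in the hash is kept in a separate array. (Alternatively, it could be added to the hash values.)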
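The storage scheme described above can be sketched in isolation. This is only a minimal illustration under stated assumptions: the section names and the sample lines are made up, and `%conf`, `@mod_order` and `$mod_name` simply mirror the names used in the full program.

```perl
use strict;
use warnings;

# Hypothetical input lines; section names and values are made up for illustration
my @lines = (
    '[second_module]',
    ';;Sequence file 1',
    '$;Seq1File$; =',
    '[first_module]',
    ';;Output Chromosome',
    '$;Chromosome$; =',
);

my (%conf, @mod_order, $mod_name);
for my $line (@lines) {
    # A [section] header starts a new section and records the section order
    if ($line =~ /^\[(.*)\]$/) {
        $mod_name = $1;
        push @mod_order, $mod_name;
    }
    # The array-ref for each section keeps its lines in their original order
    push @{ $conf{$mod_name} }, $line;
}

# Walk the sections in file order, not in (unordered) hash-key order
for my $name (@mod_order) {
    print "$_\n" for @{ $conf{$name} };
}
```

Iterating over `@mod_order` rather than `keys %conf` is what preserves the section order, since Perl hashes do not keep insertion order.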

All supported options must be listed explicitly in the script. Here we define two.

use warnings;
use strict;
use Getopt::Long;
use feature qw(say);

# Translate user input <--> description line (;;) in file
my ($o1, $o2) = qw(o1 o2);
my %desc = (
    $o1 => 'Sequence file 1', 
    $o2 => 'Output Chromosome',
    # ...
);
my %input;
GetOptions(\%input, "$o1=s", "$o2=s")
    or die "Usage: $0 [-$o1 VALUE] [-$o2 VALUE]\n";

my $config_file = 'config.txt';
open my $in_fh, '<', $config_file or die "Can't open $config_file: $!";

my (%conf, @mod_order, $mod_name, $des, $enter_input);
while (my $line = <$in_fh>)
{
    chomp($line);
    next if $line =~ m/^\s*$/;
    # Name of new section ([]), for hash and order-array
    if ($line =~ m/^\[(.*)\]$/) {
        push @mod_order, $mod_name = $1;
    }

    # A description (;;) line
    if ( ($des) = $line =~ m/^;;(.*)/ ) {
        # Check for input and remember it for next iteration
        for (keys %desc) {
            if (exists $input{$_} and $des =~ /^\s*\Q$desc{$_}\E/) {
                $enter_input = process_input($desc{$_}, $input{$_});
                last;
            }
        }
        # Keep the description line, it need be printed too
        push @{$conf{$mod_name}}, $line . "\n";
        next;
    }

    if ($enter_input) {
        # Overwrite what is there or append
        $line =~ s/(.*?=)(.*)/$1 $enter_input/;
        $enter_input = '';
    }

    push @{$conf{$mod_name}}, $line . "\n";
}
close $in_fh;

say @{$conf{$_}} for @mod_order;

# In case user's raw input need be processed further
sub process_input {
    my ($desc, $raw_input) = @_;
    # Example (commented out): prepend path for `Chromosome` input
    # if ($desc =~ /Output Chromosome/) {
    #     return '/data/usr/' . $raw_input;
    # } else {
    return $raw_input;
    # }
}

The program can be invoked with both of the offered options, with either one, or with none.

script.pl -o1 INPUT_FOR_FILE_SEQ_1 -o2 INPUT_FOR_CHROMO
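To make the effect of such an invocation concrete, here is a small sketch of the substitution the program applies to the line that follows a matched description. The helper name `fill_value` and the `old_value` placeholder are hypothetical, introduced only for this example; the regex is the one from the program.

```perl
use strict;
use warnings;

# Apply the program's substitution to a single config line
# (hypothetical helper, not part of the original program)
sub fill_value {
    my ($line, $value) = @_;
    # Keep everything up to the first '=', replace the rest with the new value
    $line =~ s/(.*?=)(.*)/$1 $value/;
    return $line;
}

# An empty field gets the value appended
print fill_value('$;Seq1File$; =', 'INPUT_FOR_FILE_SEQ_1'), "\n";
# A field that already has a value gets it overwritten
print fill_value('$;Chromosome$; = old_value', 'INPUT_FOR_CHROMO'), "\n";
```

Because `.*?` is non-greedy, only the text after the first `=` on the line is replaced, so a value that itself contains `=` would also be overwritten as a whole.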

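Courtesy of Getopt::Long, malformed option input is detected and reported. The input file is hardcoded, but it could be yet another command-line option, or it could be read via < config.txt.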
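As a hedged sketch of those two alternatives, the template could be named by an extra option, falling back to standard input when it is absent. The option name `config` and the helper `open_config` are assumptions made for this example; `GetOptionsFromArray` is used in place of `GetOptions` only so that the example parses a sample argument list instead of the real `@ARGV`.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Return a read handle: the named file if one was given, otherwise STDIN
# (hypothetical helper, not part of the original program)
sub open_config {
    my ($file) = @_;
    return \*STDIN unless defined $file;
    open my $fh, '<', $file or die "Can't open $file: $!";
    return $fh;
}

# Sample arguments standing in for @ARGV
my @args = ('-o1', 'INPUT_FOR_FILE_SEQ_1', '--config', 'config.txt');

my %input;
GetOptionsFromArray(\@args, \%input, 'o1=s', 'o2=s', 'config=s')
    or die "Usage: $0 [-o1 VALUE] [-o2 VALUE] [--config FILE]\n";

print "o1     = $input{o1}\n";
print "config = ", $input{config} // '(standard input)', "\n";

# In the real script the lines would then be read from this handle:
# my $in_fh = open_config($input{config});
```

With a shell redirect such as `script.pl -o1 INPUT_FOR_FILE_SEQ_1 < config.txt`, no `--config` is given and the lines would come from standard input.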
