Question

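I have a file with more than 480,000 rows and 1,380 columns. I need a pipeline that prepends F_ or M_ to each value in the first row, depending on whether the corresponding value in the second row is Sex: Female or Sex: Male.

The first row of my file holds the individual IDs, each followed by a cell type, -N or -G. The second row says whether that individual is female or male. The remaining rows have the probe_Ids in the first column, and the other columns hold the corresponding beta_values for each individual. In case an example makes this clearer, I've added a few lines below.

My input file looks like this (tab-delimited), shown without the first column: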

1740-N       1546-N       1546-G       1740-G       1228-G     5121-N       5121-G
Sex: Female  Sex: Female  Sex: Female  Sex: Female  Sex: Male  Sex: Female  Sex: Female

My output should look like this (tab-delimited), again without the first column:

F_1740-N    F_1546-N    F_1546-G    F_1740-G    M_1228-G    F_5121-N    F_5121-G

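Note that the Sex row is not output.

Can anyone help? If I had only a few columns, I would do it by hand.

This can be done in any program; I'm not insisting on Perl.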

Answer 1

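$ awk -F'\t' '
NR==1 { split($0,a); next }      # remember the IDs from the first row
NR==2 {                          # prefix each ID with F_ or M_; the Sex row itself is not printed
    for (i=1;i<=NF;i++)
        printf "%s%s_%s", (i==1?"":FS), ($i~/Female/?"F":"M"), a[i]
    print ""
    next
}
{ print }                        # pass the probe rows through unchanged
' file
F_1740-N    F_1546-N    F_1546-G    F_1740-G    M_1228-G    F_5121-N    F_5121-G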

Answer 2

Keep a one-line buffer.

my $last_line = <>;
if (defined $last_line) {
   while (my $this_line = <>) {
      if ($this_line =~ /^Sex:/) {
         adjust_for_sex($last_line, $this_line);
         next;  # Don't display the Sex row.
      }

      print($last_line);
      $last_line = $this_line;
   }

   print($last_line);
}

Here's the code that performs the actual change:

sub adjust_for_sex {
   my ($last_line, $this_line) = @_;

   chomp($last_line);
   my @last_fields = split /\t/, $last_line;

   chomp($this_line);
   my @this_fields = split /\t/, $this_line;

   for my $i (0..$#last_fields) {
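      # Capture the first letter after "Sex: " (F or M).
      my ($sex) = $this_fields[$i] =~ /^Sex: (.)/
         or die "Unexpected value in Sex row: $this_fields[$i]\n";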

      $last_fields[$i] = $sex . "_" . $last_fields[$i];
   }

   # Changes the first argument in the caller.
   $_[0] = join("\t", @last_fields) . "\n";
}
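
For readers who would rather not use Perl, the same one-line-buffer idea can be sketched in Python. This is only an illustration under the assumptions stated in the question (tab-delimited rows, Sex values of the form "Sex: Female" or "Sex: Male"); the function name fold_sex_rows is made up for this example.

```python
import re

# Captures the first letter after "Sex:" (F or M in the sample data).
SEX_RE = re.compile(r"^Sex:\s*(.)")

def fold_sex_rows(lines):
    """Yield rows, folding every 'Sex: ...' row into the row buffered before it."""
    buffered = None  # the one-line buffer
    for raw in lines:
        row = raw.rstrip("\n")
        if buffered is not None and row.startswith("Sex:"):
            ids = buffered.split("\t")
            sexes = row.split("\t")
            if len(ids) != len(sexes):
                raise ValueError("ID row and Sex row have different column counts")
            prefixed = []
            for id_, sex in zip(ids, sexes):
                match = SEX_RE.match(sex)
                if match is None:
                    raise ValueError(f"unexpected value in Sex row: {sex!r}")
                prefixed.append(f"{match.group(1)}_{id_}")
            buffered = "\t".join(prefixed)
            continue  # the Sex row itself is never emitted
        if buffered is not None:
            yield buffered
        buffered = row
    if buffered is not None:
        yield buffered
```

Because it is a generator, it could be driven with something like `for row in fold_sex_rows(open("file")): print(row)`, holding only one row in memory at a time.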

Answer 3

Something like this should work in awk. It needs a bit of memory to store all the data from the first row, though.

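BEGIN {FS=OFS="\t"}   # set OFS too: assigning to $i rebuilds the line using OFS

# Row 1: remember the IDs.
NR == 1 {
    for (i = 1; i <= NF; i++) {
        f[i]=$i
    }
    next
}

# Row 2: turn each "Sex: Female"/"Sex: Male" into F_<ID>/M_<ID> and print that instead.
NR == 2 {
    for (i = 1; i <= NF; i++) {
        # gensub() is a GNU awk extension.
        $i=gensub(/Sex: ([FM]).*/, "\\1", "g", $i)
        $i=$i"_"f[i]
    }
    print
    next
}

# All other rows pass through unchanged.
{print}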
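
On the memory point: only the first two rows ever need to be held, and the rest can be streamed. A rough Python sketch of that header-only approach follows. The function name relabel_header and the sample probe row are invented for illustration, and src and dst are assumed to be text-mode file objects.

```python
import io
import re
import shutil

def relabel_header(src, dst):
    """Copy src to dst with row 1 prefixed by F_/M_ taken from row 2; row 2 is dropped."""
    ids = src.readline().rstrip("\n").split("\t")
    sexes = src.readline().rstrip("\n").split("\t")
    if len(ids) != len(sexes):
        raise ValueError("rows 1 and 2 have different column counts")
    letters = []
    for sex in sexes:
        match = re.match(r"Sex:\s*([FM])", sex)
        if match is None:
            raise ValueError(f"unexpected value in row 2: {sex!r}")
        letters.append(match.group(1))
    dst.write("\t".join(f"{s}_{i}" for s, i in zip(letters, ids)) + "\n")
    shutil.copyfileobj(src, dst)  # stream the remaining rows unchanged

# Small in-memory demonstration (the probe row is made-up sample data).
demo_in = io.StringIO("1740-N\t1228-G\nSex: Female\tSex: Male\ncg0001\t0.12\t0.34\n")
demo_out = io.StringIO()
relabel_header(demo_in, demo_out)
```

With real files, the two StringIO objects would be replaced by an input file opened for reading and an output file opened for writing.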

If line pairs matching this pattern repeat throughout the file, something like this might do it:

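BEGIN {FS=OFS="\t"}

# A "Sex:" row annotates the line buffered just before it.
line != "" && /^Sex: / {
    split(line, f)
    line=""

    for (i = 1; i <= NF; i++) {
        # "Sex: Female" -> "F": keep the first 6 characters (substr is 1-based), then drop the prefix.
        $i=substr($i, 1, 6)
        sub(/^Sex: /, "", $i)
        printf "%s%s", (i > 1 ? OFS : ""), $i"_"f[i]
    }
    print ""
    next
}

# Any other buffered line is printed as is.
line != "" {print line}

{line=$0}

# Flush the last buffered line, which would otherwise be lost.
END {if (line != "") print line}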

Answer 4

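This was written on the assumption that the input file has repeating pairs of lines to be parsed together. It could easily be modified to stop after parsing the first two lines, but I'm leaving it as is, even though it doesn't answer the OP's question after the clarification. Maybe it will be useful to someone else.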

#!perl

use strict;
use warnings;

open(my $in, '<', "in.txt") or die "Cannot open in.txt: $!";
open(my $out, '>', "out.txt") or die "Cannot open out.txt: $!";
my $secondLine ;
while (my $firstLine = <$in>) {
  chomp $firstLine;
  $secondLine = <$in> || "";
  chomp $secondLine;
  # Break out if there are no more lines with data (actually, this just detects 1-2 blank lines in a row, not necessarily at the end of the file yet)
  if ((! $firstLine) && (! $secondLine)) { last }
  my @firstLine = split(/\s+/, $firstLine);
  my @secondLine = split(/\s*Sex:\s*/, $secondLine);
  # The first element in @secondLine will always be the "null" before the first "Sex: ".
  # Throw it away.
  shift @secondLine;
  if (scalar(@firstLine) != scalar(@secondLine)) { die "Uneven # of fields in these 2 lines:\n$firstLine\n$secondLine\n" }

  # OK, output time: tab-separated fields of the form <F|M>_<ID>.
  print $out join("\t", map { substr($secondLine[$_], 0, 1) . "_" . $firstLine[$_] } 0..$#firstLine), "\n";
}
close($in);
close($out);

if (! $secondLine) {
  warn "The file does not appear to have an even number of lines.\n";
}

Answer 5

How about:

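#!/usr/bin/perl
use strict;
use warnings;

# Processes the input as repeating pairs: an ID line followed by its Sex line.
while (<>) {
   chomp;
   my @N = split;              # IDs
   $_ = <>;                    # the matching "Sex: ..." line
   last unless defined;
   chomp;
   # Reduce "Sex: Female" / "Sex: Male" to "F " / "M " so split yields one letter per column.
   s/\s*Sex:\s*//g; s/emale/ /g; s/ale/ /g;
   my @S = split;
   print join("\t", map { $S[$_] . '_' . $N[$_] } 0..$#N), "\n";
}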

Answer 6

This might work for you (GNU sed):

sed -ri '1{N;:a;s/(\b[0-9]{4}-[GN].*\n)\s*Sex:\s*(.)\S+/\2_\1/;ta;s/\n//}' file

This joins lines 1 and 2 in the pattern space, then loops the substitution until no more columns can be matched.
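
To make the loop easier to follow, here is a rough Python re-enactment of the same steps, reusing the regular expression from the sed command. It is a sketch only: the name merge_like_sed is invented, and like the sed pattern it assumes every ID is four digits followed by -G or -N.

```python
import re

# Same pattern as the sed command: an ID not yet prefixed (the \b fails after "_"),
# everything up to the newline, then the next remaining "Sex: X..." entry.
STEP = re.compile(r"(\b[0-9]{4}-[GN].*\n)\s*Sex:\s*(.)\S+")

def merge_like_sed(id_row, sex_row):
    space = id_row + "\n" + sex_row        # sed: N (append line 2 to the pattern space)
    while True:                            # sed: :a ... ta (repeat while a substitution succeeds)
        space, changed = STEP.subn(r"\2_\1", space, count=1)
        if not changed:
            break
    return space.replace("\n", "", 1)      # sed: s/\n// (drop the now-empty second line)
```

Each pass moves one sex letter in front of the leftmost ID that has no prefix yet, so the number of passes equals the number of columns.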

Perl: change the values in the first row based on the values in the second row
