Move a column to the beginning by header keyword in bash

Time: 2019-05-17 12:40:25

Tags: bash awk

I have a file (data.rdb) with the following format:

date    star    jdb texp
2013-11-22  epsInd      2400000.23551544    100.
2013-11-22  epsInd      2400000.23551544    100.
2013-11-22  epsInd      2400000.23551544    100.
2013-11-22  HD217987    2400000.23551544    900.
2013-11-22  TOI-134     2400000.23551544    900.
2013-11-22  tauCet      2400000.23551544    60. 
2013-11-22  BD+01316    2400000.23551544    300.
2013-11-22  BD+01316    2400000.23551544    300.
2013-11-22  BD+01316    2400000.23551544    300.
2013-11-22  BD+01316    2400000.23551544    300.

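Some properties:

  • All columns are tab-separated
  • Columns have different widths
  • Cells may have different lengths
  • The real files will have many more columns than shown here, and a few hundred lines
  • Column names can be any word, with no tabs, spaces or special characters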

How can I move the column with the header jdb to the first column?

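Some constraints:

  • This will be applied to multiple files, and the jdb column will not always be in the same position
  • Ideally, the order of the remaining columns should not change
  • jdb will always be the last column.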

Thanks!

Update

This is the awk block I am currently using:

BEGIN {
    numCols = split(column_list,cols)
    OFS="\t"
}
{ sub(/\r$/,"") }
NR==1 {
    for (fldNr=1; fldNr<=NF; fldNr++) {
        f[$fldNr] = fldNr
    }
}
{
    for (colNr=1; colNr<=numCols; colNr++) {
        colName = cols[colNr]
        colVal  = (colNr=1 ? $(f["jdb"]): (colNr <= $(f["jdb"] ? 
$(f[colName] -1) : $(f[colName]))))
        printf "%s%s", colVal, (colNr<numCols ? OFS : ORS)
    }
}

But it produces no output... Here is what I (think I) did:

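  1. Assign a number to each column header value

  2. Iterate over a range

    2.1 If iterator = 0 -> print column jdb

    2.2 If iterator <= column number of jdb -> print column number iterator - 1

    2.3 If iterator > column number of jdb -> print column number iterator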

(This is a continuation of the question I asked at https://stackoverflow.com/questions/56132249/extract-columns-from-tab-separated-file)

4 Answers:

Answer 0: (score: 1)

It's a bit verbose, but it gets the job done:

awk 'NR==1{for(i=1;i<=NF;i++){if ($i=="jdb") break;}} {printf "%s\t",$i; for (j=1;j<=NF;j++){if (i!=j){printf j==NF||(j==NF-1&&j+1==i)?"%s\n":"%s\t", $j}}}' yourfile.txt

Per Ed Morton's excellent suggestion, here is the script with proper whitespace, indentation, and line breaks:

    NR == 1 {
            for (i = 1; i <= NF; i++) {
                    if ($i == "jdb") {
                            break
                    }
            }
    }

    {
            printf "%s\t", $i
            for (j = 1; j <= NF; j++) {
                    if (i != j) {
                            printf (j == NF || j == NF - 1 && j + 1 == i ? "%s\n" : "%s\t"), $j
                    }
            }
    }

You can paste this into its own file (e.g. script.awk) and then call it as: awk -f script.awk yourfile.txt
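
The condition that chooses between "%s\n" and "%s\t" has to special-case a target in the last column. A minimal sketch of one way around that is to print the separator before each remaining field rather than after it. The function name move_first and its tgt parameter are illustrative, not part of the script above, and the sketch assumes strictly tab-separated input, as the question describes.

```shell
# move_first NAME [FILE...]: print tab-separated input with column NAME first.
# The function name and its parameter are illustrative only.
move_first() {
    tgt=$1
    shift
    awk -v tgt="$tgt" '
        BEGIN { FS = OFS = "\t" }
        NR == 1 {
            # Find the target column once, on the header line.
            for (i = 1; i <= NF; i++) if ($i == tgt) break
        }
        {
            printf "%s", $i                          # target column first
            for (j = 1; j <= NF; j++)
                if (j != i) printf "%s%s", OFS, $j   # separator precedes each remaining field
            printf "%s", ORS
        }
    ' "$@"
}

# Sample input following the layout in the question.
printf 'date\tstar\tjdb\ttexp\n2013-11-22\tepsInd\t2400000.23551544\t100.\n' |
    move_first jdb
```

If the header does not contain the target, i ends up as NF+1, so the first output field is empty. The sketch does not guard against that.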

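Answer 1: (score: 1)

Well, I was really hoping for a "teach a man to fish" moment here, but you are going to get answers anyway, so... here is how to tweak the previous answer to do what you now want: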

$ cat tst.awk
BEGIN { FS=OFS="\t" }
NR==1 {
    cols[++numCols] = tgt
    for (fldNr=1; fldNr<=NF; fldNr++) {
        f[$fldNr] = fldNr
        if ($fldNr != tgt) {
            cols[++numCols] = $fldNr
        }
    }
}
{
    for (colNr=1; colNr<=numCols; colNr++) {
        colName = cols[colNr]
        printf "%s%s", $(f[colName]), (colNr<numCols ? OFS : ORS)
    }
}

$ awk -v tgt=jdb -f tst.awk data.rdb
jdb     date    star    texp
2400000.23551544        2013-11-22      epsInd  100.
2400000.23551544        2013-11-22      epsInd  100.
2400000.23551544        2013-11-22      epsInd  100.
2400000.23551544        2013-11-22      HD217987        900.
2400000.23551544        2013-11-22      TOI-134 900.
2400000.23551544        2013-11-22      tauCet  60.
2400000.23551544        2013-11-22      BD+01316        300.
2400000.23551544        2013-11-22      BD+01316        300.
2400000.23551544        2013-11-22      BD+01316        300.
2400000.23551544        2013-11-22      BD+01316        300.

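Note how simple the loop that runs once per input line is, which is where you want efficiency: all the hard work of determining the output order is done in the NR==1 block, which runs only once for the whole file.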
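
Because the header work happens in one place, that block is also a natural spot for a sanity check. If tgt is missing from the header, f[tgt] is unset and $(f[tgt]) evaluates to $0, so every output line would start with a copy of the whole input line. The following is only a sketch: the wrapper name move_checked and the fail-with-exit-status-1 policy are assumptions, not something the question asks for.

```shell
# move_checked NAME [FILE...]: hypothetical wrapper around the script above.
move_checked() {
    tgt=$1
    shift
    awk -v tgt="$tgt" '
        BEGIN { FS = OFS = "\t" }
        NR == 1 {
            cols[++numCols] = tgt
            for (fldNr = 1; fldNr <= NF; fldNr++) {
                f[$fldNr] = fldNr
                if ($fldNr != tgt) cols[++numCols] = $fldNr
            }
            # Assumed policy: stop with an error if the header lacks the target.
            if (!(tgt in f)) {
                print "column not found: " tgt > "/dev/stderr"
                exit 1
            }
        }
        {
            for (colNr = 1; colNr <= numCols; colNr++)
                printf "%s%s", $(f[cols[colNr]]), (colNr < numCols ? OFS : ORS)
        }
    ' "$@"
}

# A header without jdb: awk exits non-zero and prints nothing on stdout.
printf 'date\tstar\ttexp\n2013-11-22\tepsInd\t100.\n' | move_checked jdb ||
    echo "no jdb column found"
```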

If you don't actually care about the other column names, you could write it more concisely and efficiently as:

$ cat tst.awk
BEGIN { FS=OFS="\t" }
NR==1 {
    numOutFlds = 1
    for (inFldNr=1; inFldNr<=NF; inFldNr++) {
        out2inFldNrs[$inFldNr == tgt ? 1 : ++numOutFlds] = inFldNr
    }
}
{
    for (outFldNr=1; outFldNr<=numOutFlds; outFldNr++) {
        inFldNr = out2inFldNrs[outFldNr]
        printf "%s%s", $inFldNr, (outFldNr<numOutFlds ? OFS : ORS)
    }
}

$ awk -v tgt=jdb -f tst.awk data.rdb
jdb     date    star    texp
2400000.23551544        2013-11-22      epsInd  100.
2400000.23551544        2013-11-22      epsInd  100.
2400000.23551544        2013-11-22      epsInd  100.
2400000.23551544        2013-11-22      HD217987        900.
2400000.23551544        2013-11-22      TOI-134 900.
2400000.23551544        2013-11-22      tauCet  60.
2400000.23551544        2013-11-22      BD+01316        300.
2400000.23551544        2013-11-22      BD+01316        300.
2400000.23551544        2013-11-22      BD+01316        300.
2400000.23551544        2013-11-22      BD+01316        300.
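
The question mentions several files in which jdb sits in different positions. If such files are passed to a single awk invocation, NR==1 is true only for the first line of the first file, so the mapping would be stale for the rest. A minimal sketch, assuming the reordered files should simply be concatenated on stdout, rebuilds the mapping on FNR==1 instead. The helper name reorder_files and the sample file names are hypothetical.

```shell
# reorder_files NAME FILE...: hypothetical helper, not part of tst.awk.
reorder_files() {
    tgt=$1
    shift
    awk -v tgt="$tgt" '
        BEGIN { FS = OFS = "\t" }
        FNR == 1 {
            # First line of each file: rebuild the mapping from scratch.
            split("", out2inFldNrs)        # portable way to empty an array
            numOutFlds = 1
            for (inFldNr = 1; inFldNr <= NF; inFldNr++)
                out2inFldNrs[$inFldNr == tgt ? 1 : ++numOutFlds] = inFldNr
        }
        {
            for (outFldNr = 1; outFldNr <= numOutFlds; outFldNr++)
                printf "%s%s", $(out2inFldNrs[outFldNr]), (outFldNr < numOutFlds ? OFS : ORS)
        }
    ' "$@"
}

# Two made-up files with jdb in different positions.
tmpdir=$(mktemp -d)
printf 'date\tjdb\n2013-11-22\t2400000.23551544\n' > "$tmpdir/a.rdb"
printf 'star\ttexp\tjdb\nepsInd\t100.\t2400000.23551544\n' > "$tmpdir/b.rdb"
reorder_files jdb "$tmpdir/a.rdb" "$tmpdir/b.rdb"
rm -r "$tmpdir"
```

A plain shell loop that runs tst.awk once per file and redirects each result to its own output file would work just as well, and keeps the script itself unchanged.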

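Answer 2: (score: 0)

So the task is twofold:

  • First, determine which column is the one we want to become the first column
  • Then change the order of the columns

So: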

# our testing input file
cat <<EOF >file
date    star    jdb texp
2013-11-22  epsInd      2400000.23551544    100.
2013-11-22  epsInd      2400000.23551544    100.
2013-11-22  epsInd      2400000.23551544    100.
2013-11-22  HD217987    2400000.23551544    900.
2013-11-22  TOI-134     2400000.23551544    900.
2013-11-22  tauCet      2400000.23551544    60. 
2013-11-22  BD+01316    2400000.23551544    300.
2013-11-22  BD+01316    2400000.23551544    300.
2013-11-22  BD+01316    2400000.23551544    300.
2013-11-22  BD+01316    2400000.23551544    300.
EOF

# my copy+paste messed up tabs with spaces, fix it
sed 's/[[:space:]]\+/\t/g' -i file


# first we need header count.
# I could remove all characters except tabs and use wc -c
# but was lazy, this will not affect performance anyway
hdrcnt=$(
    head -n1 file |
    tr '\t' '\n' |
    wc -l
)

# get the column number that has jdb
# I get the first line
# substitute tab with newlines
# and get the line number with "jdb"
num=$(
    head -n1 file |
    tr '\t' '\n' |
    grep -nxF jdb |
    cut -d: -f1
)

# then I generate the awk script
# so it's like '{print $num, $1, $2 ... except $num ... $hdrcnt }'
awkarg='{print $'"$num"', '"$(
    seq "$hdrcnt" |
    grep -vx "$num" |
    sed 's/\(.*\)/$\1, /' |
    sed '$s/, //' |
    tr -d '\n'
)"'}'

# finally run awk
awk -v FS='\t' -v OFS='\t' "$awkarg" file
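
To make the generated program easier to picture, here is a small sketch that builds the same kind of print statement with a plain shell loop. It compares column indices numerically, so column 3 cannot be confused with column 13 or 30. The function name build_awk_program and its two arguments are hypothetical.

```shell
# build_awk_program INDEX COUNT: emit an awk program that prints field INDEX
# first, followed by the other fields 1..COUNT in their original order.
build_awk_program() {
    printf '{print $%s' "$1"
    n=1
    while [ "$n" -le "$2" ]; do
        [ "$n" -eq "$1" ] || printf ', $%s' "$n"   # skip the moved column
        n=$((n + 1))
    done
    printf '}\n'
}

prog=$(build_awk_program 3 4)
printf '%s\n' "$prog"        # prints: {print $3, $1, $2, $4}

# Apply the generated program to a header line laid out as in the question.
printf 'date\tstar\tjdb\ttexp\n' | awk -v FS='\t' -v OFS='\t' "$prog"
```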

Answer 3: (score: 0)

In Perl, you can benefit from the Text::CSV_XS library:

#! /usr/bin/perl
use warnings;
use strict;

use Text::CSV_XS;

open my $fh, '<', shift or die $!;

my $csv = 'Text::CSV_XS'->new({sep_char => "\t"});

my $row = $csv->getline($fh);

my ($jdb) = grep $row->[$_] eq 'jdb', 0 .. $#$row;

do {
    unshift @$row, splice @$row, $jdb, 1;
    $csv->say(*STDOUT, $row);
} while $row = $csv->getline($fh);