I have a file (data.rdb) with the following format:
date star jdb texp
2013-11-22 epsInd 2400000.23551544 100.
2013-11-22 epsInd 2400000.23551544 100.
2013-11-22 epsInd 2400000.23551544 100.
2013-11-22 HD217987 2400000.23551544 900.
2013-11-22 TOI-134 2400000.23551544 900.
2013-11-22 tauCet 2400000.23551544 60.
2013-11-22 BD+01316 2400000.23551544 300.
2013-11-22 BD+01316 2400000.23551544 300.
2013-11-22 BD+01316 2400000.23551544 300.
2013-11-22 BD+01316 2400000.23551544 300.
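Some properties:
How can I move the column with the header jdb to the first column?
Some constraints:
jdb will not always be in the same position
jdb will always be the last column.
Thanks!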
Update
This is the awk block I am currently using:
BEGIN {
    numCols = split(column_list,cols)
    OFS="\t"
}
{ sub(/\r$/,"") }
NR==1 {
    for (fldNr=1; fldNr<=NF; fldNr++) {
        f[$fldNr] = fldNr
    }
}
{
    for (colNr=1; colNr<=numCols; colNr++) {
        colName = cols[colNr]
        colVal = (colNr=1 ? $(f["jdb"]): (colNr <= $(f["jdb"] ?
            $(f[colName] -1) : $(f[colName]))))
        printf "%s%s", colVal, (colNr<numCols ? OFS : ORS)
    }
}
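But it produces no output... What I (think I) did:
1. Assign a number to each column header value
2. Iterate over a range
2.1 if iterator = 0 -> print column jdb
2.2 if iterator <= column number of jdb -> print column number iterator - 1
2.3 if iterator > column number of jdb -> print column number iterator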
(This is a follow-up to the question I asked at https://stackoverflow.com/questions/56132249/extract-columns-from-tab-separated-file)
Answer 0 (score: 1)
This is a bit verbose, but it gets the job done:
awk 'NR==1{for(i=1;i<=NF;i++){if ($i=="jdb") break;}} {printf "%s\t",$i; for (j=1;j<=NF;j++){if (i!=j){printf j==NF||(j==NF-1&&j+1==i)?"%s\n":"%s\t", $j}}}' yourfile.txt
Per Ed Morton's excellent suggestion, here is the script with proper whitespace, indentation, and line breaks:
NR == 1 {
    for (i = 1; i <= NF; i++) {
        if ($i == "jdb") {
            break
        }
    }
}
{
    printf "%s\t", $i
    for (j = 1; j <= NF; j++) {
        if (i != j) {
            printf (j == NF || j == NF - 1 && j + 1 == i ? "%s\n" : "%s\t"), $j
        }
    }
}
You can paste this into its own file (e.g. script.awk) and then call it with: awk -f script.awk yourfile.txt
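The format condition in the inner printf is easiest to see when jdb is the last column: the loop skips field i, so the newline has to be printed after field NF-1 instead of after field NF. Here is a small check of that case, assuming the script is saved as script.awk and using a made-up three-column file (sample.txt is a hypothetical name, not from the question):

```shell
# Save the script from above as script.awk (logic unchanged).
cat > script.awk <<'EOF'
NR == 1 {
    for (i = 1; i <= NF; i++) {
        if ($i == "jdb") {
            break
        }
    }
}
{
    printf "%s\t", $i
    for (j = 1; j <= NF; j++) {
        if (i != j) {
            printf (j == NF || j == NF - 1 && j + 1 == i ? "%s\n" : "%s\t"), $j
        }
    }
}
EOF

# Hypothetical sample in which jdb is the last column.
printf 'date star jdb\n2013-11-22 epsInd 2400000.23551544\n' > sample.txt

# i keeps the value found on the header line, so every later line
# prints the same field first.
awk -f script.awk sample.txt
```

Note that the script reads fields with awk's default whitespace splitting and always writes tabs, so it would mis-split values that themselves contain spaces.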
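Answer 1 (score: 1)
Well, I was really hoping for a "teach a man to fish" moment here, but you are getting answers anyway, so... here is how to tweak the previous answer to do what you now want: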
$ cat tst.awk
BEGIN { FS=OFS="\t" }
NR==1 {
    cols[++numCols] = tgt
    for (fldNr=1; fldNr<=NF; fldNr++) {
        f[$fldNr] = fldNr
        if ($fldNr != tgt) {
            cols[++numCols] = $fldNr
        }
    }
}
{
    for (colNr=1; colNr<=numCols; colNr++) {
        colName = cols[colNr]
        printf "%s%s", $(f[colName]), (colNr<numCols ? OFS : ORS)
    }
}
$ awk -v tgt=jdb -f tst.awk data.rdb
jdb date star texp
2400000.23551544 2013-11-22 epsInd 100.
2400000.23551544 2013-11-22 epsInd 100.
2400000.23551544 2013-11-22 epsInd 100.
2400000.23551544 2013-11-22 HD217987 900.
2400000.23551544 2013-11-22 TOI-134 900.
2400000.23551544 2013-11-22 tauCet 60.
2400000.23551544 2013-11-22 BD+01316 300.
2400000.23551544 2013-11-22 BD+01316 300.
2400000.23551544 2013-11-22 BD+01316 300.
2400000.23551544 2013-11-22 BD+01316 300.
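Note how simple the loop is that runs once per input line, which is where you want efficiency: all the hard work of determining the output order is done in the NR==1 block, which runs only once for the whole file.
In this case, where you do not actually care about the other column names, it can be written more concisely and efficiently as: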
$ cat tst.awk
BEGIN { FS=OFS="\t" }
NR==1 {
    numOutFlds = 1
    for (inFldNr=1; inFldNr<=NF; inFldNr++) {
        out2inFldNrs[$inFldNr == tgt ? 1 : ++numOutFlds] = inFldNr
    }
}
{
    for (outFldNr=1; outFldNr<=numOutFlds; outFldNr++) {
        inFldNr = out2inFldNrs[outFldNr]
        printf "%s%s", $inFldNr, (outFldNr<numOutFlds ? OFS : ORS)
    }
}
$ awk -v tgt=jdb -f tst.awk data.rdb
jdb date star texp
2400000.23551544 2013-11-22 epsInd 100.
2400000.23551544 2013-11-22 epsInd 100.
2400000.23551544 2013-11-22 epsInd 100.
2400000.23551544 2013-11-22 HD217987 900.
2400000.23551544 2013-11-22 TOI-134 900.
2400000.23551544 2013-11-22 tauCet 60.
2400000.23551544 2013-11-22 BD+01316 300.
2400000.23551544 2013-11-22 BD+01316 300.
2400000.23551544 2013-11-22 BD+01316 300.
2400000.23551544 2013-11-22 BD+01316 300.
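Neither script checks whether tgt actually appears in the header; if it does not, the first output slot is never filled and every line comes out with a wrong first field. A minimal sketch of one way to guard against that, assuming you would rather abort with an error than print shifted data (the wrapper name move_col_first, the error message, and the exit status are illustrative choices, not part of the answer):

```shell
# Hypothetical wrapper: move_col_first COLUMN_NAME FILE
move_col_first() {
    awk -v tgt="$1" '
    BEGIN { FS = OFS = "\t" }
    NR == 1 {
        numOutFlds = 1
        for (inFldNr = 1; inFldNr <= NF; inFldNr++) {
            if ($inFldNr == tgt) {
                found = 1
                out2inFldNrs[1] = inFldNr             # target goes first
            } else {
                out2inFldNrs[++numOutFlds] = inFldNr  # others keep their order
            }
        }
        if (!found) {
            # Assumption: a missing column should be a hard error.
            printf "column \"%s\" not found in header\n", tgt > "/dev/stderr"
            exit 1
        }
    }
    {
        for (outFldNr = 1; outFldNr <= numOutFlds; outFldNr++) {
            printf "%s%s", $(out2inFldNrs[outFldNr]), (outFldNr < numOutFlds ? OFS : ORS)
        }
    }' "$2"
}
```

It would be called as move_col_first jdb data.rdb. The per-line loop is unchanged from the script above; only the header block gains the check, so the extra cost is paid once per file.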
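Answer 2 (score: 0)
So the task is twofold: find the number of the column whose header is jdb, then print that column first, followed by the remaining columns.
So: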
# our testing input file
cat <<EOF >file
date star jdb texp
2013-11-22 epsInd 2400000.23551544 100.
2013-11-22 epsInd 2400000.23551544 100.
2013-11-22 epsInd 2400000.23551544 100.
2013-11-22 HD217987 2400000.23551544 900.
2013-11-22 TOI-134 2400000.23551544 900.
2013-11-22 tauCet 2400000.23551544 60.
2013-11-22 BD+01316 2400000.23551544 300.
2013-11-22 BD+01316 2400000.23551544 300.
2013-11-22 BD+01316 2400000.23551544 300.
2013-11-22 BD+01316 2400000.23551544 300.
EOF
# my copy+paste messed up tabs with spaces, fix it
sed 's/[[:space:]]\+/\t/g' -i file
# first we need header count.
# I could remove all characters except tabs and use wc -c
# but was lazy, this will not affect performance anyway
hdrcnt=$(
    head -n1 file |
    tr '\t' '\n' |
    wc -l
)
# get the column number that has jdb
# I get the first line
# substitute tab with newlines
# and get the line number with "jdb"
# (-x matches the whole header name, so e.g. "jdb2" is not picked up)
num=$(
    head -n1 file |
    tr '\t' '\n' |
    grep -nx jdb |
    cut -d: -f1
)
# then I generate the awk script
# so it's like '{print $num, $1, $2 ... except $num ... $hdrcnt }'
# (-x again: without it, num=1 would also drop 10, 11, ...)
awkarg='{print $'"$num"', '"$(
    seq $hdrcnt |
    grep -vx "$num" |
    sed 's/\(.*\)/$\1, /' |
    sed '$s/, //' |
    tr -d '\n'
)"'}'
# finally run awk
# (awk's input field separator variable is FS, not IFS)
awk -v FS='\t' -v OFS='\t' "$awkarg" file
Answer 3 (score: 0)
In Perl, you can benefit from the Text::CSV_XS library:
#! /usr/bin/perl
use warnings;
use strict;
use Text::CSV_XS;
open my $fh, '<', shift or die $!;
my $csv = 'Text::CSV_XS'->new({sep_char => "\t"});
my $row = $csv->getline($fh);
my ($jdb) = grep $row->[$_] eq 'jdb', 0 .. $#$row;
do {
    unshift @$row, splice @$row, $jdb, 1;
    $csv->say(*STDOUT, $row);
} while $row = $csv->getline($fh);