Question

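Please help me with this small script. I am trying to extract some columns from a big tab-separated file (mainFileWithValues.txt), which has the following format:

A   B    C  ......... (total 700 columns)
80  2.08  23
14 1.88  30
12 1.81 40

The column names are in column.nam (up to 20 names):

cat  columnnam.nam

A
B
.
.
.

I first get the column number from the big file using:

sed -n "1 s/${i}.*//p" mainFileWithValues.txt | sed 's/[^\t*]//g' |wc -c

Then I extract the values with cut, inside a for loop:

#/bin/bash

for i in `cat columnnam.nam`
do
  cut -f`sed -n "1 s/${i}.*//p" mainFileWithValues.txt | sed 's/[^\t*]//g' |wc -c` mainFileWithValues.txt > test.txt
done

cat test.txt
A
80
14
12
B
2.08
1.88
1.81

My problem is that I want the output in test.txt to be laid out in columns, as in the main file, i.e. with the A and B values side by side instead of stacked one after another.

How can I fix this in this script?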
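For context on why the loop stacks the values: each pass runs cut for a single column, so the columns are written one after another and can never end up side by side. A minimal sketch of one possible fix, staying with cut, is to collect all column numbers first and call cut once. The sample files, the scratch directory and the grep-based header lookup are assumptions made for illustration (the lookup replaces the sed/wc counting from the question). Note that cut always prints fields in the order they occur in the file, not the order in which they are listed.

```shell
#!/bin/bash
# Demo inputs in a scratch directory (sample data, not the real 700-column file).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'A\tB\tC\n80\t2.08\t23\n14\t1.88\t30\n12\t1.81\t40\n' > "$tmp/mainFileWithValues.txt"
printf 'A\nB\n' > "$tmp/columnnam.nam"

fields=""
while IFS= read -r name; do
  [ -n "$name" ] || continue
  # Split the header into one name per line and find the line number of the
  # cell that equals $name exactly (-x whole line, -F fixed string).
  n=$(head -n 1 "$tmp/mainFileWithValues.txt" | tr '\t' '\n' \
        | grep -nxF -- "$name" | head -n 1 | cut -d: -f1)
  if [ -n "$n" ]; then
    fields="${fields:+$fields,}$n"   # build a list such as 1,2
  fi
done < "$tmp/columnnam.nam"

if [ -z "$fields" ]; then
  echo "no matching columns found" >&2
  exit 1
fi

# A single cut call for all columns keeps them side by side.
cut -f"$fields" "$tmp/mainFileWithValues.txt" > "$tmp/test.txt"
cat "$tmp/test.txt"
```

With the sample data this writes a tab-separated test.txt with the A and B columns next to each other.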

Answer 1

Here is a one-liner:

awk 'FNR==NR{h[NR]=$1;next}{for(i=1; i in h; i++){if(FNR==1){for(j=1; j<=NF; j++){if(tolower(h[i])==tolower($j)){d[i]=j; break }}}printf("%s%s",i>1 ? OFS:"",  i in d ?$(d[i]):"")}print ""}' columns.nam mainfile

Explanation:

[Note: header matching is case-insensitive; remove tolower() if you want a strict match.]

awk '
    FNR==NR{                       # Here we read columns.nam file
       h[NR]=$1;                   # h -> array, NR -> as array key, $1 -> as array value
       next                        # go to next line
    }
    {                              # Here we read second file

     for(i=1; i in h; i++)         # iterate array h
     {
       if(FNR==1)                  # if we are reading 1st row of second file, will parse header
       {
        for(j=1; j<=NF; j++)       # iterate over fields of 1st row fields
        {
            # if it was the field we are looking for
            if(tolower(h[i])==tolower($j))
            {
              # then 
              # d -> array, i -> as array key which is column order number
              # j -> as array value which is column number
              d[i]=j; 
              break 
            }
        }
       }    
       # for all records
       # if field we searched was found then print such field
       # from d[i] we access, column number

       printf("%s%s",i>1 ? OFS:"",  i in d ? $(d[i]): "");
      }

      # print newline char
      print ""
    }
    ' columns.nam mainfile

Test results:

$ cat mainfile 
A   B    C  
80  2.08  23  
14 1.88  30  
12 1.81 40

$ cat columns.nam 
A
C

$ awk 'FNR==NR{h[NR]=$1;next}{for(i=1; i in h; i++){if(FNR==1){for(j=1; j<=NF; j++){if(tolower(h[i])==tolower($j)){d[i]=j; break }}}printf("%s%s",i>1 ? OFS:"",  i in d ?$(d[i]):"")}print ""}' columns.nam mainfile 
A C
80 23
14 30
12 40

You can also save it as a script and run it:

akshay@db-3325:/tmp$ cat col_parser.awk 
FNR == NR {
  h[NR] = $1;
  next
} 
{
  for (i = 1; i in h; i++) {
    if (FNR == 1) {
      for (j = 1; j <= NF; j++) {
        if (tolower(h[i]) == tolower($j)) {
          d[i] = j;
          break
        }
      }
    }
    printf("%s%s", i > 1 ? OFS : "", i in d ? $(d[i]) : "");
  }
  print ""
}

akshay@db-3325:/tmp$ awk -v OFS="\t" -f col_parser.awk columns.nam mainfile 
A      C
80     23
14     30
12     40
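As a follow-up to the note about strict matching earlier in this answer, here is a hedged sketch of a variant that drops tolower() and records the position of every header name once, instead of rescanning the header for each wanted name. It also sets the field separator to a tab explicitly, on the assumption that the real file is tab-separated and may contain empty cells. The array names want and pos and the sample files are made up for this example; they are not part of the answer above.

```shell
#!/bin/bash
# Demo inputs in a scratch directory (sample data only).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'A\tB\tC\n80\t2.08\t23\n14\t1.88\t30\n12\t1.81\t40\n' > "$tmp/mainfile"
printf 'C\nA\n' > "$tmp/columns.nam"

out=$(awk -F'\t' -v OFS='\t' '
  FNR == NR { want[++n] = $1; next }   # first file: wanted names, in listed order
  FNR == 1 {                           # header row of the data file
    for (j = 1; j <= NF; j++)
      if (!($j in pos)) pos[$j] = j    # exact, case-sensitive name -> column number
  }
  {
    line = ""
    for (i = 1; i <= n; i++)           # a name missing from the header gives an empty field
      line = line (i > 1 ? OFS : "") ((want[i] in pos) ? $(pos[want[i]]) : "")
    print line
  }
' "$tmp/columns.nam" "$tmp/mainfile")
printf '%s\n' "$out"
```

As in the answer above, the output columns follow the order of columns.nam, so listing C before A prints C first.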

Similar answer:

AWK to display a column based on Column name and remove header and last delimiter

Answer 2

Another awk approach:

awk 'NR == FNR {
   hdr[$1]
   next
}
FNR == 1 {
   for (i=1; i<=NF; i++)
      if ($i in hdr)
         h[i]
}
{
   s=""
   for (i in h)
      s = s (s == "" ? "" : OFS) $i
   print s
}' column.nam mainFileWithValues.txt

A B
80 2.08
14 1.88
12 1.81

To get aligned output, pipe the above command to column -t.
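One caveat with the approach above: POSIX awk does not specify the order in which for (i in h) visits array keys, so with many selected columns the output order may differ between awk implementations. A possible variant, sketched under the assumption that the columns should come out in the order they occur in the main file, stores the matching column numbers in a numerically indexed array. The array name cols, the explicit tab separators and the sample files are assumptions for illustration.

```shell
#!/bin/bash
# Demo inputs in a scratch directory (sample data only).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'A\tB\tC\n80\t2.08\t23\n14\t1.88\t30\n12\t1.81\t40\n' > "$tmp/mainFileWithValues.txt"
printf 'B\nA\n' > "$tmp/column.nam"

out=$(awk -F'\t' -v OFS='\t' '
  NR == FNR { hdr[$1]; next }          # first file: remember the wanted names
  FNR == 1 {                           # header row: note matching column numbers
    for (i = 1; i <= NF; i++)
      if ($i in hdr) cols[++n] = i     # cols[1..n] is in file order
  }
  {
    s = ""
    for (k = 1; k <= n; k++)           # counted loop, so the order is deterministic
      s = s (k > 1 ? OFS : "") $(cols[k])
    print s
  }
' "$tmp/column.nam" "$tmp/mainFileWithValues.txt")
printf '%s\n' "$out"
```

Here column.nam lists B before A, yet the output keeps the file order A, B, because the column numbers are collected while scanning the header from left to right.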

Extract desired column with values
