How do I use the Linux command awk to split a single-column file into a multi-column file?

Asked: 2016-08-26 19:49:14

Tags: awk

I have the following file, list.txt:

AbateI.       D
AcatulloM.    A
AcerbiF.      D
AcquafrescaR. A
AcquahA.      C
AdjapongC.    D
AdnanA.       D
AdrianoL.     A
AjetiA.       D
AlbiolR.      D
AldeganiG.    P
AleesamiH.    D
AlexSandro    D
AlissonR.     P

I want to use awk to rearrange the file, grouping the names by the second column, like this:

P                    D              C                 A
AldeganiG.         AbateI.         AcquahA.         AcatulloM. 
AlissonR.          AcerbiF.                         AcquafrescaR.
                   AdjapongC.                       AdrianoL. 
                   AdnanA. 
                   AjetiA. 
                   AlbiolR. 
                   AleesamiH.
                   AlexSandro 

This is what I tried:

#!/usr/bin/awk -f

BEGIN {
FORMAT="\t%-20s%-20s%-20s%s\n"
printf FORMAT,"P","D","C","A"
}

($2=="P")  {a[$1] = $1}
($2=="D")  {b[$1] = $1}
($2=="C")  {c[$1] = $1}
($2=="A")  {d[$1] = $1}

END{for(i in a) printf FORMAT, a[i],"","",""}

But I don't know how to loop over and print the other arrays.

6 answers:

Answer 0 (score: 2)

Read Effective Awk Programming, 4th Edition, by Arnold Robbins.
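For the question as asked, a minimal awk-only sketch (not necessarily the code this answer originally contained) could keep one counter per group, store each name under a (group, position) key, and print row by row in the END block. The function name group_columns is hypothetical, the column order is fixed to P D C A, and the 20-character width is borrowed from the FORMAT string in the question.

```shell
#!/bin/sh
# Sketch: group names by the letter in column 2, one output column per group.
# group_columns is a hypothetical helper name; the column order is fixed to P D C A.
group_columns() {
    awk '
    BEGIN {
        fmt = "%-20s%-20s%-20s%s\n"
        printf fmt, "P", "D", "C", "A"
    }
    {
        n = ++count[$2]          # position of this name within its group
        name[$2, n] = $1         # store the name under a (group, position) key
        if (n > rows) rows = n   # remember the size of the largest group
    }
    END {
        # Cells that were never set are empty strings, so short groups leave blanks.
        for (i = 1; i <= rows; i++)
            printf fmt, name["P", i], name["D", i], name["C", i], name["A", i]
    }' "$@"
}
```

It would be called as group_columns list.txt. Because each group has its own counter, no per-letter array or per-letter loop is needed, which is the part the original attempt was missing.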

Answer 1 (score: 2)

You can use paste and column with some process substitution:

$ paste \
      <(awk '/P$/ {print $1}'<input) \
      <(awk '/D$/ {print $1}'<input) \
      <(awk '/C$/ {print $1}'<input) \
      <(awk '/A$/ {print $1}'<input) | column -s $'\t' -t
AldeganiG.  AbateI.     AcquahA.  AcatulloM.
AlissonR.   AcerbiF.              AcquafrescaR.
            AdjapongC.            AdrianoL.
            AdnanA.
            AjetiA.
            AlbiolR.
            AleesamiH.
            AlexSandro

You can add the column headers manually if you like.

Answer 2 (score: 2)

You can also use a grep-cut-paste-expand combination:

paste \
   <(echo "P";grep 'P$' list.txt |cut -d ' ' -f1 ) \
   <(echo "D";grep 'D$' list.txt |cut -d ' ' -f1 ) \
   <(echo "C";grep 'C$' list.txt |cut -d ' ' -f1 ) \
   <(echo "A";grep 'A$' list.txt |cut -d ' ' -f1) | expand -t 20

Output

P                   D                   C                   A
AldeganiG.          AbateI.             AcquahA.            AcatulloM.
AlissonR.           AcerbiF.                                AcquafrescaR.
                    AdjapongC.                              AdrianoL.
                    AdnanA.                                 
                    AjetiA.                                 
                    AlbiolR.                                
                    AleesamiH.                              
                    AlexSandro                              

You can replace grep-cut with sed, as follows:

paste \
    <(echo "P";sed -n '/P$/{s/[[:blank:]]*P$//;p}' file ) \
    <(echo "D";sed -n '/D$/{s/[[:blank:]]*D$//;p}' file ) \
    <(echo "C";sed -n '/C$/{s/[[:blank:]]*C$//;p}' file ) \
    <(echo "A";sed -n '/A$/{s/[[:blank:]]*A$//;p}' file ) | expand -t 20

Output

P                   D                   C                   A
AldeganiG.          AbateI.             AcquahA.            AcatulloM.
AlissonR.           AcerbiF.                                AcquafrescaR.
                    AdjapongC.                              AdrianoL.
                    AdnanA.                                 
                    AjetiA.                                 
                    AlbiolR.                                
                    AleesamiH.                              
                    AlexSandro   

You can also do it this way:

paste \
     <(awk 'BEGIN{print "P"}/P$/{print $1}' file ) \
     <(awk 'BEGIN{print "D"}/D$/{print $1}' file ) \
     <(awk 'BEGIN{print "C"}/C$/{print $1}' file ) \
     <(awk 'BEGIN{print "A"}/A$/{print $1}' file ) | expand -t 20

Output

P                   D                   C                   A
AldeganiG.          AbateI.             AcquahA.            AcatulloM.
AlissonR.           AcerbiF.                                AcquafrescaR.
                    AdjapongC.                              AdrianoL.
                    AdnanA.                                 
                    AjetiA.                                 
                    AlbiolR.                                
                    AleesamiH.                              
                    AlexSandro                              

Answer 3 (score: 1)

Here is an unconventional approach:

$ awk -v OFS='\n' '{a[$2]=a[$2] OFS $1; 
                    c[$2]++; 
                    if(c[$2]>max) max=c[$2]} 
                END{pr="pr -"length(c)"t"; 
                    for(k in a) 
                       {print k a[k] | pr; 
                        for(i=c[k];i<max;i++) 
                           {print ""  | pr}}}' list.txt

A                 P                 C                 D
AcatulloM.        AldeganiG.        AcquahA.          AbateI.
AcquafrescaR.     AlissonR.                           AcerbiF.
AdrianoL.                                             AdjapongC.
                                                      AdnanA.
                                                      AjetiA.
                                                      AlbiolR.
                                                      AleesamiH.
                                                      AlexSandro

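Note that the order of the columns is somewhat arbitrary, because awk does not guarantee the iteration order of for (k in a), but the values within each column are listed in insertion order.

This approach also does not follow the conventional "transpose" approach with a two-dimensional array. It is probably better to learn that one.

This site already has many answers to nearly identical questions.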
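The conventional "transpose" idea mentioned above can be sketched in portable awk, where a two-dimensional array is simulated with a comma-separated subscript (joined internally with SUBSEP). This is an illustration under stated assumptions, not code from this answer: transpose_groups is a hypothetical name, columns come out in first-seen order, and the output is tab-separated.

```shell
#!/bin/sh
# Sketch: transpose grouped rows into columns with a simulated 2D array.
# transpose_groups is a hypothetical helper name; output columns are tab-separated.
transpose_groups() {
    awk '
    {
        if (!($2 in count)) order[++ngroups] = $2   # remember first-seen group order
        row = ++count[$2]
        cell[$2, row] = $1                          # (group, row) key joined with SUBSEP
        if (row > maxrows) maxrows = row
    }
    END {
        # Header line: one group letter per column.
        for (g = 1; g <= ngroups; g++)
            printf "%s%s", order[g], (g < ngroups ? "\t" : "\n")
        # Body: row r of every group, empty where a group has run out of names.
        for (r = 1; r <= maxrows; r++)
            for (g = 1; g <= ngroups; g++)
                printf "%s%s", cell[order[g], r], (g < ngroups ? "\t" : "\n")
    }' "$@"
}
```

Piping the result through expand, for example transpose_groups list.txt | expand -t 20, lines the columns up as long as every name is shorter than 20 characters. Keeping the group order in a separate indexed array avoids the arbitrary ordering of for (k in a).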

Answer 4 (score: 1)

A solution using GNU awk 4.0 two-dimensional arrays (arrays of arrays). It can output any number of groups in any order:

# output order of groups
order=$*
awk -vorderstr="$order" '
BEGIN { split(orderstr, order) }
{
# grpnames[group][index]=name
  grpnames[$2][grpi[$2]++]=$1
# track max group size
  if(grpi[$2] > maxgrpsz)
    maxgrpsz=grpi[$2]
}
END {
# print groups header in order
printf("%-20s", order[1])
for(j=2; j <= length(order); ++j) {
  printf("\t%-20s", order[j])
}
printf("\n")
for(i=0; i < maxgrpsz; ++i) {
# run across each group in output order
  printf("%-20s", grpnames[order[1]][i])
  for(j=2; j <= length(order); ++j) {
    grp=order[j]
    printf("\t%-20s", grpnames[grp][i])
  }
  printf("\n")
}
}
'

Test

./myscr.sh P D C A <in.txt
P                       D                       C                       A
AldeganiG.              AbateI.                 AcquahA.                AcatulloM.
AlissonR.               AcerbiF.                                        AcquafrescaR.
                        AdjapongC.                                      AdrianoL.
                        AdnanA.
                        AjetiA.
                        AlbiolR.
                        AleesamiH.
                        AlexSandro
./myscr.sh D A P C <in.txt
D                       A                       P                       C
AbateI.                 AcatulloM.              AldeganiG.              AcquahA.
AcerbiF.                AcquafrescaR.           AlissonR.
AdjapongC.              AdrianoL.
AdnanA.
AjetiA.
AlbiolR.
AleesamiH.
AlexSandro

./myscr.sh A P <in.txt
A                       P
AcatulloM.              AldeganiG.
AcquafrescaR.           AlissonR.
AdrianoL.

Answer 5 (score: 0)

In GNU awk:

$ cat > list.awk
{
    n=(n<++b[$2]?b[$2]:n)                # n is the max count of words in one group
    a[$2][b[$2]]=$1                      # put words to two dimensional array
} 
END {
    for(i=1;i<=n;i++) {                  # from 1 to n
        for(j in a)                      # for all groups
            printf "%-14s%s",a[j][i],OFS # print a word
        printf "%s",ORS                  # ORS in the end
    }
}
$ awk -f list.awk list.txt
AcatulloM.     AldeganiG.     AcquahA.       AbateI.        
AcquafrescaR.  AlissonR.                     AcerbiF.       
AdrianoL.                                    AdjapongC.     
                                             AdnanA.        
                                             AjetiA.        
                                             AlbiolR.       
                                             AleesamiH.     
                                             AlexSandro