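awk: select columns based on a second file

Time: 2017-09-19 18:09:00

Tags: bash awk

I am trying to run some modified code from an earlier thread on this topic. I have a file, data.txt, whose first line is a header. I want to create a new file containing only the columns whose header matches an entry in a second file (list.txt).

data.txt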

1,2,3,4,5,6,7,8,9,10
1.000000,0,0,0,0,0,0,0,0,0
0,1.000000,0.031250,0,0,0,0.031250,0,0,0
0,0.031250,1.000000,0,0,0,0.062500,0,0,0
0,0,0,1.000000,0,0,0,0,0,0
0,0,0,0,1.000000,0,0,0,0,0
0,0,0,0,0,1.000000,0,0.062500,0,0
0,0.031250,0.062500,0,0,0,1.000000,0,0,0
0,0,0,0,0,0.062500,0,1.000000,0,0
0,0,0,0,0,0,0,0,1.000000,0

list.txt

3
5
7
9

Desired output

3,5,7,9
0,0,0,0
0.031250,0,0.031250,0
1.000000,0,0.062500,0
0,0,0,0
0,1.000000,0,0
0,0,0,0
0.062500,0,1.000000,0
0,0,0,0
0,0,0,1.000000

I used the following code:

echo "${DATAFILE:-data.txt}"
echo "${COLUMNFILE:-list.txt}"

awk {
     j=1
     while ((getline < COLUMNFILE) > 0) {
        col[j++] = $1
     }
     n=j-1;
     close(COLUMNFILE)
     for (i=1; i<=n; i++) s[col[i]]=i
   }
   NR==1 {
     for (f=1; f<=NF; f++)
       if ($f in s) c[s[$f]]=f
     next
   }
   { sep=","
     for (f=1; f<=n; f++) {
       printf("%c%s",sep,$c[f])
       sep=FS
     }
     print ""
 } 
 DATAFILE

I get the result below: it duplicates the rows of data.txt without selecting anything, and the entries of list.txt are printed at the end of the file.

1,2,3,4,5,6,7,8,9,10
1,2,3,4,5,6,7,8,9,10
1.000000,0,0,0,0,0,0,0,0,0
1.000000,0,0,0,0,0,0,0,0,0

0,1.000000,0.031250,0,0,0,0.031250,0,0,0
0,1.000000,0.031250,0,0,0,0.031250,0,0,0

0,0.031250,1.000000,0,0,0,0.062500,0,0,0
0,0.031250,1.000000,0,0,0,0.062500,0,0,0

0,0,0,1.000000,0,0,0,0,0,0
0,0,0,1.000000,0,0,0,0,0,0

0,0,0,0,1.000000,0,0,0,0,0
0,0,0,0,1.000000,0,0,0,0,0

0,0,0,0,0,1.000000,0,0.062500,0,0
0,0,0,0,0,1.000000,0,0.062500,0,0

0,0.031250,0.062500,0,0,0,1.000000,0,0,0
0,0.031250,0.062500,0,0,0,1.000000,0,0,0

0,0,0,0,0,0.062500,0,1.000000,0,0
0,0,0,0,0,0.062500,0,1.000000,0,0

0,0,0,0,0,0,0,0,1.000000,0
0,0,0,0,0,0,0,0,1.000000,0

3
3

5
5

7
7

9
9

Any help is greatly appreciated.

6 answers:

Answer 0 (score: 2)

$ awk '
    BEGIN { FS=OFS="," }
    NR==FNR { f[++nf]=$0; next }
    { for (i=1; i<=nf; i++) printf "%s%s", $(f[i]), (i<nf?OFS:ORS) }
' list.txt data.txt
3,5,7,9
0,0,0,0
0.031250,0,0.031250,0
1.000000,0,0.062500,0
0,0,0,0
0,1.000000,0,0
0,0,0,0
0.062500,0,1.000000,0
0,0,0,0
0,0,0,1.000000

Answer 1 (score: 1)

You can pass both the positions file and the data file to awk and do the logic inside it:

 awk -F"," 'FILENAME=="list.txt"{a[NR]=$1}FILENAME=="data.txt"{for(i=1; i<=length(a); i++){printf (i==length(a)?"%s\n":"%s,"),$a[i]}}' list.txt data.txt

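Here we:

  1. Split the incoming files using a comma delimiter (-F",").
  2. If the FILENAME awk variable is "list.txt" (FILENAME=="list.txt"), add the value on that line to an array, using the line number as the index (a[NR]=$1).
  3. If the FILENAME awk variable is "data.txt" (FILENAME=="data.txt"), loop over each element of the array (for(i=1; i<=length(a); i++)) and print the value of the field at that position ($a[i]). If the position is the last one in the list (i==length(a)), print it followed by a newline ("%s\n"); otherwise print it followed by a comma ("%s,").

Another option is to pass your positions through -v (variable) flags, but this does not work well for a variable number of positions: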

    awk -F"," -v OFS="," -v f1="$(awk 'NR==1' list.txt)" -v f2="$(awk 'NR==2' list.txt)" -v f3="$(awk 'NR==3' list.txt)" -v f4="$(awk 'NR==4' list.txt)" '{print $f1, $f2, $f3, $f4}' data.txt
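
If you do want to stay with -v, one way around the fixed number of variables is to hand over the whole list as a single comma-joined string and split it inside awk. This is only a sketch, not part of the answer above: the variable name cols, the scratch directory, and the shortened sample data are all assumptions made for illustration.

```shell
#!/bin/sh
# Scratch copies of the inputs (first four lines of the question's data.txt).
tmp=$(mktemp -d)
cat > "$tmp/data.txt" <<'EOF'
1,2,3,4,5,6,7,8,9,10
1.000000,0,0,0,0,0,0,0,0,0
0,1.000000,0.031250,0,0,0,0.031250,0,0,0
0,0.031250,1.000000,0,0,0,0.062500,0,0,0
EOF
printf '%s\n' 3 5 7 9 > "$tmp/list.txt"

# Join the lines of list.txt into one string, e.g. "3,5,7,9".
# "cols" is a hypothetical variable name.
cols=$(paste -s -d, "$tmp/list.txt")

result=$(awk -F, -v OFS=, -v cols="$cols" '
    BEGIN { n = split(cols, f, ",") }    # f[1..n] holds the field positions
    {
        # Print the requested fields in list order, comma-separated.
        for (i = 1; i <= n; i++)
            printf "%s%s", $(f[i]), (i < n ? OFS : ORS)
    }
' "$tmp/data.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The number of positions no longer matters, because awk only ever sees one variable. Like the -v version above, this selects by field position, so it relies on the header values being equal to the column numbers.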
    

Answer 2 (score: 1)

awk solution:

awk -F, 'function pr(a){ r=""; for(i=1;i<=NF;i++) if(i in a) r=(r!="")? r","$i:$i; print r }
         NR==FNR{ a[$0]; next }{ pr(a) }' list.txt data.txt

Output:

3,5,7,9
0,0,0,0
0.031250,0,0.031250,0
1.000000,0,0.062500,0
0,0,0,0
0,1.000000,0,0
0,0,0,0
0.062500,0,1.000000,0
0,0,0,0
0,0,0,1.000000

Answer 3 (score: 1)

A non-awk solution, for comparison and contrast...

$ join -t, <(sort list) <(<file tr ',' '\n' | pr -10ts, | sort) | 
  sort -n | 
  tr ',' '\n' | 
  pr -4ts,


3,5,7,9
0,0,0,0
0.031250,0,0.031250,0
1.000000,0,0.062500,0
0,0,0,0
0,1.000000,0,0
0,0,0,0
0.062500,0,1.000000,0
0,0,0,0
0,0,0,1.000000

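You need the magic numbers 10 and 4, which are the column counts of the original file and of the extracted output (these could be automated as well). The multiple sorts are needed to convert the numeric ordering to lexical ordering (required by join) and back again.

The algorithm is essentially transpose - join - transpose.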
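
To make the transpose - select - transpose idea concrete, here is a rough sketch of the same shape written with awk instead of tr/pr/join. It is not the pipeline from this answer: the transpose shell function is a hypothetical helper, and the join step is swapped for a plain array lookup, which also removes the need for the sorts and the magic numbers. The sample data is just the first four lines of the question's data.txt.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cat > "$tmp/data.txt" <<'EOF'
1,2,3,4,5,6,7,8,9,10
1.000000,0,0,0,0,0,0,0,0,0
0,1.000000,0.031250,0,0,0,0.031250,0,0,0
0,0.031250,1.000000,0,0,0,0.062500,0,0,0
EOF
printf '%s\n' 3 5 7 9 > "$tmp/list.txt"

# Hypothetical helper: turn the rows of comma-separated stdin into columns.
transpose() {
    awk -F, '
        { for (i = 1; i <= NF; i++) cell[NR, i] = $i; if (NF > nf) nf = NF }
        END {
            for (i = 1; i <= nf; i++) {
                row = cell[1, i]
                for (r = 2; r <= NR; r++) row = row "," cell[r, i]
                print row
            }
        }'
}

# 1. transpose: each output line starts with a column header
# 2. select:    keep the lines whose first field appears in list.txt
# 3. transpose: turn the kept lines back into columns
result=$(transpose < "$tmp/data.txt" |
    awk -F, 'NR == FNR { want[$1]; next } $1 in want' "$tmp/list.txt" - |
    transpose)

printf '%s\n' "$result"
rm -rf "$tmp"
```

Because the selection is a lookup rather than a join, the kept columns stay in their original file order. The helper holds the whole file in memory, so this is meant as an illustration of the algorithm rather than something for very large inputs.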

Answer 4 (score: 1)

Here is an awk that processes list.txt into a field list and then calls another awk with that list to process data.txt:

$ awk '
BEGIN { FS=OFS="," }          # set the delimiters for the list file
NR==FNR {                     # process the list file
    p=p (p==""?"":OFS) "$" $1 # make a field list ($3,$5,$7,$9)
    next
}
{                             # process the data or call the processor
    RS=""                     # for getline to return multilined output
    cmd="awk \047BEGIN{FS=OFS=\",\"}{print "p"}\047 " FILENAME   # build awk call
    cmd | getline res         # actual awk call and output to res
    print res                 # output res
    exit                      # exit after first record
}
' list data
3,5,7,9
0,0,0,0
0.031250,0,0.031250,0
1.000000,0,0.062500,0
0,0,0,0
0,1.000000,0,0
0,0,0,0
0.062500,0,1.000000,0
0,0,0,0
0,0,0,1.000000

Answer 5 (score: 0)

An awk solution that takes into account that the column names can be anything, not just the column indices.

Script:

BEGIN { FS=OFS="," }
NR==FNR { l[$0]++; next }                      # save headers from list
FNR==1{ for (i=1; i<=NF; i++) 
            if ($i in l){ max=i; c[i]++ }}         # save column index in c;
                                                   # max index in max 
{ for(j=1; j<=NF; j++)                             # loop over column indices
      if(j in c)                                   # if index in c
          printf "%s%s", $j, (j==max ? ORS : OFS)  # print column
}

$ cat list.txt
C
E
G
I

Input data:

$ cat data.txt
A,B,C,D,E,F,G,H,I,J
1.000000,0,0,0,0,0,0,0,0,0
0,1.000000,0.031250,0,0,0,0.031250,0,0,0
0,0.031250,1.000000,0,0,0,0.062500,0,0,0
0,0,0,1.000000,0,0,0,0,0,0
0,0,0,0,1.000000,0,0,0,0,0
0,0,0,0,0,1.000000,0,0.062500,0,0
0,0.031250,0.062500,0,0,0,1.000000,0,0,0
0,0,0,0,0,0.062500,0,1.000000,0,0
0,0,0,0,0,0,0,0,1.000000,0
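
As a hedged usage sketch for this answer: assuming the script is saved under the hypothetical name select_by_name.awk, it is run with the list file first and the data file second. The example below builds scratch copies of the inputs (only the first four lines of the data file, to keep it short) and runs the script on them.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' C E G I > "$tmp/list.txt"
cat > "$tmp/data.txt" <<'EOF'
A,B,C,D,E,F,G,H,I,J
1.000000,0,0,0,0,0,0,0,0,0
0,1.000000,0.031250,0,0,0,0.031250,0,0,0
0,0.031250,1.000000,0,0,0,0.062500,0,0,0
EOF

# The script from this answer; the file name is an assumption.
cat > "$tmp/select_by_name.awk" <<'EOF'
BEGIN { FS=OFS="," }
NR==FNR { l[$0]++; next }                      # save headers from list
FNR==1 { for (i=1; i<=NF; i++)
             if ($i in l) { max=i; c[i]++ } }  # remember matching column indices
{ for (j=1; j<=NF; j++)                        # loop over column indices
      if (j in c)                              # wanted column?
          printf "%s%s", $j, (j==max ? ORS : OFS)
}
EOF

result=$(awk -f "$tmp/select_by_name.awk" "$tmp/list.txt" "$tmp/data.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

On this shortened input the first output line is C,E,G,I, followed by the matching fields of each data row. Note that the columns come out in the order they appear in data.txt, not in the order they are listed in list.txt.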