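I am trying to run some modified code from a previous thread on this topic. I have a file, data.txt, whose first line is a header. I want to create a new file containing only the columns that match the entries in a second file (list.txt).
data.txt: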
1,2,3,4,5,6,7,8,9,10
1.000000,0,0,0,0,0,0,0,0,0
0,1.000000,0.031250,0,0,0,0.031250,0,0,0
0,0.031250,1.000000,0,0,0,0.062500,0,0,0
0,0,0,1.000000,0,0,0,0,0,0
0,0,0,0,1.000000,0,0,0,0,0
0,0,0,0,0,1.000000,0,0.062500,0,0
0,0.031250,0.062500,0,0,0,1.000000,0,0,0
0,0,0,0,0,0.062500,0,1.000000,0,0
0,0,0,0,0,0,0,0,1.000000,0
list.txt:
3
5
7
9
Desired output:
3,5,7,9
0,0,0,0
0.031250,0,0.031250,0
1.000000,0,0.062500,0
0,0,0,0
0,1.000000,0,0
0,0,0,0
0.062500,0,1.000000,0
0,0,0,0
0,0,0,1.000000
I used the following code:
echo "${DATAFILE:-data.txt}"
echo "${COLUMNFILE:-list.txt}"
awk {
j=1
while ((getline < COLUMNFILE) > 0) {
col[j++] = $1
}
n=j-1;
close(COLUMNFILE)
for (i=1; i<=n; i++) s[col[i]]=i
}
NR==1 {
for (f=1; f<=NF; f++)
if ($f in s) c[s[$f]]=f
next
}
{ sep=","
for (f=1; f<=n; f++) {
printf("%c%s",sep,$c[f])
sep=FS
}
print ""
}
DATAFILE
I get the result below, which copies the lines of data.txt without selecting anything. The entries of list.txt are printed at the end of the output:
1,2,3,4,5,6,7,8,9,10
1,2,3,4,5,6,7,8,9,10
1.000000,0,0,0,0,0,0,0,0,0
1.000000,0,0,0,0,0,0,0,0,0
0,1.000000,0.031250,0,0,0,0.031250,0,0,0
0,1.000000,0.031250,0,0,0,0.031250,0,0,0
0,0.031250,1.000000,0,0,0,0.062500,0,0,0
0,0.031250,1.000000,0,0,0,0.062500,0,0,0
0,0,0,1.000000,0,0,0,0,0,0
0,0,0,1.000000,0,0,0,0,0,0
0,0,0,0,1.000000,0,0,0,0,0
0,0,0,0,1.000000,0,0,0,0,0
0,0,0,0,0,1.000000,0,0.062500,0,0
0,0,0,0,0,1.000000,0,0.062500,0,0
0,0.031250,0.062500,0,0,0,1.000000,0,0,0
0,0.031250,0.062500,0,0,0,1.000000,0,0,0
0,0,0,0,0,0.062500,0,1.000000,0,0
0,0,0,0,0,0.062500,0,1.000000,0,0
0,0,0,0,0,0,0,0,1.000000,0
0,0,0,0,0,0,0,0,1.000000,0
3
3
5
5
7
7
9
9
Any help is greatly appreciated.
Answer 0 (score: 2)
$ awk '
BEGIN { FS=OFS="," }
NR==FNR { f[++nf]=$0; next }
{ for (i=1; i<=nf; i++) printf "%s%s", $(f[i]), (i<nf?OFS:ORS) }
' list.txt data.txt
3,5,7,9
0,0,0,0
0.031250,0,0.031250,0
1.000000,0,0.062500,0
0,0,0,0
0,1.000000,0,0
0,0,0,0
0.062500,0,1.000000,0
0,0,0,0
0,0,0,1.000000
Answer 1 (score: 1)
You can pass both the positions file and the data file to awk and do the logic inside it:
awk -F"," 'FILENAME=="list.txt"{a[NR]=$1}FILENAME=="data.txt"{for(i=1; i<=length(a); i++){printf (i==length(a)?"%s\n":"%s,"),$a[i]}}' list.txt data.txt
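Here we:
- Split the incoming files on commas (-F",").
- If the file currently being processed is list.txt (FILENAME=="list.txt"), store the position in an array (a[NR]=$1).
- If the file currently being processed is data.txt (FILENAME=="data.txt"), loop over the array (for(i=1; i<=length(a); i++)) and print the field at each stored position ($a[i]). If the position is the last one (i==length(a)), print it followed by a newline ("%s\n"); otherwise print it followed by a comma ("%s,").
Another option is to pass your positions via the -v (variable) flag, but this does not work well for a variable number of positions: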
awk -F"," -v OFS="," -v f1="$(awk 'NR==1' list.txt)" -v f2="$(awk 'NR==2' list.txt)" -v f3="$(awk 'NR==3' list.txt)" -v f4="$(awk 'NR==4' list.txt)" '{print $f1, $f2, $f3, $f4}' data.txt
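One way around the fixed number of -v variables is to pass all positions in a single variable and split it inside awk. The following is only a sketch under assumptions: the sample files are hypothetical stand-ins for the question's data, the variable name cols is made up for illustration, and list.txt is assumed to hold one 1-based position per line.

```shell
# Hypothetical sample inputs mirroring the question's files
tmp=$(mktemp -d)
printf '%s\n' 3 5 7 9 > "$tmp/list.txt"
printf '%s\n' '1,2,3,4,5,6,7,8,9,10' \
              '0,1.000000,0.031250,0,0,0,0.031250,0,0,0' > "$tmp/data.txt"

# Join the positions into one comma-separated string (here: 3,5,7,9)
cols=$(paste -s -d, "$tmp/list.txt")

# Split that string inside awk, then print the chosen fields of every row
result=$(awk -F, -v OFS=, -v cols="$cols" '
BEGIN { n = split(cols, c, ",") }
{ for (i = 1; i <= n; i++) printf "%s%s", $(c[i]), (i < n ? OFS : ORS) }
' "$tmp/data.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Because the list travels as one string, the same command works whether list.txt has four entries or forty.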
Answer 2 (score: 1)
An awk solution:
awk -F, 'function pr(a){ r=""; for(i=1;i<=NF;i++) if(i in a) r=(r!="")? r","$i:$i; print r }
NR==FNR{ a[$0]; next }{ pr(a) }' list.txt data.txt
Output:
3,5,7,9
0,0,0,0
0.031250,0,0.031250,0
1.000000,0,0.062500,0
0,0,0,0
0,1.000000,0,0
0,0,0,0
0.062500,0,1.000000,0
0,0,0,0
0,0,0,1.000000
Answer 3 (score: 1)
A non-awk solution, for comparison and contrast...
$ join -t, <(sort list) <(<file tr ',' '\n' | pr -10ts, | sort) |
sort -n |
tr ',' '\n' |
pr -4ts,
3,5,7,9
0,0,0,0
0.031250,0,0.031250,0
1.000000,0,0.062500,0
0,0,0,0
0,1.000000,0,0
0,0,0,0
0.062500,0,1.000000,0
0,0,0,0
0,0,0,1.000000
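You need the magic numbers 10 and 4, which are the number of columns in the original file and in the extracted output (these can be automated as well). The multiple sorts are needed to convert the numeric ordering to the lexical ordering that join requires, and then back again.
The algorithm is essentially transpose, join, transpose.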
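For contrast with the transpose, join, transpose pipeline, a different and shorter non-awk route is to hand the positions straight to cut. This is a sketch of a swapped-in technique rather than the method described in this answer. It assumes list.txt holds 1-based column positions, and the sample files are hypothetical. Note that cut always emits fields in file order, whatever the order in list.txt.

```shell
# Hypothetical sample inputs mirroring the question's files
tmp=$(mktemp -d)
printf '%s\n' 3 5 7 9 > "$tmp/list.txt"
printf '%s\n' '1,2,3,4,5,6,7,8,9,10' \
              '0,1.000000,0.031250,0,0,0,0.031250,0,0,0' > "$tmp/data.txt"

# Turn the one-per-line positions into a cut field list: 3,5,7,9
fields=$(paste -s -d, "$tmp/list.txt")

# Select those comma-separated fields from every row
result=$(cut -d, -f"$fields" "$tmp/data.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

No magic numbers are needed here, because cut works row by row and never has to transpose the data.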
Answer 4 (score: 1)
Here is an awk that processes list.txt into a field list and calls another awk with that list to process data.txt:
$ awk '
BEGIN { FS=OFS="," } # set the delimiters for the list file
NR==FNR { # process the list file
p=p (p==""?"":OFS) "$" $1 # make a field list ($3,$5,$7,$9)
next
}
{ # process the data or call the processor
RS="" # for getline to return multilined output
cmd="awk \047BEGIN{FS=OFS=\",\"}{print "p"}\047 " FILENAME # build awk call
cmd | getline res # actual awk call and output to res
print res # output res
exit # exit after first record
}
' list data
3,5,7,9
0,0,0,0
0.031250,0,0.031250,0
1.000000,0,0.062500,0
0,0,0,0
0,1.000000,0,0
0,0,0,0
0.062500,0,1.000000,0
0,0,0,0
0,0,0,1.000000
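The same idea, building a field list such as $3,$5,$7,$9 and handing it to awk as program text, can also be sketched from the shell side without the nested awk call. This is an illustration under assumptions: the sample files are hypothetical, the variable name fieldlist is made up, and list.txt must be trusted input, since its contents become part of the awk program.

```shell
# Hypothetical sample inputs mirroring the question's files
tmp=$(mktemp -d)
printf '%s\n' 3 5 7 9 > "$tmp/list.txt"
printf '%s\n' '1,2,3,4,5,6,7,8,9,10' \
              '0,1.000000,0.031250,0,0,0,0.031250,0,0,0' > "$tmp/data.txt"

# Prefix each position with "$" and join with commas: $3,$5,$7,$9
fieldlist=$(sed 's/^/$/' "$tmp/list.txt" | paste -s -d, -)

# The shell expands $fieldlist once, so awk sees: { print $3,$5,$7,$9 }
result=$(awk -F, -v OFS=, "{ print $fieldlist }" "$tmp/data.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Generating the program in the shell avoids the getline and RS handling, at the cost of relying on shell quoting to keep the dollar signs intact.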
Answer 5 (score: 0)
An awk solution that allows the column names to be anything, not just column indices.
The awk program:
BEGIN { FS=OFS="," }
NR==FNR { l[$0]++; next } # save headers from list
FNR==1{ for (i=1; i<=NF; i++)
if ($i in l){ max=i; c[i]++ }} # save column index in c;
# max index in max
{ for(j=1; j<=NF; j++) # loop over column indices
if(j in c) # if index in c
printf "%s%s", $j, (j==max ? ORS : OFS) # print column
}
With this list.txt:
$ cat list.txt
C
E
G
I
and this data.txt as input:
$ cat data.txt
A,B,C,D,E,F,G,H,I,J
1.000000,0,0,0,0,0,0,0,0,0
0,1.000000,0.031250,0,0,0,0.031250,0,0,0
0,0.031250,1.000000,0,0,0,0.062500,0,0,0
0,0,0,1.000000,0,0,0,0,0,0
0,0,0,0,1.000000,0,0,0,0,0
0,0,0,0,0,1.000000,0,0.062500,0,0
0,0.031250,0.062500,0,0,0,1.000000,0,0,0
0,0,0,0,0,0.062500,0,1.000000,0,0
0,0,0,0,0,0,0,0,1.000000,0
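The program above prints the matching columns in the order they appear in data.txt. If the order given in list.txt should be kept instead, and names missing from the header should be skipped, a variant might look like the sketch below. The array names want, pos, and col are made up for illustration, and the two-row data.txt is a hypothetical cut-down sample.

```shell
# Hypothetical sample inputs: header names instead of numeric positions
tmp=$(mktemp -d)
printf '%s\n' C E G I > "$tmp/list.txt"
printf '%s\n' 'A,B,C,D,E,F,G,H,I,J' \
              '0,1.000000,0.031250,0,0,0,0.031250,0,0,0' > "$tmp/data.txt"

result=$(awk '
BEGIN   { FS = OFS = "," }
NR==FNR { want[++n] = $0; next }             # wanted names, in list order
FNR==1  {                                     # header row of data.txt
    for (i = 1; i <= NF; i++) pos[$i] = i     # map name -> column index
    for (k = 1; k <= n; k++)                  # keep only names that exist
        if (want[k] in pos) col[++m] = pos[want[k]]
}
{ for (k = 1; k <= m; k++) printf "%s%s", $(col[k]), (k < m ? OFS : ORS) }
' "$tmp/list.txt" "$tmp/data.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Resolving names to indices once, on the header row, keeps the per-row work to a plain loop over the selected columns.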