I have a bash script, with some AWK in it, that I am using to tackle my problem.
<targets.txt xargs -n1 -P4 bash -c "
awk 'NR==FNR{a[\$0];next}
{
if (\$0 in a)
{
printf \"1,\"
}
else
{
printf \"0,\"
}
}' \"\$1\" values.txt | sed $'s\x01$\x01'\"\$(<<<\"\$1\" cut -d/ -f3)\"'\n'$'\x01'" _
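It prints "1," if $0 occurs in a, and "0," otherwise. However, when there is a match, I don't want to print 1; I want to print the number of occurrences instead.
Is there a way to do this?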
Example targets.txt
./dataset/tallperson/file1.txt
./dataset/tallperson/file2.txt
./dataset/tallperson/file3.txt
./dataset/shortperson/file4.txt
Example ./dataset/tallperson/file1.txt
LOL
Lol
Hel
lo.
Example ./dataset/tallperson/file2.txt
LOL
LOL
Wei
rd.
Example ./dataset/tallperson/file3.txt
Lol
Lol
Example ./dataset/shortperson/file4.txt
hah
a t
hat
was
fun
ny.
LOL
LOL
Example values.txt
LOL
Lol
Hel
lo.
Wei
rd.
hah
a t
hat
was
fun
ny.
Desired output
1,1,1,1,0,0,0,0,0,0,0,0,tallperson
2,0,0,0,1,1,0,0,0,0,0,0,tallperson
0,2,0,0,0,0,0,0,0,0,0,0,tallperson
2,0,0,0,0,0,1,1,1,1,1,1,shortperson
Undesired output (from my script)
1,1,1,1,0,0,0,0,0,0,0,0,tallperson
1,0,0,0,1,1,0,0,0,0,0,0,tallperson
0,1,0,0,0,0,0,0,0,0,0,0,tallperson
1,0,0,0,0,0,1,1,1,1,1,1,shortperson
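I have values.txt, which contains a list of the unique 3-character values from every file listed in targets.txt. No file contains a value that is missing from values.txt. I just want to go through each file in targets.txt and count how many times each value from values.txt occurs in it.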
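To make the relationship between the two files concrete, here is a minimal sketch of how a values.txt like the example could be derived from the files listed in targets.txt. It assumes a POSIX shell and awk; the function name build_values is made up for illustration, and this is not necessarily how the real values.txt was produced.

```shell
#!/bin/sh
# build_values (hypothetical helper): print every distinct line found in the
# files listed in "$1", in order of first appearance.
build_values() {
    # Read the list of paths one per line, preserving spaces and backslashes.
    while IFS= read -r path; do
        cat "$path"
    done < "$1" |
    # seen[$0]++ is 0 (false) the first time a line shows up, so the negation
    # passes each distinct line through exactly once.
    awk '!seen[$0]++'
}

# Usage (file names as in the examples above):
#   build_values targets.txt > values.txt
```

With the four example files, this yields the twelve lines shown under "Example values.txt", in the same order.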
Answer 0 (score: 1)
You don't need anything other than awk for this. For example, using GNU awk for gensub(), ARGIND, and ENDFILE:
$ cat tst.awk
BEGIN { OFS="," }
ARGIND == 1 {
ARGV[ARGC] = $0
ARGC++
next
}
ARGIND == 2 {
strings[++numStrings] = $0
next
}
{ cnt[$0]++ }
ENDFILE {
if ( ARGIND > 2 ) {
for (stringNr=1; stringNr<=numStrings; stringNr++) {
string = strings[stringNr]
printf "%d%s", cnt[string], OFS
}
print gensub(/(.*\/)?([^/]+)\/[^/]+$/,"\\2",1,FILENAME)
delete cnt
}
}
$ awk -f tst.awk targets.txt values.txt
1,1,1,1,0,0,0,0,0,0,0,0,tallperson
2,0,0,0,1,1,0,0,0,0,0,0,tallperson
0,2,0,0,0,0,0,0,0,0,0,0,tallperson
2,0,0,0,0,0,1,1,1,1,1,1,shortperson
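For comparison, the smallest change to the awk program in the question is to count the lines of the target file instead of merely recording them, and then print that count for each line of values.txt. The following is a minimal sketch under stated assumptions: a POSIX shell and awk (no GNU extensions), paths laid out as ./dataset/<directory>/<file> so that the third "/"-separated field is the directory (the same assumption the question's cut -d/ -f3 makes), and a made-up function name, count_row.

```shell
#!/bin/sh
# count_row (hypothetical helper): print one output row for one target file.
#   $1: path to a target file, e.g. ./dataset/tallperson/file1.txt
#   $2: path to the values file, e.g. values.txt
count_row() {
    # Directory label: third "/"-separated field of the path, as in the question.
    dir=$(printf '%s\n' "$1" | cut -d/ -f3)
    awk -v dir="$dir" '
        NR == FNR { cnt[$0]++; next }   # first file (target): count each line
        { printf "%d,", cnt[$0] }       # second file (values): unseen lines print 0
        END { print dir }               # finish the row with the directory label
    ' "$1" "$2"
}

# Usage (file names as in the question):
#   while IFS= read -r f; do count_row "$f" values.txt; done < targets.txt
```

Like the original, this relies on NR == FNR to recognise the first file, so an empty target file would be misread; the sketch does not guard against that case.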
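Of course, you don't actually need the "values.txt" file at all, unless you need the output fields in a specific order that can't be determined from the input: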
$ cat tst.awk
BEGIN { OFS="," }
ARGIND == 1 {
ARGV[ARGC] = $0
ARGC++
next
}
{
if ( !seen[$0]++ ) {
strings[++numStrings] = $0
}
cnt[ARGIND,$0]++
}
END {
for (stringNr=1; stringNr<=numStrings; stringNr++) {
string = strings[stringNr]
printf "%s%s", string, OFS
}
print "directory"
for (fileNr=2; fileNr<=ARGIND; fileNr++) {
for (stringNr=1; stringNr<=numStrings; stringNr++) {
string = strings[stringNr]
printf "%d%s", cnt[fileNr,string], OFS
}
print gensub(/(.*\/)?([^/]+)\/[^/]+$/,"\\2",1,ARGV[fileNr])
}
}
$ awk -f tst.awk targets.txt
LOL,Lol,Hel,lo.,Wei,rd.,hah,a t,hat,was,fun,ny.,directory
1,1,1,1,0,0,0,0,0,0,0,0,tallperson
2,0,0,0,1,1,0,0,0,0,0,0,tallperson
0,2,0,0,0,0,0,0,0,0,0,0,tallperson
2,0,0,0,0,0,1,1,1,1,1,1,shortperson
I added a header in the second script; if you don't want it, just leave it out.
If you really don't care about the output order, then all you need is:
$ cat tst.awk
BEGIN { OFS="," }
ARGIND == 1 {
ARGV[ARGC] = $0
ARGC++
next
}
{
strings[$0]
cnt[ARGIND,$0]++
}
END {
for (string in strings) {
printf "%s%s", string, OFS
}
print "directory"
for (fileNr=2; fileNr<=ARGIND; fileNr++) {
for (string in strings) {
printf "%d%s", cnt[fileNr,string], OFS
}
print gensub(/(.*\/)?([^/]+)\/[^/]+$/,"\\2",1,ARGV[fileNr])
}
}
$ awk -f tst.awk targets.txt
was,rd.,Lol,ny.,LOL,Wei,hat,hah,lo.,fun,a t,Hel,directory
0,0,1,0,1,0,0,0,1,0,0,1,tallperson
0,1,0,0,2,1,0,0,0,0,0,0,tallperson
0,0,2,0,0,0,0,0,0,0,0,0,tallperson
1,0,0,1,2,0,1,1,0,1,1,0,shortperson
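The columns in that last output look shuffled because awk leaves the traversal order of "for (string in strings)" unspecified. If a reproducible order is wanted without going back to a values file, one option is to copy the keys into a numerically indexed array and sort it before printing. The sketch below shows only that key-ordering step, assuming a POSIX shell and awk; the function name sorted_header is made up for illustration. GNU awk can alternatively control the traversal order through PROCINFO["sorted_in"].

```shell
#!/bin/sh
# sorted_header (hypothetical helper): print the distinct input lines as one
# comma-separated row in sorted order. Reads the named files, or stdin.
sorted_header() {
    awk '
        # Remember each distinct line once, in a numerically indexed array.
        !($0 in strings) { strings[$0]; keys[++n] = $0 }
        END {
            # Insertion sort of keys[1..n]; appending "" forces a string
            # comparison even for values that look like numbers.
            for (i = 2; i <= n; i++) {
                key = keys[i]
                for (j = i - 1; j >= 1 && (keys[j] "") > (key ""); j--)
                    keys[j + 1] = keys[j]
                keys[j + 1] = key
            }
            for (i = 1; i <= n; i++)
                printf "%s%s", keys[i], (i < n ? "," : "\n")
        }
    ' "$@"
}

# Usage:
#   printf 'hat\nfun\nhat\nwas\nhah\n' | sorted_header
```

Looping over keys[1..n] in both the header loop and the per-file loop would then keep the columns aligned from run to run. How upper-case and lower-case strings sort relative to each other can depend on the awk implementation and locale.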