I have the following file:
epd cvrA
epd cvrA
cvrA epd
emrY hofB
mdtI ydeP
ygcH yagR
nrdD abrB
lsrK yqgD
yhdA yiaF
fadJ plsB
fadJ thiG
plsB thiG
ybhS glnE
yfeX idnR
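I want to count how often each combination of the two column values occurs, where A B and B A count as the same combination. I tried uniq -c, but it didn't solve the problem. In the end I want a file that lists every combination once, with A B and B A summed together. Is this possible with awk?
The expected output is: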
epd cvrA 3
emrY hofB 1
mdtI ydeP 1
ygcH yagR 1
nrdD abrB 1
lsrK yqgD 1
yhdA yiaF 1
fadJ plsB 1
fadJ thiG 1
plsB thiG 1
ybhS glnE 1
yfeX idnR 1
Answer 0 (score: 4)
$ awk '{cnt[($1>$2 ? $1 FS $2 : $2 FS $1)]++}
END{for (idx in cnt) print idx, cnt[idx]}' file
ygcH yagR 1
thiG plsB 1
hofB emrY 1
plsB fadJ 1
yqgD lsrK 1
ydeP mdtI 1
yfeX idnR 1
ybhS glnE 1
thiG fadJ 1
nrdD abrB 1
epd cvrA 3
yiaF yhdA 1
If you want the output sorted by count, use GNU awk's sorted_in:
$ awk '{cnt[($1>$2 ? $1 FS $2 : $2 FS $1)]++}
END{PROCINFO["sorted_in"]="@val_num_desc"; for (idx in cnt) print idx, cnt[idx]}' file
epd cvrA 3
thiG plsB 1
hofB emrY 1
plsB fadJ 1
yqgD lsrK 1
ydeP mdtI 1
yfeX idnR 1
ybhS glnE 1
thiG fadJ 1
nrdD abrB 1
ygcH yagR 1
yiaF yhdA 1
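PROCINFO["sorted_in"] is a GNU awk feature, so the command above will not sort its output under other awks. A minimal sketch for non-GNU awks, assuming a POSIX sort is available, leaves the counting in awk and hands the ordering to sort. The wrapper name count_pairs is hypothetical, and it breaks ties between equal counts alphabetically by the two names, which the gawk version does not attempt.

```shell
#!/bin/sh
# count_pairs: hypothetical helper name, not from the original answers.
# Reads "A B" lines on stdin and prints each unordered pair once with its count.
# Ordering is done by sort(1), so no GNU awk features are needed:
#   -k3,3nr   count, numeric, highest first
#   -k1,1 -k2,2   then the two names, to make ties deterministic
count_pairs() {
    awk '{ cnt[($1 > $2 ? $1 FS $2 : $2 FS $1)]++ }
         END { for (idx in cnt) print idx, cnt[idx] }' |
        sort -k3,3nr -k1,1 -k2,2
}

# usage: count_pairs < file
```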
Answer 1 (score: 3)
Like this? Please post the expected output and a working data set.
$ cat > bar
a b
b a
$ awk '{if($1<$2) a[$1 " " $2]++; else a[$2 " " $1]++} END {for(i in a) print i, a[i]}' bar
a b 2
Answer 2 (score: 1)
A solution without awk (just for fun):
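while IFS= read -r aline; do
  # split the pair onto two lines, sort them in reverse, rejoin them on one line
  echo "$aline" | tr " " "\n" | sort -r | tr "\n" " " ; echo ""
done < input | sort | uniq -c   # uniq -c only merges adjacent lines, so sort first
You get:
3 epd cvrA
1 hofB emrY
1 nrdD abrB
1 plsB fadJ
1 thiG fadJ
1 thiG plsB
1 ybhS glnE
1 ydeP mdtI
1 yfeX idnR
1 ygcH yagR
1 yiaF yhdA
1 yqgD lsrK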
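The loop above starts several processes (tr, sort, tr) for every input line, which gets slow on large files. A sketch of the same idea that orders the two words with the shell's own string comparison instead, assuming a shell whose test builtin supports the non-POSIX \< operator (bash and dash do); pair_counts_bash is a hypothetical name.

```shell
#!/bin/sh
# pair_counts_bash: hypothetical helper name, not from the original answer.
# Orders each pair with the shell's test builtin instead of tr/sort, so no
# extra process is started per input line. "\<" is a common test extension,
# not POSIX.
pair_counts_bash() {
    while read -r a b; do
        # put the larger word first so A B and B A produce the same line
        if [ "$a" \< "$b" ]; then
            printf '%s %s\n' "$b" "$a"
        else
            printf '%s %s\n' "$a" "$b"
        fi
    done | sort | uniq -c   # sort first: uniq -c only merges adjacent lines
}

# usage: pair_counts_bash < input
```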
Answer 3 (score: 1)
This answer preserves both the field order and the line order:
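awk '
# pair already seen in this order: bump its count
($1 FS $2) in count { count[$1 FS $2]++; next }
# pair already seen in the reverse order: bump the original key
($2 FS $1) in count { count[$2 FS $1]++; next }
# first occurrence: start the count and remember which line introduced it
{
    count[$1 FS $2] = 1
    line[NR] = $1 FS $2
}
END {
    # walk the input line numbers in order so the output keeps the input order
    for (i = 1; i <= NR; i++)
        if (i in line)
            print line[i], count[line[i]]
}
' file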
Output:
epd cvrA 3
emrY hofB 1
mdtI ydeP 1
ygcH yagR 1
nrdD abrB 1
lsrK yqgD 1
yhdA yiaF 1
fadJ plsB 1
fadJ thiG 1
plsB thiG 1
ybhS glnE 1
yfeX idnR 1
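The same result can also be reached with a single canonical key per pair (the trick from Answer 0) plus an array that records the order in which pairs first appear, so each line needs only one lookup. A minimal sketch of that variant; pairs_in_order is a hypothetical name, and like the answer above it prints each pair in the field order of its first appearance.

```shell
#!/bin/sh
# pairs_in_order: hypothetical helper name, not from the original answers.
# Counts unordered pairs, printing them in first-seen order and in the field
# order used the first time each pair appeared.
pairs_in_order() {
    awk '
    {
        # one canonical key per unordered pair, whichever order it is written in
        k = ($1 < $2) ? $1 FS $2 : $2 FS $1
        if (!(k in cnt)) {
            order[++n] = k          # remember when the pair was first seen
            first[k] = $1 FS $2     # and how it was written that first time
        }
        cnt[k]++
    }
    END {
        for (i = 1; i <= n; i++)
            print first[order[i]], cnt[order[i]]
    }'
}

# usage: pairs_in_order < file
```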