I have a file with four columns of data, like this:
cluster-9 cluster-12 cluster-40 cluster-62
cluster-10 cluster-12 cluster-42 cluster-60
cluster-12 cluster-12 cluster-43 cluster-61
cluster-12 cluster-12 cluster-28 cluster-20
cluster-12 cluster-12 cluster-29 cluster-21
cluster-16 cluster-12 cluster-41 cluster-63
cluster-16 cluster-12 cluster-2 cluster-4
cluster-16 cluster-12 cluster-8 cluster-5
cluster-16 cluster-9 cluster-9 cluster-6
cluster-16 cluster-12 cluster-45 cluster-39
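I want to extract the values in column 1 that do not appear in a specific other column (a pairwise comparison). For example, I want to compare column 1 with column 2 and print only the values that occur in column 1 but not in column 2: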
cluster-10
cluster-16
cluster-12 and cluster-9 are not printed because they also appear in column 2.
Answer 0 (score: 4)
Could you please try the following.
awk '{a[$1];b[$2]} END{for(i in a){if(i in b){continue};print i}}' Input_file
cluster-10
cluster-16
If you want to pass the numbers of the columns to compare as awk variables, try the following.
awk -v col1="1" -v col2="2" '{a[$col1];b[$col2]} END{for(i in a){if(i in b){continue};print i}}' Input_file
cluster-10
cluster-16
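Change the values of -v col1 and -v col2 to the columns you want to compare; the command then prints the values that occur in the col1 column but not in the col2 column.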
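If the comparison is needed for several column pairs, the parameterized awk command above could be wrapped in a small shell function. This is only a sketch: the function name only_in_col and its argument order are made up for illustration. Since awk does not guarantee any particular order for for (i in a), the sketch pipes the result through sort to get stable output.

```shell
# only_in_col FILE COL_A COL_B   (hypothetical helper name)
# Print the values that occur in column COL_A but never in column COL_B.
only_in_col() {
    awk -v col1="$2" -v col2="$3" '
        { a[$col1]; b[$col2] }            # remember every value seen in each column
        END {
            for (i in a)
                if (!(i in b)) print i    # keep values absent from the second column
        }
    ' "$1" | sort                          # for-in order is unspecified, so sort the output
}
```

With the sample data, only_in_col Input_file 1 2 should print cluster-10 and cluster-16, one per line.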
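Answer 1 (score: 0)
There are certainly several ways to do this, but here is one that uses sed, sort, and uniq. The key is to find the unique set of values in each of the two columns you care about, and then use the -u option of uniq so that only the items found solely in the first set are printed. The code below looks at columns 1 and 2, but you can easily adapt it to any other pair of columns.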
#!/bin/sh
# Define a separator character and a column pattern; adjust to fit your data
sep=" "
col="\([a-zA-Z0-9_-]*\)$sep"
# Get all values in column 1 and reduce them to a unique set
col1=$(sed "s/^$col.*/\1/" file | sort | uniq)
# Get all values in column 2 and reduce them to a unique set.
# Adjust for a different column as necessary
col2=$(sed "s/^$col$col.*/\2/" file | sort | uniq)
# Combine the results, one value per line, and print only the items that occur once.
# Column 2 is included twice so that items found only in column 2 are dropped.
# printf puts a newline after each set; echo "$col1$col2$col2" would glue the
# last value of one set onto the first value of the next
printf '%s\n' "$col1" "$col2" "$col2" | sort | uniq -u
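As an alternative to listing column 2 twice, the same set difference could be computed with comm, which compares two sorted files line by line. The following is a minimal sketch, not part of the original answer: the function name col_diff is hypothetical, it assumes columns are separated by exactly one space (as in the sample data), and it relies on mktemp, which is widely available but not specified by POSIX.

```shell
# col_diff FILE COL_A COL_B   (hypothetical helper name)
# Print the values that occur in column COL_A but not in column COL_B.
col_diff() {
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }
    # Extract each column and reduce it to a sorted, unique set
    cut -d ' ' -f "$2" "$1" | sort -u > "$tmp_a"
    cut -d ' ' -f "$3" "$1" | sort -u > "$tmp_b"
    # -2 drops lines found only in the second set, -3 drops lines common to both
    comm -23 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}
```

comm expects both inputs to be sorted in the same collation order, which is why both sets go through sort in the same shell session.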
Answer 2 (score: 0)
You can also try Perl:
$ perl -lane ' $kv{$F[0]}++; $kv2{$F[1]}++; END { for(keys %kv) { unless ($kv2{$_}) { print "$_" } }}' greg.txt
cluster-10
cluster-16
$ cat greg.txt
cluster-9 cluster-12 cluster-40 cluster-62
cluster-10 cluster-12 cluster-42 cluster-60
cluster-12 cluster-12 cluster-43 cluster-61
cluster-12 cluster-12 cluster-28 cluster-20
cluster-12 cluster-12 cluster-29 cluster-21
cluster-16 cluster-12 cluster-41 cluster-63
cluster-16 cluster-12 cluster-2 cluster-4
cluster-16 cluster-12 cluster-8 cluster-5
cluster-16 cluster-9 cluster-9 cluster-6
cluster-16 cluster-12 cluster-45 cluster-39
$
Or:
$ perl -lane ' $kv{$F[0]}++; $kv2{$F[1]}++; END { for(keys %kv) { print unless $kv2{$_} }} ' greg.txt
cluster-10
cluster-16
$