How do I print the duplicates in a file only once?

Time: 2013-08-17 16:40:07

Tags: bash awk solaris

I have an input file with the following contents:

123,apple,orange
123,pineapple,strawberry
543,grapes,orange
790,strawberry,apple
870,peach,grape
543,almond,tomato
123,orange,apple

I want the output to be:

The following numbers are repeated:
123
543

Is there a way to get this output using awk? I am writing the script in bash on Solaris.

4 answers:

Answer 0 (score: 2)

sed -e 's/,/ , /g' <filename> | awk '{print $1}' | sort | uniq -d

Answer 1 (score: 1)

If you can do without awk, you can use this to get the repeated numbers:

cut -d, -f 1 my_file.txt  | sort | uniq -d

This prints:

123
543
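
As a rough sketch of how this pipeline behaves on the sample data from the question: `cut` keeps the first comma-separated field, `sort` places equal keys next to each other, and `uniq -d` prints one copy of each line that occurs more than once. The temporary file and the variable name `dups` are assumptions made for this illustration, not part of the answer.

```shell
# Sample data from the question, written to a temporary file (illustrative only).
input=$(mktemp)
printf '%s\n' \
    '123,apple,orange' \
    '123,pineapple,strawberry' \
    '543,grapes,orange' \
    '790,strawberry,apple' \
    '870,peach,grape' \
    '543,almond,tomato' \
    '123,orange,apple' > "$input"

# uniq -d only compares adjacent lines, so the keys have to be sorted first.
dups=$(cut -d, -f1 "$input" | sort | uniq -d)

echo "The following numbers are repeated:"
echo "$dups"

rm -f "$input"
```

Because the keys pass through `sort`, the duplicates come out in sorted order rather than in the order in which they appear in the file.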

Edit (in response to your comment)

You can capture the output in a variable and then decide whether to continue. For example:

out=$(cut -d, -f 1 a.txt | sort | uniq -d | tr '\n' ' ')
if [[ -n $out ]] ; then
    echo "The following numbers are repeated: $out"
    exit
fi

# continue...

Answer 2 (score: 1)

This script prints only the first-column numbers that occur more than once:

awk -F, '{a[$1]++}END{printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print ""}' file

Or, in a shorter form:

awk -F, 'BEGIN{printf "Repeated "}(a[$1]++ == 1){printf "%s ", $1}END{print ""} ' file
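
To illustrate why the shorter form prints each key only once: `a[$1]++` yields the count before it is incremented, so the test `== 1` is true only at the second occurrence of a key and never again at the third or later. A minimal sketch on the question's sample data follows; the function name `first_repeats` is a hypothetical name chosen for this example.

```shell
# Hypothetical helper: print each first-column key once, at its second occurrence.
# With no file arguments, awk reads standard input.
first_repeats() {
    awk -F, 'a[$1]++ == 1 { print $1 }' "$@"
}

# Feed the sample rows from the question through the helper.
result=$(printf '%s\n' \
    '123,apple,orange' \
    '123,pineapple,strawberry' \
    '543,grapes,orange' \
    '790,strawberry,apple' \
    '870,peach,grape' \
    '543,almond,tomato' \
    '123,orange,apple' | first_repeats)

echo "The following numbers are repeated:"
echo "$result"
```

Unlike the `for (i in a)` loop, whose iteration order awk leaves unspecified, this form reports keys in the order in which their second occurrence appears in the input.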

If you want to exit the script when a duplicate is found, you can have awk exit with a non-zero exit code. For example:

awk -F, 'a[$1]++==1{dup=1}END{if (dup) {printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print "";exit(1)}}' file

In your main script, you can then do:

awk -F, 'a[$1]++==1{dup=1}END{if (dup) {printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print "";exit(1)}}' file || exit 1

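Or, in a more readable format:

awk -F, '
    # Set a flag the first time any key is seen a second time.
    a[$1]++ == 1 {
        dup = 1
    }
    END {
        if (dup) {
            printf "The following numbers are repeated: "
            for (i in a)
                if (a[i] > 1)
                    printf "%s ", i
            print ""
            # Exit statuses are limited to 0-255, so use 1 rather than -1.
            exit 1
        }
    }
' file || exit 1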
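
One way to arrange this in a main script is to wrap the check in a shell function and branch on its exit status. The sketch below is only an assumption about how the surrounding script might look; `check_dups`, `sample`, and `status` are hypothetical names, and the sample file holds three rows taken from the question. It uses exit status 1, since shell exit statuses are limited to 0-255.

```shell
# Hypothetical helper: reports repeated first-column keys and returns 1 if any exist.
check_dups() {
    awk -F, '
        a[$1]++ == 1 { dup = 1 }
        END {
            if (dup) {
                printf "The following numbers are repeated: "
                for (i in a)
                    if (a[i] > 1)
                        printf "%s ", i
                print ""
                exit 1
            }
        }
    ' "$@"
}

# Small sample file for the demonstration (illustrative only).
sample=$(mktemp)
printf '%s\n' '123,apple,orange' '543,grapes,orange' '123,orange,apple' > "$sample"

if check_dups "$sample"; then
    status=0
    echo "No duplicates, continuing..."
else
    status=$?
    echo "Duplicates found (status $status); a real script would stop here."
fi

rm -f "$sample"
```

On Solaris the default `/usr/bin/awk` is an older implementation; if one of these scripts fails there, `nawk` or `/usr/xpg4/bin/awk` may be worth trying.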

Answer 3 (score: 1)

awk -vFS=',' \
     '{KEY=$1;if (KEY in KEYS) { DUPS[KEY]; }; KEYS[KEY]; }   \
      END{print "Repeated Keys:"; for (i in DUPS){print i} }' \
< yourfile

There is also the sort / uniq / cut solution (see above).