Question

I have an input file with the following contents:

123,apple,orange
123,pineapple,strawberry
543,grapes,orange
790,strawberry,apple
870,peach,grape
543,almond,tomato
123,orange,apple

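I would like the output to be: The following numbers are repeated: 123 543

Is there a way to get this output using awk? I am writing the script in bash on Solaris.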

Answer 1

sed -e 's/,/ , /g' <filename> | awk '{print $1}' | sort | uniq -d
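The sed step pads every comma with spaces so that awk's default whitespace splitting isolates the first field. A minimal sketch, assuming data shaped like the question's sample (no spaces inside the first field) and a hypothetical variable `data` holding a shortened copy of it, comparing that approach with setting awk's field separator directly:

```shell
# A shortened copy of the question's data (the variable name is hypothetical).
data='123,apple,orange
123,pineapple,strawberry
543,grapes,orange
790,strawberry,apple
543,almond,tomato'

# Original approach: pad commas with spaces, then rely on whitespace splitting.
via_sed=$(printf '%s\n' "$data" | sed -e 's/,/ , /g' | awk '{print $1}' | sort | uniq -d)

# Alternative: tell awk to split on commas, which makes the sed step unnecessary.
via_fs=$(printf '%s\n' "$data" | awk -F, '{print $1}' | sort | uniq -d)

echo "$via_sed"
echo "$via_fs"
```

Both pipelines should list the same keys for this kind of input; they can differ if the first field itself contains whitespace.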

Answer 2

If you can do without awk, you can use this to get the repeated numbers:

cut -d, -f 1 my_file.txt  | sort | uniq -d

This prints:

123
543

Edit (in response to your comment)

You can buffer the output and then decide whether to continue. For example:

out=$(cut -d, -f 1 my_file.txt | sort | uniq -d | tr '\n' ' ')
if [[ -n $out ]] ; then
    echo "The following numbers are repeated: $out"
    exit
fi

# continue...

Answer 3

This script prints only the first-column numbers that appear more than once:

awk -F, '{a[$1]++}END{printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print ""}' file

Or, in a shorter form:

awk -F, 'BEGIN{printf "Repeated "}(a[$1]++ == 1){printf "%s ", $1}END{print ""} ' file
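One difference between the two forms is ordering: awk does not specify the traversal order of `for (i in a)`, whereas the shorter form prints each key as soon as its second occurrence is read. A minimal sketch that sorts the keys to make the output deterministic, with the question's sample data fed on standard input; the names `sample`, `count`, and `repeated` are hypothetical:

```shell
# Sample data from the question, supplied on standard input instead of a file.
sample() {
    cat <<'EOF'
123,apple,orange
123,pineapple,strawberry
543,grapes,orange
790,strawberry,apple
870,peach,grape
543,almond,tomato
123,orange,apple
EOF
}

# Count first-column values, print those seen more than once (one per line),
# sort them numerically, then join them onto a single line.
repeated=$(sample |
    awk -F, '{ count[$1]++ } END { for (k in count) if (count[k] > 1) print k }' |
    sort -n |
    paste -s -d ' ' -)

echo "The following numbers are repeated: $repeated"
```

Sorting outside awk keeps the awk program short and avoids depending on any particular implementation's array ordering.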

If you want the script to stop when a duplicate is found, you can have awk exit with a non-zero exit code. For example:

awk -F, 'a[$1]++==1{dup=1}END{if (dup) {printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print "";exit(1)}}' file

In your main script, you can then do:

awk -F, 'a[$1]++==1{dup=1}END{if (dup) {printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print "";exit(-1)}}' file || exit -1

Or, in a more readable format:

awk -F, '
    a[$1]++==1{
        dup=1
    }
    END{
        if (dup) {
            printf "The following numbers are repeated: ";
            for (i in a) 
                if (a[i]>1) 
                    printf "%s ",i; 
            print "";
            exit(-1)
        }
    }
' file || exit -1
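In a longer script, the same check can be wrapped in a function so that the caller decides what to do with the exit status. A minimal sketch, where `check_dups` is a hypothetical name and the exit status is 1 rather than -1 (a choice made here, since shells report statuses in the range 0 to 255):

```shell
# check_dups is a hypothetical helper: it reads comma-separated rows from the
# files given as arguments (or from standard input) and returns 1 if any
# first-column value occurs more than once.
check_dups() {
    awk -F, '
        a[$1]++ == 1 { dup = 1 }
        END {
            if (dup) {
                printf "The following numbers are repeated: "
                for (i in a)
                    if (a[i] > 1)
                        printf "%s ", i
                print ""
                exit 1
            }
        }
    ' "$@"
}

# Usage: report and react without terminating the calling shell.
if ! printf '123,apple\n543,grapes\n123,orange\n' | check_dups; then
    echo "Duplicates found; a real script might exit here." >&2
fi
```

Because the function returns awk's exit status, it can also be used as `check_dups file || exit 1`, mirroring the one-liner form.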

Answer 4

awk -vFS=',' \
     '{KEY=$1;if (KEY in KEYS) { DUPS[KEY]; }; KEYS[KEY]; }   \
      END{print "Repeated Keys:"; for (i in DUPS){print i} }' \
< yourfile

There are also the sort / uniq / cut solutions (see above).
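The awk answer above relies on two awk behaviors: the `in` operator tests whether a key exists without creating it, and a bare reference such as `KEYS[KEY]` creates the element without assigning a value. A minimal sketch of the same idea with comments, assuming input on standard input or in files passed as arguments; the names `find_repeated_keys`, `seen`, and `dups` are hypothetical:

```shell
# find_repeated_keys is a hypothetical helper that prints each first-column
# value that occurs more than once, one per line, in sorted order.
find_repeated_keys() {
    awk -F, '
        # The key was already recorded, so this is a repeat: remember it.
        ($1 in seen) { dups[$1] }
        # Record the key; the bare reference creates the array element.
        { seen[$1] }
        END { for (k in dups) print k }
    ' "$@" | sort
}

result=$(printf '%s\n' \
    '123,apple,orange' \
    '543,grapes,orange' \
    '790,strawberry,apple' \
    '543,almond,tomato' \
    '123,orange,apple' | find_repeated_keys)
echo "$result"
```

On Solaris, the legacy /usr/bin/awk may not accept all of these constructs; nawk or /usr/xpg4/bin/awk is the usual substitute there.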
