I have a file with the following contents:
VoicemailButtonTest
VoicemailButtonTest
VoicemailButtonTest
VoicemailButtonTest
VoicemailButtonTest
VoiceMailConfig60CharsTest
VoicemailDefaultTypeTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoicemailSettingsFromMessageModeScreenTest
VoicemailSettingsFromMessageModeScreenTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
How can I replace the repeated lines with a count, like this:
VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)
I want to put these pairs into an associative array. I tried using 'read' in a 'while' statement, but the array gets lost. Here is my attempt:
unset line
tests=$(cat file.log)
echo "$tests" |
while read l; do
if [ "$l" == "${line}" ]; then
let cnt++;
else
echo "${line} (${cnt})"
line=${l}
cnt=1
fi
export run_suites
done
Answer 0: (score: 2)
You can use this simple awk script to get the counts:
awk '{freq[$1]++} END{for (i in freq) print i, "(" freq[i] ")"}' file
VoiceMailConfig60CharsTest (1)
VoicemailSettingsFromMessageModeScreenTest (2)
VoiceMailIconSelectableTest (5)
VoicemailButtonTest (5)
VoicemailDefaultTypeTest (1)
VoicemailSettingsTest (7)
If you want to keep the keys in the order in which they first appear in the input, use:
awk '!freq[$1]++{order[++k]=$1} END{
for (i=1; i<=k; i++) print order[i], "(" freq[order[i]] ")"}' file
VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)
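Since the question mentions wanting the pairs in an associative array, here is a minimal sketch of one way to load awk's counts into one. It assumes bash 4 or newer; the temporary file and the array name `counts` are illustrative stand-ins, not part of the original post. The loop reads from a process substitution rather than a pipe, so it runs in the current shell and the array survives it.

```shell
#!/usr/bin/env bash
# Hypothetical sample input, written to a temp file so the sketch is runnable.
log=$(mktemp)
printf '%s\n' VoicemailButtonTest VoicemailButtonTest VoicemailSettingsTest > "$log"

declare -A counts   # associative array: test name -> count (bash 4+)

# awk prints "name count"; read splits that into the two variables.
# The process substitution keeps the while loop in the current shell.
while read -r name n; do
  counts["$name"]=$n
done < <(awk '{freq[$1]++} END{for (i in freq) print i, freq[i]}' "$log")

rm -f "$log"
echo "VoicemailButtonTest (${counts[VoicemailButtonTest]})"
```

This assumes each line is a single word, as in the question; names containing spaces would need a different field separator.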
Answer 1: (score: 2)
Assuming the output format does not have to match this exactly:
VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)
you can use
sort input_file | uniq -c
If you need the output to match what you posted exactly, you can use
awk '{duplicates[$1]++} END{for (ind in duplicates) {print ind,"("duplicates[ind]")"}}' input_file
Edit: posted after anubhava's answer... but leaving it up because it adds the sort command (unless people suggest I delete it).
Answer 2: (score: 2)
If you don't care about the exact output format, use sort and uniq:
$ sort file.log | uniq -c
5 VoicemailButtonTest
1 VoiceMailConfig60CharsTest
1 VoicemailDefaultTypeTest
5 VoiceMailIconSelectableTest
2 VoicemailSettingsFromMessageModeScreenTest
7 VoicemailSettingsTest
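Of course, if the file is already sorted, as it is in your question, the sort is unnecessary. If it is not sorted, uniq -c still works, but it only treats a line as a duplicate when it is identical to the line immediately before it: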
$ printf 'a\nb\na' | uniq -c
1 a
1 b
1 a
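If the "name (count)" layout from the question is wanted after all, the uniq -c output can be rearranged with a small awk step. This is a sketch only: the function name `count_lines` is made up for illustration, and it assumes every line is a single word with no embedded whitespace.

```shell
# Hypothetical helper: reads lines on stdin, prints "line (count)" sorted by line.
count_lines() {
  # uniq -c emits "<count> <line>"; awk swaps the two fields and adds parentheses.
  sort | uniq -c | awk '{print $2, "(" $1 ")"}'
}

printf '%s\n' b a a b a | count_lines
```

Because the input goes through sort first, the order of first appearance is not preserved.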
Answer 3: (score: 1)
$ awk '$1 != prev{if (NR>1) print prev, "("cnt")"; prev=$1; cnt=0} {cnt++} END{print prev, "("cnt")"}' file
VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)
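The above preserves your input order and stores almost nothing in memory. It doesn't care whether your input is sorted; it only relies on all occurrences of a given key appearing consecutively in the input file, as they do in your example.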
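To make the "consecutive keys" requirement concrete, here is a small sketch that wraps the same awk program in a function (the name `count_runs` is just for illustration) and feeds it grouped and ungrouped input.

```shell
# Hypothetical wrapper around the run-counting awk program above.
count_runs() {
  awk '$1 != prev{if (NR>1) print prev, "("cnt")"; prev=$1; cnt=0} {cnt++} END{print prev, "("cnt")"}'
}

# Grouped input: each key is reported once with its total.
grouped=$(printf '%s\n' a a b | count_runs)

# Ungrouped input: "a" shows up as two separate runs of length 1.
scattered=$(printf '%s\n' a b a | count_runs)

printf '%s\n---\n%s\n' "$grouped" "$scattered"
```

If the keys may be scattered, one of the array-based approaches, or sorting first, is the safer choice.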
Answer 4: (score: 0)
Without awk
This keeps the keys in order of first occurrence; the input does not need to be sorted or grouped.
cat -n file | # add line numbers for order
sort -k2,2 -k1,1n | # sort by key, then by line no so the first occurrence leads each group
uniq -f1 -c | # count keys, ignoring line no
sort -k2,2n | # sort by line no to recover initial order
sed -r 's/(\S+)\s+(\S+)\s+(\S+)/\3 (\1)/' # format output
Answer 5: (score: 0)
Using a bash array
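unset tab
declare -A tab                        # associative array: line -> count (bash 4+)
while IFS= read -r line; do
  [ -n "$line" ] || continue          # an empty string is not a valid subscript
  tab["$line"]=$(( ${tab["$line"]:-0} + 1 ))
done < infile
for i in "${!tab[@]}"; do             # quoted so keys are not word-split
  echo "$i (${tab[$i]})"
done | sort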
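The redirection on the loop above is also what keeps the array alive, which is where the attempt in the question went wrong. In bash with default options, every part of a pipeline runs in a subshell, so anything assigned inside `echo ... | while read ...` disappears when the loop ends. A minimal sketch of the difference, using a plain counter instead of an array (the variable names are illustrative):

```shell
# Piped loop: in bash with default options it runs in a subshell,
# so the increments are lost once the pipeline finishes.
count=0
printf 'a\nb\n' | while read -r l; do count=$((count + 1)); done
piped=$count

# Redirected loop: it runs in the current shell, so the value survives.
count=0
while read -r l; do count=$((count + 1)); done <<'EOF'
a
b
EOF
redirected=$count

echo "piped=$piped redirected=$redirected"   # piped=0 redirected=2 in bash
```

Other shells may behave differently here (ksh and zsh run the last pipeline stage in the current shell), so the piped result is specific to bash without the lastpipe option.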