I have a file in the following format:
id-of-item
description of item
id-of-item
description of item
id-of-item
description of item
id-of-item
description of item
id-of-item
description of item
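(In the real file each entry is separated by a single line; any extra spacing here is just formatting.)
I need to compare the item descriptions and, where they match, delete the repeated description but keep the id (I need to build a table that references the ids as groups).
I have no idea how to do this. I tried a few awk attempts with NR%2, uniq and so on, but apparently they all match only one line and not the others =/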
Answer 0 (score: 2)
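This may be close. The rule of thumb in awk is: put whatever you want to de-duplicate into the index of an array:
BEGIN { title = "" }
# Blank line: pass it through unchanged.
NF == 0 { print; next }
# No id is pending, so this line is an id: remember it and print it.
title == "" {
    title = $0
    print; next
}
# Otherwise this line is a description: print it only the first time it appears.
{
    if (value[$0] == "") print
    value[$0] = $0
    title = ""
}
Feel the power of associative arrays.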
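To try the idea quickly, here is a minimal sketch of the same technique wrapped in a shell function. The function name `dedupe_descriptions` and the sample data are hypothetical, and unlike the script above this variant drops blank lines instead of printing them. It assumes ids and descriptions strictly alternate.

```shell
# Hypothetical helper: reads id/description pairs on stdin, prints every id
# but prints each description only the first time it is seen.
dedupe_descriptions() {
    awk '
        NF == 0 { next }                      # ignore blank separator lines
        is_desc {                             # second line of a pair: description
            if (!seen[$0]++) print            # array index = description text
            is_desc = 0
            next
        }
        { print; is_desc = 1 }                # first line of a pair: id
    '
}

# Made-up sample: a1 and a3 share the description "red".
printf 'a1\nred\n\na2\nblue\n\na3\nred\n' | dedupe_descriptions
```

The `!seen[$0]++` test is true only on the first occurrence of a description, which is the usual awk idiom for this kind of de-duplication.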
Answer 1 (score: 0)
This might help:
# cat input.txt
id-of-item0
id-of-item0 description of item0
id-of-item1
id-of-item1 description of item1
id-of-item0
id-of-item0 description of item0
id-of-item3
id-of-item3 description of item3
id-of-item4
id-of-item4 description of item4
# sed 'N;s/\n/!!!/' input.txt | sort -u | sed 's/!!!/\n/'
id-of-item0
id-of-item0 description of item0
id-of-item1
id-of-item1 description of item1
id-of-item3
id-of-item3 description of item3
id-of-item4
id-of-item4 description of item4
If you want to drop the descriptions:
# sed 'N;s/\n/!!!/' input.txt | sort -u | sed 's/!!!.*//'
id-of-item0
id-of-item1
id-of-item3
id-of-item4
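Explanation:
Read input.txt two lines at a time and replace the newline \n between them with a delimiter (here !!!). Sort and remove duplicates. Then replace the delimiter !!! with a newline \n again, or remove the description entirely.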
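The join, sort, split steps can also be sketched with `paste` instead of sed, which avoids relying on GNU sed's `\n` in the replacement. This is only a sketch under assumptions: the function names `unique_pairs` and `unique_ids` are hypothetical, the input has no blank lines, and neither ids nor descriptions contain a tab.

```shell
# Hypothetical helpers; both read id/description line pairs on stdin.

# Print each distinct id/description pair once, as two lines.
unique_pairs() {
    paste - - |            # join every two input lines with a tab
        LC_ALL=C sort -u | # sort and drop identical pairs
        tr '\t' '\n'       # split each pair back into two lines
}

# Same, but keep only the id of each distinct pair.
unique_ids() {
    paste - - | LC_ALL=C sort -u | cut -f1
}

# Made-up sample: the id0 pair appears twice.
printf 'id0\nid0 desc0\nid1\nid1 desc1\nid0\nid0 desc0\n' | unique_pairs
```

Using a tab as the temporary delimiter plays the role of `!!!` above; pick whichever delimiter cannot occur in your data.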
Edit:
This might work for you:
sed '/^$/d' input_file | # remove empty lines
sed -n 'h;n;G;s/\n/\t/p' | # join each pair as description<TAB>id
sort | # sort by description
sed ':a;N;s/^\(\([^\t]*\)\t[^\n]*\)\n\2\t/\1\t/;ta;P;D' | # merge lines sharing a description; the trailing \t stops one description matching as a prefix of another
sed 's/\t/\n/g' # translate tabs to newlines
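Answer 2 (score: 0)
I'll make two simplifying assumptions:
Neither assumption is very strong, so it shouldn't be hard to adapt the following if needed.
Given these assumptions, I'll use printf "1\n\nitem 1\n\n2\n\nitem 2\n\n3\n\nitem 2\n\n4\n\nitem 1\n"
to generate sample data. It looks like this: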
1
item 1
2
item 2
3
item 2
4
item 1
To process this data, I would:
Here it is as a pipeline:
grep -v '^[[:space:]]*$' |
awk 'NR%2 { printf("%s\t", $0) } !(NR%2)' |
sort -k2 |
awk -F"\t" 'desc != $2 { printf("-----\n%s\n", $2); desc = $2} { print $1 }'
Pipe the sample data through it and you get
-----
item 1
1
4
-----
item 2
2
3
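For comparison, the same grouping can be sketched as a single awk pass with no sort, collecting the ids per description in an associative array. The function name `group_by_description` is hypothetical, and the sketch assumes ids and descriptions strictly alternate; groups come out in order of first appearance rather than sorted order.

```shell
# Hypothetical helper: reads id/description pairs on stdin (blank lines
# allowed) and prints each description once, followed by all of its ids.
group_by_description() {
    awk '
        NF == 0  { next }                          # skip blank separator lines
        !have_id { id = $0; have_id = 1; next }    # first line of a pair: id
        {
            if (!($0 in ids)) order[++n] = $0      # remember first-seen order
            ids[$0] = ids[$0] id "\n"              # append id to its group
            have_id = 0
        }
        END {
            for (i = 1; i <= n; i++)
                printf "-----\n%s\n%s", order[i], ids[order[i]]
        }
    '
}

# Same sample data as generated by the printf above.
printf '1\n\nitem 1\n\n2\n\nitem 2\n\n3\n\nitem 2\n\n4\n\nitem 1\n' | group_by_description
```

On this sample it prints the same groups as the pipeline, because "item 1" happens to appear first. The trade-off is that the whole table is held in memory until the end of input.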
Answer 3 (score: 0)
Would this work?
awk 'NF' file | sed '{N;s/\n/:/g}' |
awk -F":" -v OFS="\n\n" -v ORS="\n\n" '{b[$2]++} {if (b[$2]>1) print $1; else print $1,$2}'
Your file:
[jaypal:~/Temp] cat file
id-of-item31
description of item4 <--- Duplicate description
id-of-item22
description of item4 <--- Duplicate description
id-of-item34
description of item1 <--- Duplicate description
id-of-item21
description of item3
id-of-item11
description of item1 <--- Duplicate description
Execution:
[jaypal:~/Temp] awk 'NF' file | sed '{N;s/\n/:/g}' |
awk -F":" -v OFS="\n\n" -v ORS="\n\n" '{b[$2]++} {if (b[$2]>1) print $1; else print $1,$2}'
id-of-item31
description of item4
id-of-item22
id-of-item34
description of item1
id-of-item21
description of item3
id-of-item11