Suppose I have this sample:
foo/bar/123-465.txt
foo/bar/456-781.txt
foo/bar/102-445.txt
foo/bar/123-721.txt
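I want to delete every line for which the match of the regular expression /[0-9]*- also appears on another line. In other words: I want to delete every line whose file prefix occurs more than once in my file.
So that only these are kept: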
foo/bar/456-781.txt
foo/bar/102-445.txt
I bet sed can do this, but how?
Answer 0: (score: 3)
OK, I misunderstood your question. Here is how to do it:
grep -vf <(grep -o '/[0-9]*-' file | sort | uniq -d) file
In action:
$ cat file
foo/bar/123-465.txt
foo/bar/456-781.txt
foo/bar/102-445.txt
foo/bar/123-721.txt
$ grep -vf <(grep -o '/[0-9]*-' file | sort | uniq -d) file
foo/bar/456-781.txt
foo/bar/102-445.txt
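To make the one-liner easier to follow, here is a rough sketch of the same pipeline split into stages. It writes the duplicated prefixes to a temporary file instead of using process substitution (which needs bash, zsh or ksh), and it adds -F so the prefixes are matched as fixed strings. Both of those are my own variations, not part of the answer above, and the names workdir, sample and result are made up for the demo.

```shell
# Hypothetical sample file, recreated in a temporary directory for the demo.
workdir=$(mktemp -d)
sample="$workdir/file"
printf '%s\n' \
    'foo/bar/123-465.txt' \
    'foo/bar/456-781.txt' \
    'foo/bar/102-445.txt' \
    'foo/bar/123-721.txt' > "$sample"

# Stage 1: grep -o prints only the part of each line matching the prefix regex.
# Stage 2: sort puts identical prefixes next to each other, and uniq -d keeps
#          one copy of each prefix that occurs more than once.
grep -o '/[0-9]*-' "$sample" | sort | uniq -d > "$workdir/dups"

# Stage 3: -f reads the duplicated prefixes as patterns, -F treats them as
# fixed strings, and -v keeps only the lines that match none of them.
result=$(grep -vFf "$workdir/dups" "$sample")
printf '%s\n' "$result"
```

For the sample, the dups file should contain only /123-, so the two 123 lines are dropped and the other two come out in their original order.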
Answer 1: (score: 1)
You can use the following awk script:
example.awk:
{
    # Get the value of interest: the part of the file name before the "-".
    # With -F/ the file name is $3 for paths shaped like foo/bar/NAME.
    prefix = substr($3, 1, match($3, /-/) - 1)
    # Increment the counter for this value (it starts at 0)
    counter[prefix]++
    # Buffer the current line
    buffer[prefix] = $0
}
# At the end, print every line whose value of interest appeared exactly once
# (for-in visits the keys in an unspecified order)
END {
    for (p in counter)
        if (counter[p] == 1)
            print buffer[p]
}
Run it:
awk -F/ -f example.awk input.file
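Since the END loop of a script like this walks the array in an unspecified order, the output order may differ from the input order. If that matters, one possible alternative is the usual two-pass awk idiom, sketched below: the file is read twice, prefixes are counted on the first pass and lines are printed on the second. This is a sketch under my own assumptions, not the answerer's method: it takes the last path component ($NF) as the file name, and the names input and unique_lines are made up.

```shell
# Hypothetical sample input, matching the file from the question.
input=$(mktemp)
printf '%s\n' \
    'foo/bar/123-465.txt' \
    'foo/bar/456-781.txt' \
    'foo/bar/102-445.txt' \
    'foo/bar/123-721.txt' > "$input"

# The file name is passed twice so awk reads it twice.
# Pass 1 (NR == FNR): count how often each prefix occurs.
# Pass 2: print, in input order, the lines whose prefix occurred once.
unique_lines=$(awk -F/ '
    { prefix = $NF; sub(/-.*/, "", prefix) }   # text before the first "-"
    NR == FNR { count[prefix]++; next }
    count[prefix] == 1
' "$input" "$input")
printf '%s\n' "$unique_lines"
```

The trade-off is that the input has to be a regular file that can be read twice, so this does not work on a pipe.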
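Answer 2: (score: 1)
awk '
    # Only handle lines that contain a "digits-" prefix
    match($0, "[0-9]*-") {
        id = substr($0, RSTART, RLENGTH)
        # Seen before: mark this prefix as duplicated
        if (id in store)
            dup[id] = 1
        store[id] = $0
    }
    END {
        # Print the stored line of every prefix that was never duplicated
        for (id in store) {
            if (! dup[id]) {
                print store[id]
            }
        }
    }
' file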
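If the input comes from a pipe and the original line order should be kept, a single-pass variant is possible at the cost of holding every line in memory. The following is only a sketch of that idea, not the answer's own code: the arrays seen, line and key and the variable kept are names I made up, and the regex is the one from the question.

```shell
# Feed the sample from the question through a pipe (hypothetical demo data).
kept=$(printf '%s\n' \
    'foo/bar/123-465.txt' \
    'foo/bar/456-781.txt' \
    'foo/bar/102-445.txt' \
    'foo/bar/123-721.txt' |
awk '
    # Remember each matching line and its prefix under the line number.
    match($0, /\/[0-9]*-/) {
        id = substr($0, RSTART, RLENGTH)
        seen[id]++
        line[NR] = $0
        key[NR] = id
    }
    END {
        # Walk the line numbers in order, so the output keeps input order.
        for (i = 1; i <= NR; i++)
            if ((i in key) && seen[key[i]] == 1)
                print line[i]
    }
')
printf '%s\n' "$kept"
```

Lines that do not match the regex at all are dropped here, the same as in the script above.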