Sed: delete lines whose pattern match also appears elsewhere

Time: 2014-07-07 13:48:07

Tags: regex bash sed

Let's say I have this sample:

 foo/bar/123-465.txt
 foo/bar/456-781.txt
 foo/bar/102-445.txt
 foo/bar/123-721.txt

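I want to delete every line whose match for the regex /[0-9]*- also appears on another line. In other words: I want to remove from my file every line whose file prefix occurs more than once.

So that only these lines are kept: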

 foo/bar/456-781.txt
 foo/bar/102-445.txt

I bet sed can do this, but how?

3 answers:

Answer 0: (score: 3)

OK, I misunderstood your question at first. Here is how to do it:

grep -vf <(grep -o '/[0-9]*-' file | sort | uniq -d) file

In action:

$ cat file
 foo/bar/123-465.txt
 foo/bar/456-781.txt
 foo/bar/102-445.txt
 foo/bar/123-721.txt

$ grep -vf <(grep -o '/[0-9]*-' file | sort | uniq -d) file
 foo/bar/456-781.txt
 foo/bar/102-445.txt
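
To see why this works, it can help to run the inner pipeline on its own. The following is only a sketch: it assumes the sample is written to a temporary file, and the names sample, patterns, dups and kept are made up for illustration. It uses a pattern file instead of process substitution so that it does not depend on bash.

```shell
# Write the sample from the question to a temporary file (hypothetical path).
sample=$(mktemp)
patterns=$(mktemp)
printf '%s\n' \
  'foo/bar/123-465.txt' \
  'foo/bar/456-781.txt' \
  'foo/bar/102-445.txt' \
  'foo/bar/123-721.txt' > "$sample"

# grep -o prints only the matched part of each line, sort puts equal
# prefixes next to each other, and uniq -d keeps one copy of each
# prefix that occurs more than once.
grep -o '/[0-9]*-' "$sample" | sort | uniq -d > "$patterns"
dups=$(cat "$patterns")
printf '%s\n' "$dups"    # prints: /123-

# The outer grep reads those prefixes as patterns (-f) and prints only
# the lines that match none of them (-v).
kept=$(grep -vf "$patterns" "$sample")
printf '%s\n' "$kept"

rm -f "$sample" "$patterns"
```

The second printf should show the two lines with the prefixes 456 and 102, in their original order, because grep -v filters the file in place rather than collecting lines first.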

Answer 1: (score: 1)

You can use the following awk script:

example.awk:

{
  # Get the value of interest (the part of the file name before the "-");
  # awk string positions start at 1
  prefix = substr($3, 1, match($3, /-/) - 1)

  # Increment the counter for this value (it starts at 0)
  counter[prefix]++

  # Buffer the current line
  buffer[prefix] = $0
}

# At the end, print every line whose value of interest appeared just once
END {
  # "index" is a built-in awk function, so use a different loop variable
  for (key in counter)
    if (counter[key] == 1)
      print buffer[key]
}

Run it with / as the field separator, so that $3 is the file name:

awk -F/ -f example.awk input.file
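
The script relies on the file name being the third /-separated field, which holds for the sample. If the paths can have a different depth, one possible alternative is to take the last field ($NF) instead. A minimal sketch, where the second input path and the variable names are hypothetical and only there to show a different depth:

```shell
# Extract the prefix from the last path component, whatever the depth.
prefixes=$(printf '%s\n' 'foo/bar/123-465.txt' 'a/b/c/456-781.txt' |
  awk -F/ '{
    name = $NF              # last path component, e.g. 123-465.txt
    sub(/-.*/, "", name)    # drop everything from the first "-" onward
    print name
  }')
printf '%s\n' "$prefixes"
# prints:
# 123
# 456
```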

Answer 2: (score: 1)

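awk '
 # Take the first run of digits followed by "-" as the id of the line
 match($0, "[0-9]*-") {
    id = substr($0, RSTART, RLENGTH)
    # Seeing the same id a second time marks it as a duplicate
    if (id in store)
       dup[id] = 1
    store[id] = $0
 }
 # Print the lines whose id was seen only once (output order is unspecified)
 END {
    for (id in store) {
      if (!dup[id]) {
        print store[id]
      }
    }
 }
' file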
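
Both awk answers print from a for (... in ...) loop, and awk does not specify the iteration order of such a loop, so the surviving lines may not come out in input order. If the order matters, one possible variant (a sketch, not part of the original answers) reads the file twice: the first pass counts the prefixes, the second prints. The names sample and result are made up for illustration.

```shell
# Write the sample from the question to a temporary file (hypothetical path).
sample=$(mktemp)
printf '%s\n' \
  'foo/bar/123-465.txt' \
  'foo/bar/456-781.txt' \
  'foo/bar/102-445.txt' \
  'foo/bar/123-721.txt' > "$sample"

# The file is passed twice. While NR == FNR awk is still reading the
# first copy, so it only counts; on the second copy it prints every
# line whose prefix was counted exactly once.
result=$(awk '
  match($0, /[0-9]+-/) {
    id = substr($0, RSTART, RLENGTH)
    if (NR == FNR) count[id]++          # first read: count the prefix
    else if (count[id] == 1) print      # second read: keep unique ones
  }
' "$sample" "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

As in the answer above, lines that contain no digits-and-dash prefix are dropped rather than passed through.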