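I'm looking for a clever and simple way to remove dupes within chunks of a text file. Chunks are separated by two newlines (that is, by a blank line).
In: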
apple
banana
apple
cherry
cherry

delta
epsilon
delta
epsilon

apple pie
delta
delta
Out:
apple
banana
cherry

delta
epsilon

apple pie
delta
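Thanks. It should work on a Mac and must allow Unicode. Any shell method, language, or command is fine. Dupes are not necessarily consecutive. Bonus points if leading/trailing whitespace is ignored, or if a comma can be used as the delimiter within a record.
Answer 0 (score: 4)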
$ awk '!NF{delete seen} !seen[$0]++' file
apple
banana
cherry

delta
epsilon

apple pie
delta
To ignore (as opposed to remove) leading/trailing whitespace, using GNU awk for gensub(), it would be:
$ awk '!NF{delete seen} !seen[gensub(/^\s+|\s+$/,"","g")]++' file
I don't know what you mean by "can use a comma as the delimiter within a record" in this context.
Answer 1 (score: 0)
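For readers less familiar with awk, the logic of the one-liners above can be sketched in Ruby. This is only an illustration, not part of the original answer: the helper name dedupe_blocks and its ignore_whitespace option are hypothetical. It keeps a set of lines seen in the current block, resets it on every blank line, and prints a line only the first time it appears.

```ruby
require "set"

# Hypothetical helper mirroring the awk logic: emit a line only the first
# time it appears in the current block; a blank line starts a new block.
def dedupe_blocks(lines, ignore_whitespace: false)
  seen = Set.new
  lines.each_with_object([]) do |line, out|
    seen.clear if line.strip.empty?            # like !NF{delete seen}
    key = ignore_whitespace ? line.strip : line
    out << line if seen.add?(key)              # like !seen[$0]++
  end
end

input = ["apple", "banana", "apple", "", "delta", " delta", "delta"]
puts dedupe_blocks(input, ignore_whitespace: true)
```

As in the gensub() variant, whitespace is ignored only when comparing; the first occurrence is printed unchanged.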
RUBY!
text =<<_
apple
banana
apple
cherry
cherry

delta
epsilon
delta
epsilon

apple pie
delta
delta
_
r1 = /
(?<=\n) # match a newline in a positive lookbehind
\n # match a newline
/x # extended/free-spacing regex definition mode
r2 = /
(?<=\n) # match a newline in a positive lookbehind
/x
puts text.split(r1).map { |s| s.split(r2).uniq.join }.join("\n")
# apple
# banana
# cherry
#
# delta
# epsilon
#
# apple pie
# delta
The steps:
a = text.split(r1)
#=> ["apple\nbanana\napple\ncherry\ncherry\n",
# "delta\nepsilon\ndelta\nepsilon\n",
# "apple pie\ndelta\ndelta\n"]
a.map { |s| s.split(r2) }
#=> [["apple\n", "banana\n", "apple\n", "cherry\n", "cherry\n"],
# ["delta\n", "epsilon\n", "delta\n", "epsilon\n"],
# ["apple pie\n", "delta\n", "delta\n"]]
a.map { |s| s.split(r2).uniq }
#=> [["apple\n", "banana\n", "cherry\n"],
# ["delta\n", "epsilon\n"],
# ["apple pie\n", "delta\n"]]
b = a.map { |s| s.split(r2).uniq.join }
#=> ["apple\nbanana\ncherry\n",
# "delta\nepsilon\n",
# "apple pie\ndelta\n"]
b.join("\n")
#=> "apple\nbanana\ncherry\n\ndelta\nepsilon\n\napple pie\ndelta\n"
Answer 2 (score: 0)
This might work for you (GNU sed):
sed -r ':a;N;s/\b((\S+)\b.*)\n\2$/\1/;/^$/M!ba' file
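Gather lines in the pattern space (PS) until a blank line or the end of the file. Match the last line read against the previous lines and, if it matches one of them, remove the last line. If the last line is an empty line (or the end of the file is reached), print all the lines retained in the PS.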
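The buffering strategy described here can be sketched in Ruby for comparison. This is not a translation of the sed regex: the hypothetical helper dedupe_buffered compares whole lines, which is an assumption about the intended behavior. It collects lines in a buffer, skips any line already in the buffer, and flushes the buffer at each blank line and at the end of input.

```ruby
# Hypothetical helper: buffer lines until a blank line (or end of input),
# dropping any new line that already sits in the buffer, then flush the block.
def dedupe_buffered(lines)
  output = []
  buffer = []                         # plays the role of sed's pattern space
  lines.each do |line|
    if line.empty?
      output.concat(buffer) << line   # flush the finished block, keep the separator
      buffer = []
    elsif !buffer.include?(line)
      buffer << line                  # first occurrence in this block: keep it
    end
  end
  output.concat(buffer)               # flush the final block at end of input
end

puts dedupe_buffered(["delta", "epsilon", "delta", "", "apple pie", "delta", "delta"])
```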
Answer 3 (score: 0)
Given:
$ cat file
apple
banana
apple
cherry
cherry

delta
epsilon
delta
epsilon

apple pie
delta
delta
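You can use Ruby's paragraph-mode command-line switch (-00) to treat blank lines as the record separator, and set the field separator to \n. Then uniq each block: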
$ ruby -00 -F'\n' -lane '$><<$F.uniq.join("\n")<<"\n\n"' file
apple
banana
cherry

delta
epsilon

apple pie
delta
Explanation:
$ ruby -00 -F'\n' -lane '$><<$F.uniq.join("\n")<<"\n\n"'
  ruby           # Ruby 1.9+ only, I think
  -00            # paragraph mode: split records on \n\n
  -F'\n'         # split fields on \n
  -lane          # options:
                 #   l  chomp the record separator from each record
                 #   a  auto-split each record into $F
                 #   n  loop over the input without auto-printing
                 #   e  run the script given on the command line
  $>             # STDOUT
  <<             # append
  $F             # the split fields
  .uniq          # made unique
  .join("\n")    # joined back into a string
  <<"\n\n"       # add back the record separator
Alternatively, you can use a Ruby hash to count the fields and then print only the hash's keys:
$ ruby -00 -F'\n' -lane 'h=Hash.new(0)
$F.each {|f| h[f]+=1 }
p h
puts h.keys.join("\n")<<"\n\n"
' file
{"apple"=>2, "banana"=>1, "cherry"=>2}
apple
banana
cherry

{"delta"=>2, "epsilon"=>2}
delta
epsilon

{"apple pie"=>1, "delta"=>2}
apple pie
delta
(In Ruby 1.9+, hashes maintain insertion order, so this prints the words in file order.)
Then, if you want to add , as a potential field delimiter, you can do:
$ ruby -00 -F'\n|,' -lane '$><<$F.uniq.join("\n")<<"\n\n"' file
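As a rough sketch, the same idea can be written as a plain Ruby method that covers both bonus requests from the question. The helper name uniq_fields is hypothetical, and stripping whitespace around each field is an assumption; the one-liner above does not strip.

```ruby
# Hypothetical helper: split text into blank-line-separated blocks, split each
# block on newlines or commas, and keep the first occurrence of each field.
def uniq_fields(text)
  text.split(/\n{2,}/).map do |block|
    block.split(/\n|,/)
         .map(&:strip)        # assumption: surrounding whitespace is not significant
         .reject(&:empty?)    # drop fields left empty after stripping
         .uniq
         .join("\n")
  end.join("\n\n")
end

puts uniq_fields("apple, banana,apple\ncherry\n\ndelta,delta\nepsilon\n")
```

Like the one-liner, this writes each unique field on its own line, so comma-separated fields are not rejoined with commas.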