How to list each word once, ignoring non-alphabetic characters?

Posted: 2017-03-16 01:41:35

Tags: linux bash

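Given the following sample text:

Unix was born in 1969 out of the mind of a computer scientist at Bell Laboratories, Ken Thompson. Unix started its life on a scavenged PDP-7 minicomputer

I ran: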

 tr -cs "[:alpha:]" "\n" < file | sort -u 

The result looks fine, but I found one problem: "PDP-7" in the original text became "PDP". Should I be adding more arguments?
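Why this happens: the -c flag makes tr act on the complement of [:alpha:], so every non-letter character (including "-" and "7") is turned into a newline, and -s squeezes that run into a single newline. "PDP-7" therefore comes out as "PDP". A minimal sketch, assuming that a "word" may contain letters, digits and hyphens (the question does not define this), simply widens the set of kept characters. The function name words_keep_hyphen is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print each distinct "word" once, where a word is ASSUMED to be a
# run of letters, digits and hyphens, so tokens like "PDP-7" stay intact.
words_keep_hyphen() {
  # tr -c acts on the complement of the set (everything except [:alnum:] and '-'),
  # turning it into newlines; -s squeezes runs of newlines into one.
  # A '-' placed last in the set is taken literally by tr.
  # grep drops empty lines and dash-only tokens such as "--";
  # sort -u sorts and keeps one copy of each word.
  tr -cs '[:alnum:]-' '\n' | grep '[[:alnum:]]' | sort -u
}

printf 'Unix started its life on a scavenged PDP-7 minicomputer.\n' | words_keep_hyphen
```

With this set, hyphenated words such as "well-known" are also kept as single tokens. The order of the output depends on the locale; prefix sort with LC_ALL=C for plain byte order.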

2 Answers:

Answer 0 (score: 1)

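This seems to be what you are asking for (although the question is not entirely clear). With the text saved in a file named unix: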

sed -r 's/[[:space:]]+/\n/g' unix | sed 's/[^a-zA-Z0-9]//g' | grep -v '^$' | sort -u

Output:
1960s
1962
1969
a
actual
almost
also
an
and
at
barely
batch
beasts
been
Bell
born
But
by
computer
computing
concept
deployment
earlier
else
everywhere
experience
experimental
first
for
had
him
in
inventor
it
John
Ken
Laboratories
language
late
Lisp
McCarthy
mind
Multics
novel
of
on
one
operating
out
primitive
project
researcher
rule
scientist
seven
speculations
spoiled
still
systems
temperamental
ten
that
the
Thompson
timesharing
Unix
uttered
was
were
which
years
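The pipeline above relies on GNU sed extensions (the -r flag and a "\n" newline in the replacement text). A roughly equivalent sketch using only tr, grep and sort, assuming standard POSIX versions of those tools, follows the same order: split on whitespace first, then strip non-alphanumeric characters. As in the answer, "PDP-7" becomes "PDP7". The function name words_alnum is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the same idea as the sed pipeline, using only tr, grep and sort:
# 1. turn runs of whitespace into a single newline (one token per line)
# 2. delete every character that is not alphanumeric (newlines are kept)
# 3. drop tokens that became empty, then sort and de-duplicate
words_alnum() {
  tr -s '[:space:]' '\n' | tr -cd '[:alnum:]\n' | grep . | sort -u
}

printf 'Unix started its life on a scavenged PDP-7 minicomputer.\n' | words_alnum
```

Whether "PDP7" is an acceptable spelling of "PDP-7" depends on what the question counts as a word.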

Answer 1 (score: 1)

Remember to sort before using uniq -u, because uniq only compares adjacent lines. For example:

sort | uniq -u
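One caution on the flag: uniq -u prints only the lines that are not repeated, so a word that occurs twice disappears entirely instead of being listed once. For "each word once", plain uniq (or sort -u) is the matching choice. A small comparison on made-up input:

```shell
#!/usr/bin/env bash
# Compare three ways of de-duplicating a word list (sample words are made up).
# uniq only looks at adjacent lines, so the input is sorted first.
words=$(printf 'the\nUnix\nthe\nPDP\n')

echo "sort | uniq    (each word once):"
printf '%s\n' "$words" | sort | uniq

echo "sort -u        (same result as sort | uniq):"
printf '%s\n' "$words" | sort -u

echo "sort | uniq -u (only words that occur exactly once):"
printf '%s\n' "$words" | sort | uniq -u
```

Here "the" appears in the first two lists but is missing from the uniq -u list, because it occurs twice.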

If you also want to strip digits, periods and hyphens so that mostly letters remain, you can use:

sed "s/[[:digit:].-]//g"
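Strictly speaking, that expression deletes only digits, periods and hyphens, so commas, semicolons, parentheses and other punctuation survive. If you really want letters only, the sketch below deletes everything outside [:alpha:] on each line. It assumes the input is already one word per line, and the function name letters_only is hypothetical.

```shell
#!/usr/bin/env bash
# The answer's expression removes only digits, '.' and '-':
printf 'PDP-7,\n(1969)\n' | sed "s/[[:digit:].-]//g"

# Deleting the complement of [:alpha:] keeps letters only;
# lines left empty (e.g. a bare year) are then dropped with /^$/d.
letters_only() {
  sed 's/[^[:alpha:]]//g; /^$/d'
}

printf 'PDP-7,\n(1969)\n' | letters_only
```

Note that letters_only turns "PDP-7" into "PDP", which is exactly what the question wants to avoid. It only fits if letters really are all you need.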

Hope this helps. It would have been great if you had included an example of your code/list.