How do I separate tokens using Unix?

Date: 2014-02-14 11:48:05

Tags: file unix tokenize

How can I split text into tokens, one per line, using Unix tools?

[IN]:

some sentences are like this.
some sentences foo bar that

[OUT]:

some
sentences
are
like
this.

some
sentences
foo
bar
that

I can do this in Python as shown below, but is there a Unix way to achieve the same output?

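>>> intext = "some sentences are like this.\nsome sentences foo bar that"
>>> # Write the sample input; the with block closes the file before it is read back.
>>> with open('infile.txt', 'w', encoding='utf8') as infile:
...     print(intext, file=infile)
... 
>>> with open('infile.txt', encoding='utf8') as infile, \
...      open('outfile.txt', 'w', encoding='utf8') as outfile:
...     for line in infile:
...         for token in line.split():
...             print(token, file=outfile)  # one token per line
...         print(file=outfile)  # blank line after each input line
... 
>>> exit()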

alvas@ubi:~$ cat outfile.txt 
some
sentences
are
like
this.

some
sentences
foo
bar
that

3 answers:

Answer 0 (score: 2)

Using sed:

$ cat infile.txt
some sentences are like this.
some sentences foo bar that
$ sed 's/\s\+\|$/\n/g' infile.txt > outfile.txt
$ cat outfile.txt
some
sentences
are
like
this.

some
sentences
foo
bar
that
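
To make the pieces of the substitution easier to see, here is the same command wrapped in a small shell function with comments. This is only a sketch and assumes GNU sed: `\s`, `\+`, `\|`, and `\n` in the replacement are GNU extensions and may not work in other sed implementations. The function name `tokenize_lines` is made up for illustration.

```shell
# Hypothetical helper name; assumes GNU sed.
tokenize_lines() {
    # \s\+  a run of whitespace becomes a newline (one token per line)
    # \|$   the end of each input line also becomes a newline,
    #       which yields the blank line between sentences
    sed 's/\s\+\|$/\n/g'
}

printf 'some sentences are like this.\nsome sentences foo bar that\n' | tokenize_lines
```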

Answer 1 (score: 1)

Using xargs:

xargs -n1 < file
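
Note that `xargs -n1` runs `echo` once per whitespace-separated token, so each token ends up on its own line, but no blank line is kept between input lines. xargs also treats quotes and backslashes specially, so an apostrophe such as the one in `don't` makes it stop with an unmatched-quote error. A minimal sketch of both points follows; the function names are made up, and the `tr` variant is an alternative that the answer does not mention.

```shell
# Hypothetical helper names, for illustration only.

# The answer's approach: echo each token on its own line.
tokens_xargs() {
    xargs -n1
}

# Alternative (not from the answer): turn every run of whitespace
# into a single newline; quotes are passed through untouched.
tokens_tr() {
    tr -s '[:space:]' '\n'
}

printf 'some sentences are like this.\nsome sentences foo bar that\n' | tokens_xargs
printf "don't stop\n" | tokens_tr
```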

Answer 2 (score: 0)

sed -e 's/ \|$/\n/g' < text

That should do it?
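
This pattern matches a single space rather than a run of whitespace, so consecutive spaces produce empty lines and tabs are not treated as separators. A small sketch of that behavior, assuming GNU sed (the function name is hypothetical):

```shell
# Hypothetical helper name; assumes GNU sed.
split_on_single_space() {
    sed -e 's/ \|$/\n/g'
}

# Two spaces between "foo" and "bar" leave an empty line between them.
printf 'foo  bar\n' | split_on_single_space
```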