How to parse newline-delimited text

Asked: 2014-04-15 12:36:11

Tags: python nlp

How can I parse newline-delimited tokens such as the following:

Wolff PERSON
is O
in O    
Argentina LOCATION

The O
US LOCATION
Envoy O 
noted O

into complete sentences like these, using Python?

Wolff is in Argentina
The US Envoy noted

1 answer:

Answer 0 (score: 1)

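You can use itertools.groupby to split the lines into runs of blank and non-blank lines, then join the first column of each non-blank run:

>>> from io import StringIO  # Python 3; on Python 2 use `from StringIO import StringIO`
>>> from itertools import groupby
>>> s = '''Wolff PERSON
... is O
... in O
... Argentina LOCATION
...
... The O
... US LOCATION
... Envoy O
... noted O'''
>>> c = StringIO(s)  # file-like object standing in for an open file
>>> for k, g in groupby(c, key=str.isspace):
...     if not k:  # k is True for runs of blank lines, which only act as separators
...         print(' '.join(x.split(None, 1)[0] for x in g))
...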
Wolff is in Argentina
The US Envoy noted
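
To see why this works, it can help to look at what groupby actually yields: consecutive lines with the same key value are bundled into one group, so blank lines (key True) end up as separators between the sentence groups (key False). The following is only a minimal sketch on a shortened sample; the helper name show_groups is illustrative and not part of the answer.

```python
from io import StringIO
from itertools import groupby


def show_groups(text):
    """Return the (key, lines) pairs that groupby produces for a file-like input.

    The helper name is hypothetical; it only exposes the grouping step.
    """
    return [
        (key, [line.rstrip('\n') for line in group])
        for key, group in groupby(StringIO(text), key=str.isspace)
    ]


sample = "Wolff PERSON\nis O\n\nThe O\nUS LOCATION"
for key, lines in show_groups(sample):
    print(key, lines)
# False ['Wolff PERSON', 'is O']
# True ['']
# False ['The O', 'US LOCATION']
```

Only the groups whose key is False carry tokens, which is why the loop in the answer skips the others.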

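If the input actually comes from a string rather than a file, use splitlines(). It drops the line endings, so a blank line becomes '' (for which str.isspace returns False), and the key has to test for an empty stripped line instead: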

>>> for k, g in groupby(s.splitlines(), key=lambda x: not x.strip()):
...     if not k:
...         print(' '.join(x.split(None, 1)[0] for x in g))
...
Wolff is in Argentina
The US Envoy noted
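
If the sentences are needed as values rather than printed output, the string-based variant can be wrapped in a small function. This is a sketch under the same assumptions as above (the token is the first whitespace-separated field, and one or more blank lines separate sentences); the name join_sentences is hypothetical.

```python
from itertools import groupby


def join_sentences(text):
    """Join the first column of each blank-line-separated block into one sentence."""
    sentences = []
    # Key is True for blank (or whitespace-only) lines, False for token lines.
    for is_blank, group in groupby(text.splitlines(), key=lambda line: not line.strip()):
        if not is_blank:
            # split(None, 1)[0] keeps the token and discards the tag after it.
            sentences.append(' '.join(line.split(None, 1)[0] for line in group))
    return sentences


tagged = (
    "Wolff PERSON\nis O\nin O    \nArgentina LOCATION\n"
    "\n"
    "The O\nUS LOCATION\nEnvoy O \nnoted O"
)
print(join_sentences(tagged))
# ['Wolff is in Argentina', 'The US Envoy noted']
```

Because groupby merges adjacent lines with the same key, several blank lines in a row still count as a single separator.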