Question

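I'm having trouble performing named entity recognition with the Natural Language Toolkit (NLTK). Before I can run NER, I have to do sentence segmentation, tokenization, and POS tagging. I have done this with the following code: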

import nltk

def prepfunc(doc):
    # Split the document into sentences, tokenize each one, then POS-tag the tokens.
    segsents = nltk.sent_tokenize(doc)
    toksents = [nltk.word_tokenize(sent) for sent in segsents]
    possents = [nltk.pos_tag(sent) for sent in toksents]
    return possents

prepfunc(doc)

I need this output on a single line:

[[('word1', 'tag'), ('word2', 'tag'), ('word3', 'tag'), ...]...]

Instead, I get each word on a new line:

[[('word1', 'tag'),
('word2', 'tag'),
('word3',
'tag'),
...]
...]

I'm probably overlooking some simple principle, but I can't figure out how to remove the line breaks between the list items.

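I have searched for a solution to this problem, but most of the solutions I found remove newlines from the strings inside a list. I need to remove the line breaks from the list itself.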

Edit:

The code that prints the output is:

prepfunc(doclist[0])

I opened the file like this:

f='myfile.txt'
opf=open(f, encoding="UTF-8")
doclist=opf.read().split('\n')

I need to open the file this way.

How can I get rid of the line breaks between the items in the list?
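
One thing that may be worth checking, as an assumption rather than a confirmed diagnosis: if the result is being displayed by an interactive shell that pretty-prints values (IPython and Jupyter wrap long lists this way), then the line breaks are only part of the display and not part of the list. The sketch below uses a hypothetical hard-coded `tagged` list in place of the real `prepfunc` output, and the standard `pprint` module to imitate a wrapped display, to show that the same list prints on one line through `repr()` or `print()`.

```python
import pprint

# Hypothetical stand-in for the output of prepfunc(doc): a list of sentences,
# each a list of (word, tag) tuples. No NLTK call is needed to show the effect.
tagged = [[("word%d" % i, "NN") for i in range(1, 31)]]

# pprint wraps a long structure across several lines, similar in spirit to
# what a pretty-printing interactive shell shows for a bare expression.
wrapped = pprint.pformat(tagged)

# repr() gives the plain single-line form; the list itself holds no newlines.
one_line = repr(tagged)

print(one_line)
```

If that assumption holds, calling `print(prepfunc(doclist[0]))` instead of leaving `prepfunc(doclist[0])` as a bare expression would give the single-line form.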

0 answers: