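I'm a beginner at programming, but I need to work with CSV files for a natural language processing project.
I have a CSV file of annotated text. Sentences are separated by blank lines. Each line is one token (a word or punctuation mark) followed by its annotation. What I need is a nested list such as [[[I, pronoun], [need, verb], [you, pronoun]], [[Do, verb], [you, pronoun], [need, verb], [me, pronoun]]].
The text in the CSV looks like this: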
I pronoun
need verb
you pronoun

Do pronoun
you pronoun
need verb
me pronoun
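I tried the following code, but I only get one big list instead of a nested list. I don't know how to split the sentences into separate lists at the blank lines.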
sentences = []
for row in text:
    sentences.append(list(row))
print(sentences)
Any suggestions?
Answer 0 (score: 2)
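You could do something like this:
sentences = []
with open('my_file.csv', 'r') as R:
    curr = []  # store current sentence
    for row in R:
        row = row.strip()  # lines read from a file keep their trailing newline
        if len(row) == 0:
            # empty line: the current sentence is complete
            if curr:
                sentences.append(curr)  # add current sentence to pool
            curr = []  # start a new sentence
            continue
        curr.append(row.split())  # assuming no leading 1. etc.
    if curr:
        sentences.append(curr)  # last sentence, if the file does not end with a blank line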
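The same blank-line grouping could also be written with itertools.groupby from the standard library. This is only a sketch of an alternative, not part of the original answer: the function name read_sentences and the in-memory sample lines are made up for illustration, and it assumes each non-blank line holds whitespace-separated fields, as in the question.

```python
from itertools import groupby


def read_sentences(lines):
    """Group token lines into sentences, splitting on blank lines.

    `lines` is any iterable of strings, e.g. an open file object.
    (read_sentences is a hypothetical helper name, not a library function.)
    """
    stripped = (line.strip() for line in lines)
    # groupby yields runs of consecutive lines sharing the same key:
    # bool('') is False for blank lines, True for lines with text.
    return [
        [line.split() for line in group]
        for has_text, group in groupby(stripped, key=bool)
        if has_text  # skip the runs of blank lines
    ]


# Illustrative sample standing in for the file contents
sample = [
    "I pronoun\n", "need verb\n", "you pronoun\n",
    "\n",
    "Do pronoun\n", "you pronoun\n", "need verb\n", "me pronoun\n",
]
sentences = read_sentences(sample)
print(sentences)
```

To read from the file instead, the open file object can be passed directly, for example read_sentences(R) inside the with block. Because consecutive blank lines fall into a single group, several blank lines in a row do not produce empty sentences.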