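I am trying to analyze a text to find all the 'NN' and 'NNP' tags. So far the code works fine, but when I save the output to a CSV file I cannot get the format I want. Each row should contain: the analyzed word, the tag, and the question.
Here is the code: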
import csv
import nltk

training_set = []
text = 'I want to analized this text'
tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
# Keep only the (word, tag) pairs tagged as nouns
result = [(word, tag) for word, tag in tagged if tag in ('NN', 'NNP')]
for i in result:
    training_set.append(i)
    training_set.append([text])
print(training_set)
with open('sample.csv', 'w', newline='') as listFile2:
    writer2 = csv.writer(listFile2, quoting=csv.QUOTE_ALL, lineterminator='\n', delimiter=',')
    for item in training_set:
        writer2.writerow(item)
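The result looks like this:
Any idea how to keep all the information on the same row? Like this:
I changed the code to use two lists and then used zip to write both of them to the CSV file. That seems to work, but every value ends up wrapped in "" and ().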
import csv
import nltk

training_set = []
question = []
text = 'I want to analyzed this text'
tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
# Keep only the (word, tag) pairs tagged as nouns
result = [(word, tag) for word, tag in tagged if tag in ('NN', 'NNP')]
for i in result:
    training_set.append(i)
    question.append([text])
with open('sample.csv', 'w', newline='') as listFile2:
    writer2 = csv.writer(listFile2, quoting=csv.QUOTE_ALL, lineterminator='\n', delimiter=',')
    for item in zip(training_set, question):
        writer2.writerow(item)
Result:
Answer 0 (score: 1)
Before writing to the CSV, you can try the following to get the data into the desired format:
[tag + (text,) for tag in result]
Each element of the output is a single (word, tag, text) tuple.
Essentially, this gives you a list of tuples in the desired format, which you can then write to the CSV.
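As a rough sketch of how that could fit together with the writer from the question: the tagger output below is hard-coded as an assumption (so the example runs without NLTK or its data files), and the helper name `rows_to_csv` is made up for illustration. It also shows why the zip version produced the extra "" and (): `csv.writer` calls `str()` on any field that is not already a string, so a whole tuple or list ends up as one quoted field.

```python
import csv
import io

text = 'I want to analyzed this text'
# Assumed tagger output, hard-coded for illustration; in the real script this
# is the list built from nltk.pos_tag(...) filtered to 'NN' and 'NNP'.
result = [('text', 'NN')]


def rows_to_csv(rows):
    """Serialize rows with the same writer settings used in the question."""
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator='\n', delimiter=',')
    writer.writerows(rows)
    return buffer.getvalue()


# zip version from the question: each row holds a tuple and a list, and each
# of those is stringified as a single field.
nested_csv = rows_to_csv(zip(result, [[text]] * len(result)))

# Flattened version from the answer: each row is (word, tag, text).
rows = [tag + (text,) for tag in result]
flat_csv = rows_to_csv(rows)

print(nested_csv)  # "('text', 'NN')","['I want to analyzed this text']"
print(flat_csv)    # "text","NN","I want to analyzed this text"
```

To write to a file instead of a string, the same `csv.writer` call can be pointed at `open('sample.csv', 'w', newline='')` inside a `with` block.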