How do I read a CSV file line by line and store each line as a new row in a new CSV file?

Asked: 2016-06-08 15:43:12

Tags: python csv nltk

I am new to Python. I am trying to read a CSV file and, after removing the stop words, store the result in a new CSV file. My code does remove the stop words, but it copies the first row into every row of the output, all on a single line. (For example, if the file has three rows, it copies the first row three times onto the first line.)

As far as I can tell, the problem is in the loop, but I can't figure it out. My code is attached below.

Code:

import nltk
import csv
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

def stop_Words(fileName,fileName_out):
    file_out=open(fileName_out,'w')
    with open(fileName,'r') as myfile:
        line=myfile.readline()
        stop_words=set(stopwords.words("english"))
        words=word_tokenize(line)
        filtered_sentence=[" "]
        for w in myfile:
            for n in words:
                if n not in stop_words:
                    filtered_sentence.append(' '+n)
        file_out.writelines(filtered_sentence)
    print "All Done SW"

stop_Words("A_Nehra_updated.csv","A_Nehra_final.csv")
print "all done :)"

1 Answer:

Answer 0 (score: 2)

You are only reading the first line of the file: line=myfile.readline(). You want to iterate over every line in the file. One way to do that is

with open(fileName,'r') as myfile:
    for line in myfile:
        # the rest of your code here, i.e.:
        stop_words=set(stopwords.words("english"))
        words=word_tokenize(line)

Also, you have this loop

for w in myfile:
    for n in words:
        if n not in stop_words:
            filtered_sentence.append(' '+n)

but you will notice that w, defined in the outermost loop, is never used inside it. You should be able to remove it and just write

for n in words:
    if n not in stop_words:
        filtered_sentence.append(' '+n)

EDIT:

import nltk
import csv
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

def stop_Words(fileName,fileName_out):
    file_out=open(fileName_out,'w')
    with open(fileName,'r') as myfile:
        for line in myfile:
            stop_words=set(stopwords.words("english"))
            words=word_tokenize(line)
            filtered_sentence=[""]
            for n in words:
                if n not in stop_words:
                    # keep a space between words so they do not run together
                    filtered_sentence.append(' '+n)
            file_out.writelines(filtered_sentence+["\n"])
    file_out.close()
    print "All Done SW"
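
A side note: the csv module is imported in both versions but never used. Purely as a minimal alternative sketch, assuming Python 3 and a comma-delimited input file (the function name remove_stop_words is illustrative, not part of the original code), the same filtering could be done row by row with csv.reader and csv.writer:

import csv
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

def remove_stop_words(file_in, file_out):
    # Build the stop-word set once instead of once per line.
    stop_words = set(stopwords.words("english"))
    with open(file_in, "r", newline="") as src, open(file_out, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for row in reader:
            # Filter stop words out of every field; one output row per input row.
            filtered = [" ".join(w for w in word_tokenize(field) if w not in stop_words)
                        for field in row]
            writer.writerow(filtered)

remove_stop_words("A_Nehra_updated.csv", "A_Nehra_final.csv")

Because csv.writer emits one row per writerow call, this keeps each input line on its own line in the output, which is what the question asks for.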