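I'm still getting used to Python! I just need a little help: my program has two csv files, one named "testclaims" and the other named "notinlist". For writer3, I have the program print each word of every row to a new csv, one word per line. For example, if a row in testclaims reads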
The boy fell and the boy got hurt
the output is:
The
boy
fell
and
the
boy
got
hurt
However, I don't want a word printed again if it repeats within the same row. I want the output to be:
The
boy
fell
and
the
got
hurt
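I've been trying to do this and have been playing around with counters and frequencies, but I can't figure it out. If you could help me, that would be great! Here is my code: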
import csv

with open("testclaims.csv") as file1, open("masterlist.csv") as file2, \
        open("stopwords.csv") as file3, \
        open("output.csv", "wb+") as file4, open("output2.csv", "wb+") as file5:
    writer = csv.writer(file4)
    writer2 = csv.writer(file5)
    key_words = [word.strip() for word in file2.readlines()]
    stop_words = [word.strip() for word in file3.readlines()]
    internal_stop_words = [' a ', ' an ', ' and ', 'as ', ' at ', ' be ', 'ed ',
        'ers ', ' for ',
        ' he ', ' if ', ' in ', ' is ', ' it ', ' of ', ' on ', ' to ', 'her ', 'hers ',
        ' do ', ' did ', ' a ', ' b ', ' c ', ' d ', ' e ', ' f ', ' g ', ' h ', ' i ',
        ' j ', ' k ', ' l ', ' m ', 'n ', ' n', ' nc ', ' o ', ' p ', ' q ', ' r ', ' s ',
        ' t ', ' u ', ' v ', ' w ', ' x ', ' y ', 'z ', ',', '"', 'ers ', ' th ', ' gc ',
        ' so ', ' ot ', ' ft ', ' ow ', ' ir ', ' ho ', ' er ']
    for row in file1:
        row = row.strip()
        row = row.lower()
        for stopword in internal_stop_words:
            if stopword in row:
                row = row.replace(stopword, " ")
        for key in key_words:
            if key in row:
                writer.writerow([key, row])
        for word in row.split():  # This Part Here!
            # writer3 is not defined in this snippet; presumably a csv.writer for a third output file
            writer3.writerow([word])
        if not any(key in row for key in key_words):
            writer2.writerow([row])
Answer 0 (score: 1)
Use an OrderedDict to keep things simple...
>>> import collections
>>> print("\n".join(collections.OrderedDict.fromkeys("The boy fell and the boy got hurt".split())))
The
boy
fell
and
the
got
hurt
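On Python 3.7 and later, plain dicts also preserve insertion order, so the same trick should work without importing collections. A minimal sketch, where unique_words is a hypothetical helper name and not something from the question:

```python
def unique_words(row):
    """Return the words of `row` in first-seen order, dropping repeats."""
    # dict.fromkeys keeps only the first occurrence of each key, and
    # dicts remember insertion order on Python 3.7+.
    return list(dict.fromkeys(row.split()))


words = unique_words("The boy fell and the boy got hurt")
print("\n".join(words))
```

The comparison is case-sensitive, so "The" and "the" are kept as separate words, matching the output wanted in the question.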
Answer 1 (score: 1)
Use a set()
row = 'The boy fell and the boy got hurt'
s = set()
for word in row.split():
    if word not in s:
        s.add(word)
        #print word
        writer3.writerow([word])
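To fit this into a loop over many rows, the set has to be recreated for each row; otherwise a word seen in one row would be suppressed in every later row. A rough sketch in Python 3, assuming a hypothetical function write_unique_words and an in-memory buffer standing in for the real output file:

```python
import csv
import io


def write_unique_words(rows, out_file):
    """Write each word of each row on its own csv line, skipping repeats within a row."""
    writer3 = csv.writer(out_file)
    for row in rows:
        seen = set()  # reset per row so duplicates are only dropped within the same row
        for word in row.split():
            if word not in seen:
                seen.add(word)
                writer3.writerow([word])


# io.StringIO is used here only so the example runs without touching disk.
buffer = io.StringIO()
write_unique_words(["The boy fell and the boy got hurt"], buffer)
print(buffer.getvalue())
```

If the rows are lowercased first, as in the question's loop, "The" and "the" become the same word and only the first one is written.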