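I have a program that works with two csv files. It looks at "testclaims" (one column, many rows) and checks whether any word from "masterlist" (one column, many rows) appears within a row of "testclaims". If a row in "testclaims" contains any of the words in "masterlist", the program writes it to a new .csv file called "output". This part of the program works well.
The part I can't figure out is writing all the remaining rows of "testclaims", the ones that do not contain any word from "masterlist", to another csv called "output2". I thought the last two lines of my code would do that, but they don't output what I want. I hope I've explained this clearly. Here is my code: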
import csv

with open("testclaims.csv") as file1, open("masterlist.csv") as file2, \
        open("stopwords.csv") as file3, \
        open("output.csv", "wb+") as file4, open("output2.csv", "wb+") as file5:
    writer = csv.writer(file4)
    writer2 = csv.writer(file5)
    key_words = [word.strip() for word in file2.readlines()]
    stop_words = [word.strip() for word in file3.readlines()]
    internal_stop_words = [' a ', ' an ', ' and ', 'as ', ' at ', ' be ', 'ed ',
        'ers ', ' for ',\
        ' he ', ' if ', ' in ', ' is ', ' it ', ' of ', ' on ', ' to ', 'her ', 'hers '\
        ' do ', ' did ', ' a ', ' b ', ' c ', ' d ', ' e ', ' f ', ' g ', ' h ', ' i ',\
        ' j ', ' k ', ' l ', ' m ', 'n ', ' n', ' nc ' ' o ', ' p ', ' q ', ' r ', ' s ',\
        ' t ', ' u ', ' v ', ' w ', ' x ', ' y ', 'z ', ',', '"', 'ers ', ' th ', ' gc ',\
        ' so ', ' ot ', ' ft ', ' ow ', ' ir ', ' ho ', ' er ', ]
    for row in file1:
        row = row.strip()
        row = row.lower()
        for stop in stop_words:
            if stop in row:
                row = row.replace(stop," ")
        for stopword in internal_stop_words:
            if stopword in row:
                row = row.replace(stopword," ")
        for key in key_words:
            if key in row:
                writer.writerow([key, row])
            elif key not in row:
                writer2.writerow([row])
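What ends up in output2 is every row of "testclaims", repeated multiple times.
For example, if "testclaims" contains this column:
Happy
Sad
Angry
Dog
Cat
then "output2" comes out as a csv containing this column: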
Happy
Happy
Happy
Happy
Happy
Sad
Sad
Sad
Sad
Angry
Angry
Angry
Angry
Angry
Dog
Dog
Dog
Dog
Dog
Cat
Cat
Cat
Cat
Cat
It also doesn't output the same number of copies of each row.
Answer 0 (score: 1)
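You have a nested loop, and the elif branch runs once per keyword, so the row is written to output2 once for every keyword it does not contain. Each row should be written there at most once, and only if it contains none of the keywords. Move that check out of the inner loop by adjusting your last two lines: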
for row in file1:
    ...
    for key in key_words:
        if key in row:
            writer.writerow([key, row])
    if not any(key in row for key in key_words):
        writer2.writerow([row])
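To see the difference without touching any files, here is a minimal in-memory sketch of the same idea. The function name `split_rows` and the keyword list are made up for illustration (the real "masterlist" isn't shown in the question), and the stop-word stripping is left out. With five hypothetical keywords of which only one matches "Sad", the original elif would write "Sad" four times and every other row five times, which would be consistent with the output shown in the question.

```python
def split_rows(rows, key_words):
    """Split rows into keyword hits and rows that match no keyword.

    matched gets one (key, row) pair per keyword found in a row;
    unmatched gets each row containing no keyword, exactly once.
    """
    matched, unmatched = [], []
    for row in rows:
        row = row.strip().lower()
        # Collect every keyword that occurs as a substring of this row.
        hits = [key for key in key_words if key in row]
        for key in hits:
            matched.append((key, row))
        # Decide once per row, after all keywords have been checked.
        if not hits:
            unmatched.append(row)
    return matched, unmatched


rows = ["Happy", "Sad", "Angry", "Dog", "Cat"]
# Hypothetical keywords; only "sad" occurs in the sample rows.
key_words = ["sad", "joy", "bird", "fish", "tree"]

matched, unmatched = split_rows(rows, key_words)
print(matched)    # [('sad', 'sad')]
print(unmatched)  # ['happy', 'angry', 'dog', 'cat']
```

In the real script, the pairs in `matched` would go to `writer.writerow` and the rows in `unmatched` to `writer2.writerow`. Building the `hits` list once per row also avoids scanning the keywords twice, which the `any(...)` version does.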