Comparing two csv files in Python

Time: 2014-07-16 15:55:43

Tags: python csv

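I have a program that needs to work with two csv files. It looks at "testclaims" (one column, many rows) and checks whether any word in "masterlist" (one column, many rows) appears in a row of "testclaims". If a row in "testclaims" contains any word from the master list, it writes that row to a new .csv file named "output". This part of the program works well.

The part I can't seem to figure out is writing all the remaining rows of "testclaims", the ones that do not contain any word from the master list, to another csv named "output2". I thought the last two lines of my code would do this, but it doesn't output what I want. I hope I've explained this clearly. Here is my code: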

    import csv

    with open("testclaims.csv") as file1, open("masterlist.csv") as file2,\
        open("stopwords.csv") as file3,\
        open("output.csv", "wb+") as file4, open("output2.csv", "wb+") as file5:
        writer = csv.writer(file4)
        writer2 = csv.writer(file5)
        key_words = [word.strip() for word in file2.readlines()]
        stop_words = [word.strip() for word in file3.readlines()]
        internal_stop_words = [
            ' a ', ' an ', ' and ', 'as ', ' at ', ' be ', 'ed ', 'ers ', ' for ',
            ' he ', ' if ', ' in ', ' is ', ' it ', ' of ', ' on ', ' to ', 'her ', 'hers ',
            ' do ', ' did ', ' a ', ' b ', ' c ', ' d ', ' e ', ' f ', ' g ', ' h ', ' i ',
            ' j ', ' k ', ' l ', ' m ', 'n ', ' n', ' nc ', ' o ', ' p ', ' q ', ' r ', ' s ',
            ' t ', ' u ', ' v ', ' w ', ' x ', ' y ', 'z ', ',', '"', 'ers ', ' th ', ' gc ',
            ' so ', ' ot ', ' ft ', ' ow ', ' ir ', ' ho ', ' er ',
        ]
        for row in file1:
            row = row.strip()
            row = row.lower()
            for stop in stop_words:
                if stop in row:
                    row = row.replace(stop," ")
            for stopword in internal_stop_words:
                if stopword in row:
                    row = row.replace(stopword," ")
            for key in key_words:
                if key in row:
                    writer.writerow([key, row])
                elif key not in row:
                    writer2.writerow([row])

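What ends up in output2 is every row of "testclaims", repeated multiple times.

For example, if "testclaims" contains this column:

    Happy
    Sad
    Angry
    Dog
    Cat

then "output2" comes out as a csv with this column: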

    Happy
    Happy
    Happy
    Happy
    Happy
    Sad
    Sad
    Sad
    Sad
    Angry
    Angry
    Angry
    Angry
    Angry
    Dog
    Dog
    Dog
    Dog
    Dog
    Cat
    Cat
    Cat
    Cat
    Cat

It also doesn't output the same number of copies of each row.

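1 answer:

Answer 0: (score: 1)

You have a nested loop and write the row on every pass through the inner loop, but each row should be written to output2 at most once. The `elif` branch runs once for every key that is not in the row, so a row is written once per non-matching key. You should adjust your last two lines: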

    for row in file1:
        ...
        for key in key_words:
            if key in row:
                writer.writerow([key, row])
        if not any(key in row for key in key_words):
            writer2.writerow([row])
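
If it helps to see the behavior without any files involved, here is a minimal sketch of the same idea on in-memory lists. The function name `split_rows` and the sample rows and keys are made up for illustration and are not part of the original program. It uses the same substring test (`key in row`) as the code above, and collects the matching keys once per row so the keys are not scanned twice.

```python
def split_rows(rows, key_words):
    """Split rows into (key, row) matches and rows that contain no key.

    Hypothetical helper for illustration; it mirrors the loop in the answer.
    """
    matched = []    # one (key, row) pair per key found in a row
    unmatched = []  # each row with no matching key, written exactly once
    for row in rows:
        # Collect every key that occurs as a substring of this row.
        hits = [key for key in key_words if key in row]
        if hits:
            matched.extend((key, row) for key in hits)
        else:
            unmatched.append(row)
    return matched, unmatched


# Made-up sample data, already lower-cased like the rows in the question.
rows = ["happy", "sad", "angry", "dog", "cat"]
key_words = ["sad", "og", "zebra"]

matched, unmatched = split_rows(rows, key_words)
print(matched)    # pairs that would go to output.csv
print(unmatched)  # rows that would go to output2.csv
```

Each unmatched row appears once here, no matter how many keys there are, which is the difference from the `elif` version in the question.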