Question

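I'm still getting used to Python and need a little help. My program works with two csv files, one named "testclaims" and the other named "notinlist". For writer3, I have the program write every word of each row to the new csv on its own line. For example, if a row in testclaims reads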

The boy fell and the boy got hurt

the output is:

The
boy
fell
and
the
boy
got
hurt

However, if a word is repeated within the same row, I don't want it printed again. I want the output to be:

The
boy
fell
and
the
got
hurt

I've been trying to do this by playing around with counters and frequencies, but I can't figure it out. Any help would be great! Here is my code:

import csv

with open("testclaims.csv") as file1, open("masterlist.csv") as file2,\
    open("stopwords.csv") as file3,\
    open("output.csv", "wb+") as file4, open("output2.csv", "wb+") as file5:
    writer = csv.writer(file4)
    writer2 = csv.writer(file5)
    key_words = [word.strip() for word in file2.readlines()]
    stop_words = [word.strip() for word in file3.readlines()]
    internal_stop_words = [' a ', ' an ', ' and ', 'as ', ' at ', ' be ', 'ed ',
          'ers ', ' for ',\
          ' he ', ' if ', ' in ', ' is ', ' it ', ' of ', ' on ', ' to ', 'her ', 'hers ',\
          ' do ', ' did ', ' a ', ' b ', ' c ', ' d ', ' e ', ' f ', ' g ', ' h ', ' i ',\
          ' j ', ' k ', ' l ', ' m ', 'n ', ' n', ' nc ', ' o ', ' p ', ' q ', ' r ', ' s ',\
          ' t ', ' u ', ' v ', ' w ', ' x ', ' y ', 'z ', ',', '"', 'ers ', ' th ', ' gc ',\
                   ' so ', ' ot ', ' ft ', ' ow ', ' ir ', ' ho ', ' er ', ]
    for row in file1:
        row = row.strip()
        row = row.lower()

        for stopword in internal_stop_words:
            if stopword in row:
                row = row.replace(stopword," ")

        for key in key_words:
            if key in row:
                writer.writerow([key, row])

        for word in row.split(): #This Part Here!
            writer3.writerow([word])

        if not any(key in row for key in key_words):
            writer2.writerow([row])

Answer 1

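Use an OrderedDict to keep things simple. OrderedDict.fromkeys keeps only the first occurrence of each word and preserves the original order: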

>>> import collections
>>> print "\n".join(collections.OrderedDict.fromkeys("The boy fell and the boy got hurt".split()).keys())
The
boy
fell
and
the
got
hurt
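
To show how this one-liner could slot into the per-row loop from the question, here is a minimal sketch. It assumes Python 3 (the question's code looks like Python 2), and the helper name unique_words and the in-memory buffer standing in for the output csv are illustrative choices, not part of the original program.

```python
import collections
import csv
import io


def unique_words(row):
    """Return the words of `row` in order, keeping only the first occurrence of each."""
    return list(collections.OrderedDict.fromkeys(row.split()))


# In-memory stand-in for the output file, so the sketch runs without any csv on disk.
buffer = io.StringIO()
writer3 = csv.writer(buffer, lineterminator="\n")

# Duplicates are tracked per row, so "the" is written again for the second row.
for row in ["The boy fell and the boy got hurt", "the dog and the cat"]:
    for word in unique_words(row):
        writer3.writerow([word])

result = buffer.getvalue()
```

Because a fresh OrderedDict is built for every row, a word is only suppressed when it repeats within the same row, which is what the question asks for.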

Answer 2

Use a set() to keep track of the words already seen in the row:

row = 'The boy fell and the boy got hurt'

s = set()

for word in row.split():
    if word not in s:
        s.add(word)
        #print word
        writer3.writerow([word])
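
Two details matter when moving this into the loop over rows: the set has to be created fresh for each row, and the comparison is case-sensitive unless the row is lowercased first (as the question's code does). The sketch below wraps the same idea in a function; it assumes Python 3, and the name dedupe_row and the ignore_case flag are hypothetical additions for illustration.

```python
def dedupe_row(row, ignore_case=False):
    """Return the words of `row` in order, skipping words already seen in this row."""
    seen = set()   # created per call, so duplicates are only tracked within one row
    kept = []
    for word in row.split():
        # Compare on a lowercased key when case should be ignored.
        key = word.lower() if ignore_case else word
        if key not in seen:
            seen.add(key)
            kept.append(word)
    return kept


case_sensitive = dedupe_row("The boy fell and the boy got hurt")
case_insensitive = dedupe_row("The boy fell and the boy got hurt", ignore_case=True)
```

With the default settings "The" and "the" count as different words, matching the output requested in the question; with ignore_case=True the second "the" is dropped as well.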
