How do I avoid printing repeated words for each row in Python?

Time: 2014-07-25 17:09:05

Tags: python csv

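I'm still getting used to Python! I just need a little help: my program has two csv files, one named "testclaims" and another named "notinlist". For writer3, I have the program print each word of every row on its own line in a new csv. For example, if a row in testclaims reads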

The boy fell and the boy got hurt

the output is:

The
boy
fell
and
the
boy
got
hurt

However, if a word is repeated within the same row, I don't want it printed again. I want the output to be:

The
boy
fell
and
the
got
hurt

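I've been trying to do this for a while, playing around with counters and frequencies, but I can't figure it out. If you could help me, that would be great! Here is my code: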

import csv

with open("testclaims.csv") as file1, open("masterlist.csv") as file2, \
    open("stopwords.csv") as file3, \
    open("output.csv", "wb+") as file4, open("output2.csv", "wb+") as file5:
    writer = csv.writer(file4)
    writer2 = csv.writer(file5)
    key_words = [word.strip() for word in file2.readlines()]
    stop_words = [word.strip() for word in file3.readlines()]
    internal_stop_words = [' a ', ' an ', ' and ', 'as ', ' at ', ' be ', 'ed ',
          'ers ', ' for ',
          ' he ', ' if ', ' in ', ' is ', ' it ', ' of ', ' on ', ' to ', 'her ', 'hers ',
          ' do ', ' did ', ' a ', ' b ', ' c ', ' d ', ' e ', ' f ', ' g ', ' h ', ' i ',
          ' j ', ' k ', ' l ', ' m ', 'n ', ' n', ' nc ', ' o ', ' p ', ' q ', ' r ', ' s ',
          ' t ', ' u ', ' v ', ' w ', ' x ', ' y ', 'z ', ',', '"', 'ers ', ' th ', ' gc ',
          ' so ', ' ot ', ' ft ', ' ow ', ' ir ', ' ho ', ' er ']
    for row in file1:
        row = row.strip()
        row = row.lower()

        for stopword in internal_stop_words:
            if stopword in row:
                row = row.replace(stopword," ")

        for key in key_words:
            if key in row:
                writer.writerow([key, row])

        for word in row.split(): #This Part Here!
            writer3.writerow([word])

        if not any(key in row for key in key_words):
            writer2.writerow([row])

2 Answers:

Answer 0 (score: 1)

Use an OrderedDict to keep things simple: its keys are unique and stay in the order they were first inserted.

>>> import collections
>>> print("\n".join(collections.OrderedDict.fromkeys("The boy fell and the boy got hurt".split())))
The
boy
fell
and
the
got
hurt
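
On Python 3.7 and later a plain dict also preserves insertion order, so the same idea should work without OrderedDict. A minimal sketch, assuming Python 3; the helper name unique_words is hypothetical and does not come from the question:

```python
def unique_words(row):
    """Return the words of row in first-seen order, dropping later repeats."""
    # dict.fromkeys keeps one key per distinct word; dicts preserve
    # insertion order on Python 3.7+. The comparison is case-sensitive,
    # so "The" and "the" count as different words.
    return list(dict.fromkeys(row.split()))


for word in unique_words("The boy fell and the boy got hurt"):
    print(word)
```

Because the comparison is case-sensitive, this matches the output requested in the question, where both "The" and "the" appear.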

Answer 1 (score: 1)

Use set() to remember which words have already been written:

row = 'The boy fell and the boy got hurt'

s = set()

for word in row.split():
    if word not in s:
        s.add(word)
        #print word
        writer3.writerow([word])
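
When this is moved into the question's loop over rows, the set probably needs to be created anew for each row; otherwise a word seen in an earlier row would be suppressed in every later row as well. A sketch of that, assuming Python 3 and using an in-memory buffer in place of the real output file. The function name write_unique_words, the sample rows, and the buffer standing in for the file behind writer3 are all assumptions, since the question does not show how writer3 is created:

```python
import csv
import io


def write_unique_words(rows, writer):
    """Write each distinct word of every row on its own CSV line."""
    for row in rows:
        seen = set()  # reset per row: repeats are dropped only within a row
        for word in row.split():
            if word not in seen:
                seen.add(word)
                writer.writerow([word])


# Hypothetical stand-in for the file that writer3 writes to.
out = io.StringIO()
sample_rows = ["The boy fell and the boy got hurt\n", "The boy ran\n"]
write_unique_words(sample_rows, csv.writer(out, lineterminator="\n"))
print(out.getvalue())
```

With a single set shared across all rows, the second sample row would contribute only "ran", which is not what the question asks for.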