Not outputting the same word twice in Python

Time: 2014-08-13 18:43:28

Tags: python csv

I am very new to Python. I have this code that imports a csv file and then writes each word of the file to its own row in a new csv file. For example:

csv file:

The dog is black and has a black collar

Output csv file:

The
dog
is
black
and
has
a
black
collar

However, I want the output not to contain the same word twice if it occurs more than once in the same row. For example:

Desired output csv file:

The
dog
is
black
and
has
a
collar

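Notice how the word "black" is not printed twice? That is what I want. If anyone could help me with this, that would be great. Like I said, I am still new to Python and I am figuring it out. Thanks in advance!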

for row in file1:
    row = row.strip()
    row = row.lower()

    for stopword in internal_stop_words:
        if stopword in row:
            row = row.replace(stopword," ")

    for word in row.split():
        writer.writerow([word])

3 answers:

Answer 0: (score: 2)

Try accumulating the words you have already seen in a set, and then output only the words that are not yet in that set:

# before you process the file
seen_words = set()

# ... later, in the loop...
for word in row.split():
  if word not in seen_words:
    writer.writerow([word])
    seen_words.add(word)
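Note that a set created once before the file is processed drops repeats across the whole file. If repeats should only be dropped within a single row, as the question describes, the set can be reset for every row. A minimal sketch, assuming the rows are plain strings; the function name `write_unique_words` and the in-memory `buffer` are made up for illustration and are not part of the original code:

```python
import csv
import io


def write_unique_words(rows, writer):
    """Write each word once per row, in the order it first appears."""
    for row in rows:
        # A fresh set per row: duplicates are only dropped within the same row.
        seen_words = set()
        for word in row.strip().lower().split():
            if word not in seen_words:
                writer.writerow([word])
                seen_words.add(word)


# Hypothetical usage with an in-memory buffer instead of a real output file.
buffer = io.StringIO()
write_unique_words(["The dog is black and has a black collar"], csv.writer(buffer))
print(buffer.getvalue())
```

Moving `seen_words = set()` back out of the loop restores the file-wide behaviour.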

Answer 1: (score: 1)

If you do not need to print the words in the order in which they first appear in the text, you can try set():

>>> s = 'The dog is black and has a black collar'
>>> s.split()
['The', 'dog', 'is', 'black', 'and', 'has', 'a', 'black', 'collar']
>>> set(s.split())
{'is', 'has', 'black', 'and', 'dog', 'collar', 'a', 'The'}
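If the order does matter, one possible alternative (not part of the original answer) is `dict.fromkeys`, since dictionary keys are unique and, from Python 3.7 on, keep their insertion order. A small sketch; the helper name `unique_in_order` is made up for illustration:

```python
def unique_in_order(words):
    """Return the words without repeats, keeping first-seen order."""
    # dict keys are unique and (Python 3.7+) preserve insertion order.
    return list(dict.fromkeys(words))


s = 'The dog is black and has a black collar'
print(unique_in_order(s.split()))
# ['The', 'dog', 'is', 'black', 'and', 'has', 'a', 'collar']
```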

Answer 2: (score: 0)

I actually ended up solving my own problem! Thanks for the suggestions. Here is what I did:

for row in file1:
    row = row.strip()
    row = row.lower()

    for stopword in internal_stop_words:
        if stopword in row:
            row = row.replace(stopword," ")

    mylist = row.split()
    newlist = []
    for word in mylist:
        if word not in newlist:
            newlist.append(word)
            writer.writerow([word])
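One thing to watch for in the loop above: `row.replace(stopword, " ")` works on substrings, so a short stop word such as "a" would also be cut out of the middle of "black". A variant that compares whole words and uses a set for the membership test might look like the sketch below. The function name `unique_non_stop_words` and the contents of `example_stop_words` are assumptions, since the real `internal_stop_words` is not shown:

```python
def unique_non_stop_words(row, stop_words):
    """Return the lowercased words of one row without stop words or repeats."""
    kept = []
    seen = set()
    for word in row.strip().lower().split():
        # Skip whole-word stop words and words already emitted for this row.
        if word in stop_words or word in seen:
            continue
        seen.add(word)
        kept.append(word)
    return kept


# Hypothetical stop words, for illustration only.
example_stop_words = {"a", "is"}
print(unique_non_stop_words("The dog is black and has a black collar", example_stop_words))
# ['the', 'dog', 'black', 'and', 'has', 'collar']
```

Each returned word can then be written with `writer.writerow([word])` as before.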