Question

I'm very new to Python. I have this code that reads a csv file and writes each word of the file onto its own row in a new csv file. For example:

csv file:

The dog is black and has a black collar

Output csv file:

The
dog
is
black
and
has
a
black
collar

However, if a word appears more than once on the same line, I don't want the output to contain it twice. For example:

Desired output csv file:

The
dog
is
black
and
has
a
collar

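Notice how the word "black" is not printed twice? That is what I want. If anyone could help me with this, that would be great. As I said, I'm still new to Python and I'm figuring it out. Thanks in advance!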

for row in file1:
    row = row.strip()
    row = row.lower()

    for stopword in internal_stop_words:
        if stopword in row:
            row = row.replace(stopword," ")

    for word in row.split():
        writer.writerow([word])

Answer 1

Try accumulating the words you have already seen in a set, and only output a word if it is not in that set yet:

# before you process the file
seen_words = set()

# ... later, in the loop...
for word in row.split():
    if word not in seen_words:
        writer.writerow([word])
        seen_words.add(word)
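
For reference, here is a minimal, self-contained sketch of the same idea. The function names `write_unique_words` and `unique_words_csv` and the `per_line` flag are made up for illustration and are not part of the original code. One thing to watch: the question asks for duplicates to be dropped within the same line, so this sketch resets the set at the start of each row by default. Creating the set once before the loop, as in the snippet above, would instead drop duplicates across the whole file.

```python
import csv
import io


def write_unique_words(lines, writer, per_line=True):
    """Write each word on its own csv row, skipping words already seen.

    per_line=True  -> duplicates are dropped within each input line only.
    per_line=False -> duplicates are dropped across all input lines.
    (Both the function and the flag are hypothetical, for illustration.)
    """
    seen_words = set()
    for row in lines:
        if per_line:
            seen_words = set()  # forget the words from previous lines
        for word in row.strip().lower().split():
            if word not in seen_words:
                writer.writerow([word])
                seen_words.add(word)


def unique_words_csv(lines, per_line=True):
    """Helper for trying this out: writes to an in-memory csv, returns the rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    write_unique_words(lines, writer, per_line)
    return buffer.getvalue().split()


if __name__ == "__main__":
    for word in unique_words_csv(["The dog is black and has a black collar"]):
        print(word)
```

With a real file you would pass the open input file as `lines` and a `csv.writer` on the output file as `writer`. Membership tests on a set take roughly constant time, which is why a set is used here rather than a list.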

Answer 2

If you don't need the words printed in the order in which they first appear in the text, you can try set():

>>> s = 'The dog is black and has a black collar'
>>> s.split()
['The', 'dog', 'is', 'black', 'and', 'has', 'a', 'black', 'collar']
>>> set(s.split())
{'is', 'has', 'black', 'and', 'dog', 'collar', 'a', 'The'}
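
If the order does matter, one possible variation (not something the original answer proposed) is `dict.fromkeys`, which drops duplicates while keeping first-appearance order, since dicts preserve insertion order in Python 3.7 and later. The helper name `unique_in_order` is made up for this sketch.

```python
def unique_in_order(text):
    """Return the words of text without duplicates, in first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so a later duplicate such as the second 'black' is ignored.
    return list(dict.fromkeys(text.split()))


if __name__ == "__main__":
    s = 'The dog is black and has a black collar'
    print(unique_in_order(s))
```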

Answer 3

I actually ended up solving my own problem! Thanks for the suggestions. Here is what I did:

for row in file1:
    row = row.strip()
    row = row.lower()

    for stopword in internal_stop_words:
        if stopword in row:
            row = row.replace(stopword," ")

    mylist = row.split()
    newlist = []
    for word in mylist:
        if word not in newlist:
            newlist.append(word)
            writer.writerow([word])

Not outputting the same word twice in Python
