I am new to Python. I have this code that reads a csv file, prints it, and writes each word of the file onto its own line in a new csv file. For example:
csv file:
The dog is black and has a black collar
Output csv file:
The
dog
is
black
and
has
a
black
collar
However, if a word appears more than once in the same line, I want the output not to print it twice. For example:
Desired output csv file:
The
dog
is
black
and
has
a
collar
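Notice how the word "black" is not printed twice? That is what I want. If anyone could help me with this, that would be great. As I said, I am still new to Python and figuring it out. Thanks in advance!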
    for row in file1:
        row = row.strip()
        row = row.lower()
        for stopword in internal_stop_words:
            if stopword in row:
                row = row.replace(stopword," ")
        for word in row.split():
            writer.writerow([word])
Answer 0 (score: 2)
Try accumulating the words you have already seen in a set, and then only output the words that are not in it:
    # before you process the file
    seen_words = set()
    # ... later, in the loop...
    for word in row.split():
        if word not in seen_words:
            writer.writerow([word])
            seen_words.add(word)
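Note that a set created once before the file is processed drops repeats across the whole file, while the question only asks about repeats within the same line. A minimal sketch of the per-line variant follows. The function name `write_unique_words_per_row` and the in-memory input and output are assumptions made for illustration, not part of the original code:

```python
import csv
import io

def write_unique_words_per_row(rows, writer):
    """Write each word on its own csv row, skipping repeats within one input row."""
    for row in rows:
        seen_words = set()  # reset per row, so only same-row repeats are dropped
        for word in row.strip().lower().split():
            if word not in seen_words:
                writer.writerow([word])
                seen_words.add(word)

# Hypothetical in-memory data standing in for the real input and output files.
rows = ["The dog is black and has a black collar"]
out = io.StringIO()
write_unique_words_per_row(rows, csv.writer(out, lineterminator="\n"))
result = out.getvalue().split()
print(result)
```

Moving `seen_words = set()` back above the outer loop gives the whole-file behaviour instead.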
Answer 1 (score: 1)
If you don't need to print the words in the order they first appear in the text, you can try set():
>>> s = 'The dog is black and has a black collar'
>>> s.split()
['The', 'dog', 'is', 'black', 'and', 'has', 'a', 'black', 'collar']
>>> set(s.split())
{'is', 'has', 'black', 'and', 'dog', 'collar', 'a', 'The'}
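If the original order does matter, one possible alternative (not something this answer proposed) is `dict.fromkeys`, which relies on dictionaries keeping insertion order in Python 3.7 and later. A small sketch, where the name `unique_in_order` is just illustrative:

```python
s = 'The dog is black and has a black collar'

# dict keys are unique and keep first-insertion order (Python 3.7+),
# so this drops the second 'black' without reordering anything.
unique_in_order = list(dict.fromkeys(s.split()))
print(unique_in_order)
```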
Answer 2 (score: 0)
I actually ended up solving my own problem! Thanks for the suggestions. Here is what I did:
    for row in file1:
        row = row.strip()
        row = row.lower()
        for stopword in internal_stop_words:
            if stopword in row:
                row = row.replace(stopword," ")
        mylist = row.split()
        newlist = []
        for word in mylist:
            if word not in newlist:
                newlist.append(word)
                writer.writerow([word])
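One thing to watch in the loop above: `row.replace(stopword, " ")` matches substrings, so a stop word such as "a" would also be cut out of the middle of "black". A hedged sketch of filtering whole words instead is shown here. The function name `unique_non_stop_words` and the contents of `internal_stop_words` are assumptions, since the original list is not shown:

```python
def unique_non_stop_words(row, stop_words):
    """Return the row's words in order, without stop words or repeats."""
    stop_words = {w.lower() for w in stop_words}
    seen = set()
    result = []
    for word in row.strip().lower().split():
        # Compare whole words, so "a" never matches inside "black".
        if word in stop_words or word in seen:
            continue
        seen.add(word)
        result.append(word)
    return result

# Hypothetical stop-word list; the real one is not given in the question.
internal_stop_words = ["a", "is"]
row = "The dog is black and has a black collar"
words = unique_non_stop_words(row, internal_stop_words)
print(words)
```

Each entry of `words` can then be passed to `writer.writerow([word])` as before.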