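I have a list of tweets, and I want to remove every duplicate from it, including tweets that are identical except for a different ending. I tried this code, but it doesn't give me what I want: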
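    import csv

    # Keep a row only if the first five words of its tweet (column 1) are new.
    # The with-block closes both files; csv reader/writer objects have no close().
    with open(r'C:\pp.csv', newline='') as src, \
         open(r'C:\oo.csv', 'w', newline='') as dst:
        f1 = csv.reader(src)
        writer = csv.writer(dst)
        tweet = set()
        tweet_start = set()
        for row in f1:
            the_tweet = row[1]
            start = ' '.join(the_tweet.split(' ')[:5])
            if start not in tweet_start:
                writer.writerow(row)
                tweet.add(the_tweet)
                tweet_start.add(start)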
Here is an example of the duplicates:
1-alibaba looks to rural china to popularize its mobile os URL
2-alibaba looks to rural china to popularize its mobile os URL financial NEWES
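
To make the intended behavior concrete, here is a minimal sketch of one way to treat the two tweets above as duplicates: compare only the first few words of each tweet, ignoring case and extra whitespace. The function name `dedupe_by_prefix` and the `n_words` parameter are hypothetical, and the sketch assumes the leading "1-" and "2-" are just list numbering rather than part of the tweet text.

```python
def dedupe_by_prefix(tweets, n_words=5):
    """Keep the first tweet seen for each distinct opening of n_words words."""
    seen_starts = set()
    kept = []
    for text in tweets:
        # split() with no argument collapses runs of whitespace;
        # lower() makes the comparison case-insensitive.
        start = ' '.join(text.lower().split()[:n_words])
        if start not in seen_starts:
            seen_starts.add(start)
            kept.append(text)
    return kept


tweets = [
    "alibaba looks to rural china to popularize its mobile os URL",
    "alibaba looks to rural china to popularize its mobile os URL financial NEWES",
    "a different tweet about something else entirely",
]
unique = dedupe_by_prefix(tweets)
print(unique)
```

With this approach the second tweet is dropped because it opens with the same five words as the first. The trade-off is that two unrelated tweets sharing the same opening words would also be merged, so `n_words` may need tuning for real data.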