Removing duplicates from a list of tweets that have different endings

Time: 2015-05-05 13:33:23

Tags: python python-2.7 twitter

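I have a list of tweets, and I want to remove every duplicate tweet that differs only in its ending. I tried this code, but it doesn't give me what I want: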

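import csv

# Python 2.7: the csv module expects files opened in binary mode.
# The with block closes both files; csv reader and writer objects
# have no close() method of their own.
with open(r'C:\pp.csv', 'rb') as infile, open(r'C:\oo.csv', 'wb') as outfile:
    f1 = csv.reader(infile)
    writer = csv.writer(outfile)

    tweet = set()        # full text of every tweet written so far
    tweet_start = set()  # first five words of every tweet written so far
    for row in f1:
        the_tweet = row[1]
        # Two tweets count as duplicates when their first five words match.
        start = ' '.join(the_tweet.split(' ')[:5])
        if start not in tweet_start:
            writer.writerow(row)
            tweet.add(the_tweet)
            tweet_start.add(start)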

Here is an example of the duplicates:

1-alibaba looks to rural china to popularize its mobile os URL
2-alibaba looks to rural china to popularize its mobile os URL financial NEWES 
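
For illustration, the prefix comparison can be tried on these two example tweets without any files. This is only a minimal sketch: the function name `dedupe_by_prefix` and its parameters `column` and `n_words` are hypothetical, and lowercasing plus whitespace normalization are assumptions that go beyond the code in the question.

```python
def dedupe_by_prefix(rows, column=1, n_words=5):
    """Keep the first row for each distinct prefix of n_words words.

    Hypothetical helper: rows is any iterable of lists, and the tweet
    text is assumed to sit at index `column`.
    """
    seen = set()
    kept = []
    for row in rows:
        # split() with no argument collapses runs of whitespace;
        # lower() makes the comparison case-insensitive (an assumption).
        key = ' '.join(row[column].lower().split()[:n_words])
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    ['1', 'alibaba looks to rural china to popularize its mobile os URL'],
    ['2', 'alibaba looks to rural china to popularize its mobile os URL financial NEWES'],
    ['3', 'a completely different tweet about something else entirely'],
]
result = dedupe_by_prefix(rows)
```

With this input, only the first of the two alibaba rows should survive, together with the unrelated third row. A five-word prefix is a rough heuristic: two genuinely different tweets that happen to open with the same five words would also be merged.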

0 answers:

No answers