Using a dictionary to eliminate duplicates in Python

Time: 2015-05-05 20:06:06

Tags: python twitter duplicates

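I am trying to remove duplicate tweets from a couchDB database. I want to eliminate retweeted tweets by using the retweeted_status ID field. Here is my code; it does not work and returns the error "string indices must be integers". Any help would be appreciated.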

# initialize a dictionary of tweet ids
# the first time an id is found, put it into the dict as a key (with value 1 (not used))
uniqueIDs = {}

numtweets = len(search_results)
numdeleted = 0

for tweet in search_results:
    # find retweeted_status
    if 'retweeted_status' in tweet.keys():
        retweetID = [retweeted_status[id] for retweeted_status in tweet ['retweeted_status']]
        #tweetID = retweeted_status['id']
        # get the tweetid from the keys
        #tweetID = tweet['id']
        # if it is already in the id dictionary then delete this one
        if retweetID in uniqueIDs.keys():
            db.delete(tweet)
            numdeleted += 1
        # otherwise add it to the unique ids
        else:
            uniqueIDs[retweetID] = 1
    else:
        # reduce the count if we skipped one
        numtweets -= 1


print "Number of tweets at beginning = ", numtweets
print "Number of tweets deleted = ", numdeleted

2 Answers:

Answer 0: (score: 0)

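You are declaring retweetID as a list and then using it as a single value. Rather than using [... for ... in ...], you should loop through tweet['retweeted_status']. You would have something like this: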

if 'retweeted_status' in tweet:  # Note, don't need .keys()
    for retweetID in tweet['retweeted_status']:
        if retweetID in uniqueIDs:  # Again, don't need .keys()
            ...
        else:
            uniqueIDs[retweetID] = 1
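
For context on the error itself: if tweet['retweeted_status'] is a single dict (an assumption here, inferred from the error message in the question), then iterating over it yields its string keys, and indexing a string with the builtin `id` function raises the TypeError. The sketch below reproduces that with a made-up tweet and then reads the ID directly from the nested dict; the sample data and field layout are hypothetical.

```python
# Hypothetical sample tweet; the field layout is an assumption for illustration.
tweet = {'id': 2, 'retweeted_status': {'id': 1, 'text': 'original'}}

# Iterating over a dict yields its keys, which are strings here.
keys = list(tweet['retweeted_status'])

# Indexing a string key with the builtin function `id` reproduces the error.
try:
    [key[id] for key in tweet['retweeted_status']]
    error = None
except TypeError as exc:
    error = str(exc)

# Reading the nested field directly gives a single hashable value
# that can be used as a dictionary key.
retweetID = tweet['retweeted_status']['id']
uniqueIDs = {}
if retweetID not in uniqueIDs:
    uniqueIDs[retweetID] = 1
```

If retweeted_status really is a single dict, no loop or list comprehension is needed at all, and retweetID can be tested against uniqueIDs as in the original code.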

Answer 1: (score: 0)

 retweetID = [retweeted_status[id] for retweeted_status in tweet ['retweeted_status']]

The line above sets retweetID to a list. The following is probably what you wanted to do. It checks, for each retweet ID, whether it already exists in uniqueIDs. If you only want unique tweet IDs, you could also use a set instead of a dictionary.

for tweet in search_results:
    # find retweeted_status
    if 'retweeted_status' in tweet:
        # assumes tweet['retweeted_status'] is a list of dicts; if it is a
        # single dict, use [tweet['retweeted_status']['id']] instead
        retweetIDs = [retweeted_status['id'] for retweeted_status in tweet['retweeted_status']]
        for tweet_id in retweetIDs:
            # if it is already in the id dictionary then delete this one
            if tweet_id in uniqueIDs:
                db.delete(tweet)
                numdeleted += 1
            # otherwise add it to the unique ids
            else:
                uniqueIDs[tweet_id] = 1
    else:
        # reduce the count if we skipped one
        numtweets -= 1
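
The set suggestion above could be sketched as follows. This is a minimal sketch, assuming each retweet carries a single retweeted_status dict with an id field; `find_duplicate_retweets` and the sample data are hypothetical names for illustration, and the actual db.delete(tweet) call is left to the caller.

```python
def find_duplicate_retweets(tweets):
    """Return the tweets whose retweeted_status id was already seen."""
    seen_ids = set()   # ids of retweeted originals encountered so far
    duplicates = []    # later retweets of an already-seen original
    for tweet in tweets:
        status = tweet.get('retweeted_status')
        if status is None:
            continue   # not a retweet, nothing to compare
        retweet_id = status['id']
        if retweet_id in seen_ids:
            duplicates.append(tweet)
        else:
            seen_ids.add(retweet_id)
    return duplicates


# Made-up sample data, not real search results.
search_results = [
    {'id': 10, 'retweeted_status': {'id': 1}},
    {'id': 11, 'retweeted_status': {'id': 1}},
    {'id': 12},
    {'id': 13, 'retweeted_status': {'id': 2}},
]
duplicates = find_duplicate_retweets(search_results)
```

Each tweet in duplicates could then be passed to db.delete, and len(duplicates) plays the role of numdeleted. A set fits here because the dictionary values in the original code are never used.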