I have a list of tuples containing hashtags and their frequencies, for example:
[('#Example', 92002),
('#example', 65544)]
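I want to sum the entries whose first element is the same string apart from letter case, keeping as the first element the spelling that has the highest value in the second element. The example above would become: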
[('#Example', 157546)]
So far I have tried:
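import operator

hashtag_freq_list = [('#Example', 92002), ('#example', 65544)]
res = []
for hashtag in hashtag_freq_list:
    # Only handle each case-insensitive tag once
    if hashtag[0].lower() not in [res_entry[0].lower() for res_entry in res]:
        # Collect every entry matching this tag, ignoring case
        entries = [entry for entry in hashtag_freq_list if hashtag[0].lower() == entry[0].lower()]
        # Keep the spelling with the highest individual count
        k = max(entries, key=operator.itemgetter(1))[0]
        v = sum(entry[1] for entry in entries)
        res.append((k, v))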
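The loop above rescans the whole list for every tag and rebuilds the list of already-seen keys on each iteration, so it does quadratic work. A minimal single-pass sketch keeps two dictionaries keyed by the lowercased tag: one for the running total and one for the spelling with the highest individual count seen so far. The function name `merge_case_insensitive` is hypothetical, and the output order relies on dictionaries preserving insertion order (Python 3.7+).

```python
def merge_case_insensitive(pairs):
    """Hypothetical helper: sum counts of tags that differ only by case,
    keeping the spelling whose individual count is highest."""
    totals = {}  # lowercased tag -> running total of counts
    best = {}    # lowercased tag -> (spelling, its individual count)
    for tag, count in pairs:
        key = tag.lower()
        totals[key] = totals.get(key, 0) + count
        # Strict '>' means the first spelling seen wins on ties
        if key not in best or count > best[key][1]:
            best[key] = (tag, count)
    # Results follow the order in which each tag first appeared
    return [(best[key][0], totals[key]) for key in totals]


hashtag_freq_list = [('#Example', 92002), ('#example', 65544)]
print(merge_case_insensitive(hashtag_freq_list))
# [('#Example', 157546)]
```

Each entry is visited once, so the work is linear in the length of the list.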
I'm wondering whether there is a more elegant way to do this.
Answer 0 (score: 1)
I would use a dictionary:
data = [('#example', 65544), ('#Example', 92002)]
hashtable = {}
for i in data:
    # See if this thing exists regardless of casing
    if i[0].lower() not in hashtable:
        # Create a dictionary
        hashtable[i[0].lower()] = {
            'meta': '',
            'value': []
        }
        # Copy the relevant information
        hashtable[i[0].lower()]['value'].append(i[1])
        hashtable[i[0].lower()]['meta'] = i[0]
    # If the key already exists
    else:
        # Check if the number it holds is the max against
        # what was collected so far. If so, change meta
        if i[1] > max(hashtable[i[0].lower()]['value']):
            hashtable[i[0].lower()]['meta'] = i[0]
        # Append the value regardless
        hashtable[i[0].lower()]['value'].append(i[1])
# For output purposes
myList = []
# Build the tuples
for node in hashtable:
    myList.append((hashtable[node]['meta'], sum(hashtable[node]['value'])))
# Voila!
print(myList)
# [('#Example', 157546)]
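For comparison, here is a sketch of a different technique: sort by the lowercased tag and merge neighbouring entries with `itertools.groupby`. `merge_with_groupby` is a hypothetical helper name. Unlike the dictionary approach, the result comes out in alphabetical order of the lowercased tag rather than input order, and the sort costs O(n log n).

```python
from itertools import groupby
from operator import itemgetter


def merge_with_groupby(pairs):
    """Hypothetical helper: merge case-insensitive duplicates via sort + groupby."""
    def lowered(pair):
        return pair[0].lower()

    merged = []
    # groupby only merges adjacent items, so sort by the lowercased tag first
    for _, group in groupby(sorted(pairs, key=lowered), key=lowered):
        group = list(group)
        # max() returns the first maximal pair; sorted() is stable,
        # so on ties the spelling that appeared first in the input wins
        spelling = max(group, key=itemgetter(1))[0]
        merged.append((spelling, sum(count for _, count in group)))
    return merged


data = [('#example', 65544), ('#Example', 92002)]
print(merge_with_groupby(data))
# [('#Example', 157546)]
```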