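This is sample data. Everything in the input dictionary is dynamic. The only fixed point is that the data dictionary holds exactly 7 values for every key referenced in input_dict, and each of those values can only be 1 or 0.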
input_dict = {'all_val': ['a', 'b', 'c'],
              '2nd_group': ['a', 'b'],
              '3rd_grp': ['a', 'c']}
data = {
    'a': [1, 0, 1, 0, 0, 0, 1],
    'b': [0, 0, 1, 1, 0, 1, 0],
    'c': [0, 1, 1, 0, 0, 0, 1]}
required_output = {'2nd_group': 5, '3rd_grp': 4, 'all_val': 6}
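Logic: for all_val, take a, b and c and look them up in the data dictionary. If any of a[0], b[0], c[0] is 1, position 0 counts as 1. The same applies to a[1], b[1], c[1] and so on for every position; finally, count all the 1s.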
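To make the rule concrete, here is a minimal trace for the single group all_val. It is only an illustration of the description above, not the asker's code; the names `members`, `columns` and `flags` are made up for this sketch.

```python
data = {
    'a': [1, 0, 1, 0, 0, 0, 1],
    'b': [0, 0, 1, 1, 0, 1, 0],
    'c': [0, 1, 1, 0, 0, 0, 1],
}

# Hypothetical name: the members of the group 'all_val'.
members = ['a', 'b', 'c']

# Line the three lists up position by position:
# columns[0] is (a[0], b[0], c[0]), columns[1] is (a[1], b[1], c[1]), ...
columns = list(zip(*(data[name] for name in members)))

# A position counts as 1 if at least one member has a 1 there.
flags = [1 if any(column) else 0 for column in columns]

print(columns[0])  # (1, 0, 0)
print(flags)       # [1, 1, 1, 1, 0, 1, 1]
print(sum(flags))  # 6
```

Only position 4 is 0 in all three lists, which is why all_val ends up at 6 out of 7.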
My solution:
temp_dict = {}
output_dict = {}
# start every group with seven zeros
for a in input_dict.keys():
    temp_dict[a] = [0] * 7
# merge each member's values into the group's list, position by position
for key, value in input_dict.items():
    for v in value:
        for j, d in enumerate(data[v]):
            temp_dict[key][j] = max(temp_dict[key][j], d)
# count the 1s in each group's list
for k, v in temp_dict.items():
    total = 0
    for t in temp_dict[k]:
        total = total + t
    output_dict[k] = total
print(output_dict)
Is there any way to improve the performance, or a different approach to solving this problem?
Answer 0 (score: 0)
from collections import defaultdict
input_dict = {'all_val': ['a', 'b', 'c'],
              '2nd_group': ['a', 'b'],
              '3rd_grp': ['a', 'c']}
data = {
    'a': [1, 0, 1, 0, 0, 0, 1],
    'b': [0, 0, 1, 1, 0, 1, 0],
    'c': [0, 1, 1, 0, 0, 0, 1]}
# {'2nd_group': 5, '3rd_grp': 4, 'all_val': 6}
temp_dict = defaultdict(list)
SIZE_OF_LIST = 7
# we're basically constructing the temp_dict on the fly by iterating through the X and Y axes of the matrix
for i in range(SIZE_OF_LIST):  # i is on the X axis of the matrix and represents the columns in this case
    for group, group_items in input_dict.items():  # for each column we iterate over the Y axis (a, b, c)
        # we then need to obtain all the values in a column (the actual 0's and 1's) and create a
        # list from them. In this list we take only those rows that are of interest to us.
        # For example, for 2nd_group (a, b), considering that we are on column 0, the resulting list
        # will be generated by getting the values for 'a' and 'b', hence we will have [1, 0]
        data_values = [data[data_key][i] for data_key in group_items]  # thanks to list comprehensions
        # we then need to evaluate the previously created list with any():
        # any(data_values) is actually any([1, 0]) (from the previous example),
        # which is True because at least one element is 1, so we store a 1
        # at the right position in the temp_dict
        temp_dict[group].append(1 if any(data_values) else 0)
output_dict = {}
for group, elements in temp_dict.items():
    # we just iterate over the temp_dict one more time and create the
    # sums for all our groups (all_val, 2nd_group, 3rd_grp)
    # by adding up all the 1's in the list.
    # For example, if we're on '2nd_group' then it's basically sum(temp_dict['2nd_group']),
    # which yields your desired result
    output_dict[group] = sum(elements)
print(output_dict)
Answer 1 (score: 0)
Following up on my comment, several parts can be simplified:
Instead of
for k, v in temp_dict.items():
    total = 0
    for t in temp_dict[k]:
        total = total + t
    output_dict[k] = total
you can write:
output_dict = {k: sum(v) for k,v in temp_dict.items()}
Instead of
for key, value in input_dict.items():
    for v in value:
        for j, d in enumerate(data[v]):
            temp_dict[key][j] = max(temp_dict[key][j], d)
you can write:
for key, value in input_dict.items():
    temp_dict[key] = [max(data[v][index] for v in value) for index in range(7)]
Then you could consider combining everything into:
output_dict = {k: sum(max(data[key][index] for key in keys) for index in range(7)) for k, keys in input_dict.items()}
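Since the question says the input is dynamic, one possible hardening of the one-liner is sketched below. It is not part of the original answer: the function name `count_positions` is hypothetical, the list width is read from `data` instead of being hard-coded as 7 (assuming all lists have the same length), and `max(..., default=0)` (available since Python 3.4) is used so that a group with no members counts as 0 instead of raising `ValueError`.

```python
input_dict = {
    'all_val': ['a', 'b', 'c'],
    '2nd_group': ['a', 'b'],
    '3rd_grp': ['a', 'c'],
}
data = {
    'a': [1, 0, 1, 0, 0, 0, 1],
    'b': [0, 0, 1, 1, 0, 1, 0],
    'c': [0, 1, 1, 0, 0, 0, 1],
}


def count_positions(input_dict, data):
    """Hypothetical helper: per group, count positions where any member has a 1."""
    # Assumption: every list in data has the same length.
    width = len(next(iter(data.values()))) if data else 0
    return {
        group: sum(
            # default=0 keeps an empty group from raising ValueError.
            max((data[key][index] for key in keys), default=0)
            for index in range(width)
        )
        for group, keys in input_dict.items()
    }


print(count_positions(input_dict, data))
# {'all_val': 6, '2nd_group': 5, '3rd_grp': 4}
```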
Answer 2 (score: 0)
You can make a few adjustments and simplify the logic. For example, you don't need to create the keys separately in a first pass, and you can skip the temp dict in the second pass. The overall logic can be simplified.
input_dict = {'all_val': ['a', 'b', 'c'],
              '2nd_group': ['a', 'b'],
              '3rd_grp': ['a', 'c']}
data = {
    'a': [1, 0, 1, 0, 0, 0, 1],
    'b': [0, 0, 1, 1, 0, 1, 0],
    'c': [0, 1, 1, 0, 0, 0, 1]}
# required_output = {'2nd_group': 5, '3rd_grp': 4, 'all_val': 6}
res = {}
for key, value in input_dict.items():
    output = 0
    # create a zip from the lists in data so you can check for 1s at every index
    for i in zip(*[data[v] for v in value]):
        if any(i):  # checking if any of them have a 1.
            output += 1
    res[key] = output
timeit results:
New code: 6.36 µs ± 115 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Code from the question (baseline): 19.8 µs ± 339 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
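A comparison like this could be reproduced roughly as follows with the standard-library `timeit` module. This is a sketch under assumptions: the wrapper names `question_version`, `zip_version` and `microseconds_per_call` are invented here, the answer does not say exactly how its numbers were measured, and absolute timings will differ by machine and Python version.

```python
import timeit

input_dict = {
    'all_val': ['a', 'b', 'c'],
    '2nd_group': ['a', 'b'],
    '3rd_grp': ['a', 'c'],
}
data = {
    'a': [1, 0, 1, 0, 0, 0, 1],
    'b': [0, 0, 1, 1, 0, 1, 0],
    'c': [0, 1, 1, 0, 0, 0, 1],
}


def question_version(input_dict, data):
    """The loop-based approach from the question, wrapped in a function."""
    temp_dict = {key: [0] * 7 for key in input_dict}
    for key, value in input_dict.items():
        for v in value:
            for j, d in enumerate(data[v]):
                temp_dict[key][j] = max(temp_dict[key][j], d)
    output_dict = {}
    for k in temp_dict:
        total = 0
        for t in temp_dict[k]:
            total = total + t
        output_dict[k] = total
    return output_dict


def zip_version(input_dict, data):
    """The zip/any approach from this answer, wrapped in a function."""
    res = {}
    for key, value in input_dict.items():
        output = 0
        for column in zip(*[data[v] for v in value]):
            if any(column):
                output += 1
        res[key] = output
    return res


def microseconds_per_call(func, number=10000):
    """Average time of one call in microseconds (includes lambda overhead)."""
    seconds = timeit.timeit(lambda: func(input_dict, data), number=number)
    return seconds / number * 1e6


# Both versions must agree before their timings are worth comparing.
assert question_version(input_dict, data) == zip_version(input_dict, data)

for func in (question_version, zip_version):
    print("%s: %.2f us per call" % (func.__name__, microseconds_per_call(func)))
```

Wrapping each variant in a function keeps the setup (building the dictionaries) out of the timed region, so only the counting logic is measured.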
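Answer 3 (score: 0)
You can use the logical OR operation, as follows:
import numpy as np
# input_dict and data are the dictionaries from the question
output = {}
for key in input_dict:
    r = []
    for data_key in data:
        if data_key in input_dict[key]:
            if len(r) == 0:
                # first member of the group: start from its values
                r = np.asarray(data[data_key])
            else:
                # element-wise OR with the values collected so far
                r = r | np.asarray(data[data_key])
    output[key] = list(r).count(1)
print(output)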
Answer 4 (score: 0)
My approach computes all 7 elements of a list in parallel and does not need a separately installed package such as numpy. In Python 3, it reads:
import operator
import functools
input_dict = {'all_val': ['a', 'b', 'c'],
              '2nd_group': ['a', 'b'],
              '3rd_grp': ['a', 'c']}
# each list of seven 0/1 values is stored as a single 7-bit integer
data = {
    'a': 0b1010001,
    'b': 0b0011010,
    'c': 0b0110001}
def num_bits(n):
    # count the bits that are set in n
    result = 0
    while n > 0:
        result += n & 1
        n >>= 1
    return result
if __name__ == '__main__':
    result = {}
    for inkey, labels in input_dict.items():
        # OR the bitmasks of all members together, then count the set bits
        result[inkey] = num_bits(functools.reduce(operator.__or__, (data[l] for l in labels)))
    print(result)
The truly adventurous can even replace the main part with a dictionary comprehension:
print({inkey: num_bits(functools.reduce(operator.__or__, (data[l] for l in labels))) for inkey, labels in input_dict.items()})
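The answer starts from data that is already stored as integers, whereas the question stores lists. A minimal sketch of the conversion is given below, under the assumption (consistent with the literals above) that the first list element becomes the most significant bit. The helper names `to_bitmask`, `count_ones` and `group_counts` are hypothetical, and `bin(n).count("1")` is swapped in for the answer's `num_bits` loop.

```python
input_dict = {
    'all_val': ['a', 'b', 'c'],
    '2nd_group': ['a', 'b'],
    '3rd_grp': ['a', 'c'],
}
list_data = {
    'a': [1, 0, 1, 0, 0, 0, 1],
    'b': [0, 0, 1, 1, 0, 1, 0],
    'c': [0, 1, 1, 0, 0, 0, 1],
}


def to_bitmask(bits):
    """Pack a list of 0/1 values into an int, first element as the highest bit."""
    mask = 0
    for bit in bits:
        mask = (mask << 1) | bit
    return mask


def count_ones(n):
    """Count the set bits of a non-negative int via its binary string."""
    return bin(n).count("1")


def group_counts(input_dict, mask_data):
    """OR the masks of each group's members and count the resulting 1 bits."""
    result = {}
    for group, labels in input_dict.items():
        combined = 0  # starting from 0 also covers a group with no members
        for label in labels:
            combined |= mask_data[label]
        result[group] = count_ones(combined)
    return result


mask_data = {name: to_bitmask(bits) for name, bits in list_data.items()}
print(mask_data['a'] == 0b1010001)  # True
print(group_counts(input_dict, mask_data))
# {'all_val': 6, '2nd_group': 5, '3rd_grp': 4}
```

Accumulating with `|=` from 0 differs from `functools.reduce` without an initial value, which raises `TypeError` on an empty sequence; that matters only if a group can be empty.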