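This is my first post here, and I hope someone can help.
I have a huge dictionary called `hello` (more than 2 million keys). Its values vary in size: some are lists, some are single values. I have to go through the whole dictionary to extract the following: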
    portfolios = {k: v for k, v in hello.items() if '.some_list' in k}
    # Each substring needs its own `in k` test.
    hello_deltas = {k: v for k, v in hello.items()
                    if '/delta[' in k or '/fast_spot[' in k or '/composite_delta_fx[' in k}

    hello_before = {k: v for k, v in hello_deltas.items() if '_0_result' in k}
    some_list_before = {}
    for some_list in portfolios.values():
        for some in some_list:
            a = [i for i in hello_before.keys() if str(some) in i]
            if len(a) != 0:
                some_list_before[some] = a

    hello_after = {k: v for k, v in hello_deltas.items() if '_1_result' in k}
    some_list_after = {}
    for some_list in portfolios.values():
        for some in some_list:
            a = [i for i in hello_after.keys() if str(some) in i]
            if len(a) != 0:
                some_list_after[some] = a
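I have thought about this a lot and sped it up by combining everything into large dict comprehensions. It is still not fast enough.
I also tried doing everything in a pandas DataFrame, but because the dictionary values differ in size, I could not build the DataFrame.
Can anyone help?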
Answer 0 (score: 0)
First, use a function to avoid the duplicated code:
    def before_after(hello, result):
        """result = '_0_result' (before) or '_1_result' (after)"""
        portfolios = {k: v for k, v in hello.items() if '.some_list' in k}
        # Each substring needs its own `in k` test.
        hello_deltas = {k: v for k, v in hello.items()
                        if '/delta[' in k or '/fast_spot[' in k or '/composite_delta_fx[' in k}
        hello_before_after = {k: v for k, v in hello_deltas.items() if result in k}
        some_list_before_after = {}
        for some_list in portfolios.values():
            for some in some_list:
                a = [i for i in hello_before_after.keys() if str(some) in i]
                if len(a) != 0:
                    some_list_before_after[some] = a
        return some_list_before_after
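To make the intended use concrete, here is a minimal sketch that calls the function once per suffix. The toy dictionary and its key layout are invented for illustration; the real keys in `hello` may look different.

```python
def before_after(hello, result):
    """result = '_0_result' (before) or '_1_result' (after)"""
    portfolios = {k: v for k, v in hello.items() if '.some_list' in k}
    hello_deltas = {k: v for k, v in hello.items()
                    if '/delta[' in k or '/fast_spot[' in k or '/composite_delta_fx[' in k}
    selected = {k: v for k, v in hello_deltas.items() if result in k}
    found = {}
    for some_list in portfolios.values():
        for some in some_list:
            a = [i for i in selected if str(some) in i]
            if a:
                found[some] = a
    return found


# Hypothetical toy data: key names are assumptions, not the real layout.
hello = {
    'book.some_list': ['AAA', 'BBB'],
    'x/delta[AAA]_0_result': 1.0,
    'x/delta[AAA]_1_result': 1.5,
    'x/fast_spot[BBB]_1_result': [2.0, 3.0],
    'x/other[AAA]_0_result': 9.9,  # no delta marker, so it is ignored
}

some_list_before = before_after(hello, '_0_result')
some_list_after = before_after(hello, '_1_result')
print(some_list_before)
print(some_list_after)
```

Each call still scans the whole dictionary, so this removes the duplication but not the cost.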
Next, before reaching for comprehensions, look at the code itself: you are building intermediate dictionaries where generators would do:
    portfolios_lists = (v for k, v in hello.items() if '.some_list' in k)
    for some_list in portfolios_lists:
        for some in some_list:
            ...
Or better:
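    # Filter on the key before iterating over the value,
    # so that non-list values are never iterated.
    portfolios_somes = (s for k, v in hello.items() if '.some_list' in k for s in v)
    for some in portfolios_somes:
        ...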
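A small sketch of how the flattened generator behaves, using invented toy data. It places the `if` before the inner `for`, so single-value entries are skipped before anything tries to iterate over them. It also shows that a generator can be consumed only once.

```python
# Hypothetical toy data: key names are assumptions for illustration.
hello = {
    'a.some_list': [1, 2],
    'b.some_list': [3],
    'c/delta[1]_0_result': 0.5,  # a scalar: iterating over it would raise TypeError
}

# Filter on the key first, then iterate over the (list) value.
portfolios_somes = (s for k, v in hello.items() if '.some_list' in k for s in v)

first_pass = list(portfolios_somes)   # consumes the generator
second_pass = list(portfolios_somes)  # nothing left to yield

print(first_pass)
print(second_pass)
```

If the identifiers are needed more than once, storing them in a list is the simpler choice.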
You only ever use the keys of `hello_deltas`:
    hello_deltas_before_after = [
        k for k in hello.keys()
        if result in k
        and ('/delta[' in k or '/fast_spot[' in k or '/composite_delta_fx[' in k)]
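Note: with a function, you might think the test `'/delta[' in k or '/fast_spot[' in k or '/composite_delta_fx[' in k` now runs twice for every key: once for before and once for after. It does not: `result in k` (that is, `'_0_result' in k` or `'_1_result' in k`) is tested first, and only the keys that pass go on to the more expensive test. Each substring needs its own `in k`, because `'/delta[' or '/fast_spot[' or '/composite_delta_fx[' in k` is always true (a non-empty string is truthy).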
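Both points can be checked in isolation: `a or b or c in k` evaluates to its first non-empty string, and `and` short-circuits, so the marker test runs only when the suffix test passes. This is a minimal sketch; the `markers` tuple and the `is_selected` helper are names introduced here for illustration.

```python
k = 'x/other[AAA]_0_result'  # hypothetical key with no delta marker

# Parsed as: '/delta[' or '/fast_spot[' or ('/composite_delta_fx[' in k)
always_true = '/delta[' or '/fast_spot[' or '/composite_delta_fx[' in k

# Explicit membership test for each marker.
markers = ('/delta[', '/fast_spot[', '/composite_delta_fx[')
explicit = any(m in k for m in markers)


def is_selected(key, result):
    # The suffix test runs first; the marker test runs only if it passes.
    return result in key and any(m in key for m in markers)


print(repr(always_true), explicit)
print(is_selected('x/delta[AAA]_0_result', '_0_result'))
```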
The code now looks like this:
    def before_after(hello, result):
        portfolios_somes = (s for k, v in hello.items() if '.some_list' in k for s in v)
        hello_deltas_before_after = [
            k for k in hello.keys()
            if result in k
            and ('/delta[' in k or '/fast_spot[' in k or '/composite_delta_fx[' in k)]
        some_list_before_after = {}
        for some in portfolios_somes:
            a = [i for i in hello_deltas_before_after if str(some) in i]
            if len(a) != 0:
                some_list_before_after[some] = a
        return some_list_before_after
Now, as a dict comprehension:
    some_list_before_after = {
        some: a
        for some in portfolios_somes
        for a in ([i for i in hello_deltas_before_after if str(some) in i],)
        if a}
Looping over a one-element tuple is a trick to compute `a` only once per `some`. Full code (untested):
    def before_after(hello, result):
        portfolios_somes = (s for k, v in hello.items() if '.some_list' in k for s in v)
        hello_deltas_before_after = [
            k for k in hello.keys()
            if result in k
            and ('/delta[' in k or '/fast_spot[' in k or '/composite_delta_fx[' in k)]
        return {
            some: a
            for some in portfolios_somes
            for a in ([i for i in hello_deltas_before_after if str(some) in i],)
            if a}
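The one-element tuple trick can be seen in isolation in the sketch below, on invented data. It also shows an assignment expression (Python 3.8+) as an alternative that is not part of the answer above; both bind the filtered list once and reuse it as the value.

```python
# Hypothetical toy data for illustration.
keys = ['x/delta[AAA]_0_result', 'x/delta[CCC]_0_result']
somes = ['AAA', 'BBB']

# `for a in (expr,)` runs exactly once per `some`, binding a = expr.
with_tuple = {
    some: a
    for some in somes
    for a in ([k for k in keys if str(some) in k],)
    if a
}

# Alternative (Python 3.8+): bind the list with an assignment expression.
with_walrus = {
    some: a
    for some in somes
    if (a := [k for k in keys if str(some) in k])
}

print(with_tuple)
print(with_walrus)
```

Without either trick, the inner list would have to be built twice: once for the `if` and once for the value.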
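This should be faster than the original version, but the problem has a (very rough) O(n^2) time complexity, because every identifier is compared against every selected key, so you will not get the result in a second. Try it first on a subset of the original data, e.g. `hello_short = dict(itertools.islice(hello.items(), 1000))`.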
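A rough way to try this is sketched below: take a slice of the dictionary and time a single call. The synthetic data and its key format are assumptions made for the example; with the real `hello`, only the `islice` line and the timing are relevant.

```python
import itertools
import time


def before_after(hello, result):
    portfolios_somes = (s for k, v in hello.items() if '.some_list' in k for s in v)
    selected = [
        k for k in hello
        if result in k
        and ('/delta[' in k or '/fast_spot[' in k or '/composite_delta_fx[' in k)]
    return {
        some: a
        for some in portfolios_somes
        for a in ([i for i in selected if str(some) in i],)
        if a}


# Synthetic stand-in for the real dictionary (key format is hypothetical).
hello = {f'book{i}.some_list': [f'id{i:04d}'] for i in range(50)}
hello.update({f'x/delta[id{i:04d}]_0_result': float(i) for i in range(50)})

# Keep only the first 60 items, in insertion order.
hello_short = dict(itertools.islice(hello.items(), 60))

start = time.perf_counter()
result = before_after(hello_short, '_0_result')
elapsed = time.perf_counter() - start
print(f'{len(result)} matches in {elapsed:.6f} s')
```

A slice of the first items only produces matches when both the `.some_list` entries and the corresponding result keys fall inside it, so a tiny subset of the real data may come back empty.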