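What is a good way to map a function over specified key paths in a nested dictionary, with support for the kinds of path specification described below?
If it makes things simpler, you can assume that only dictionaries are nested, with no lists of dictionaries, since the former can be obtained with dict(enumerate(...)).
However, the hierarchy can be ragged, for example: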
    data = {0: {'a': 1, 'b': 2},
            1: {'a': 10, 'c': 13},
            2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
            3: {'a': 30, 'b': 31, 'c': {'d': 300}}}
I would like to be able to specify a key path like this:
    map_at(f, ['*', ['b', 'c'], 'd'])
which should return:
    {0: {'a': 1, 'b': 2},
     1: {'a': 10, 'c': 13},
     2: {'a': 20, 'b': {'d': f(100), 'e': 101}, 'c': 23},
     3: {'a': 30, 'b': 31, 'c': {'d': f(300)}}}
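Here f is mapped to the key paths [2,b,d] and [3,c,d].
Slicing would be specified as, for example, [0:3,b].
I think the path specification is unambiguous, although it could be generalized to, for example, match key path prefixes (in which case f would also be mapped at [0,b] and other paths).
Can this be implemented with comprehensions and recursion, or does it require heavy lifting such as catching KeyError?
Please do not suggest pandas as an alternative.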
Answer 0 (score: 1)
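I am not a big fan of pseudocode, but in this kind of situation you need to write the algorithm down. Here is my understanding of your requirements:
map_at(func, path_pattern, data):
1. If path_pattern is not empty:
   - If data is terminal, it is a failure: we did not match the full path_pattern, so there is no reason to apply the function. Just return data.
   - Otherwise, consume the head of path_pattern where possible. That is, return a dict mapping each data key -> map_at(func, new_path, data value), where new_path is the tail of path_pattern if the key matches the head, and path_pattern itself otherwise.
2. Otherwise, it is a success, because all of path_pattern was consumed:
   - If data is terminal, return func(data).
   - Otherwise, return a dict mapping each data key -> map_at(func, [], data value).
Note: I assume that the pattern *-b-d matches the path 0-a-b-c-d-e.
Here is the code: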
    def map_at(func, path_pattern, data):
        def matches(pattern, value):
            try:
                return pattern == '*' or value == pattern or value in pattern
            except TypeError:  # EDIT: avoid breaking the dict comprehension if pattern is not a list.
                return False

        if path_pattern:
            head, *tail = path_pattern
            try:  # try to consume head for each key of data
                return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k, v in data.items()}
            except AttributeError:  # fail: terminal data but path_pattern was not consumed
                return data
        else:  # success: path_pattern is empty.
            try:  # not a leaf: map every leaf of every path
                return {k: map_at(func, [], v) for k, v in data.items()}
            except AttributeError:  # a leaf: map it
                return func(data)

Note that `tail if matches(head, k) else path_pattern` means: consume `head` if possible. To use a range in the pattern, just use `range(...)`.
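To check the behaviour described above, here is a minimal sketch that repeats the function and runs it on the data from the question. The `negate` helper and the variable names `result` and `ranged` are illustrative assumptions, not part of the original answer.

```python
def map_at(func, path_pattern, data):
    def matches(pattern, value):
        try:
            return pattern == '*' or value == pattern or value in pattern
        except TypeError:  # pattern is not a container (e.g. an int)
            return False

    if path_pattern:
        head, *tail = path_pattern
        try:  # consume head for every key that matches it
            return {k: map_at(func, tail if matches(head, k) else path_pattern, v)
                    for k, v in data.items()}
        except AttributeError:  # leaf reached before the pattern was consumed
            return data
    try:  # pattern consumed: map every leaf below this point
        return {k: map_at(func, [], v) for k, v in data.items()}
    except AttributeError:
        return func(data)


data = {0: {'a': 1, 'b': 2},
        1: {'a': 10, 'c': 13},
        2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
        3: {'a': 30, 'b': 31, 'c': {'d': 300}}}


def negate(x):  # hypothetical function to map
    return -x


# Wildcard, then a list of keys, then a single key.
result = map_at(negate, ['*', ['b', 'c'], 'd'], data)
print(result[2]['b'], result[3]['c'])  # {'d': -100, 'e': 101} {'d': -300}

# A range in place of a slice: top-level keys 0, 1 and 2 only.
ranged = map_at(negate, [range(0, 3), 'b'], data)
print(ranged[0], ranged[2]['b'], ranged[3]['b'])
# {'a': 1, 'b': -2} {'d': -100, 'e': -101} 31
```

In the second call the pattern is fully consumed at `[2, 'b']`, which is a dict, so every leaf under it is mapped. That follows from the second case of the algorithm and may or may not be what you want.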
As you can see, you never escape from case 2: if path_pattern is empty, you have to map all the leaves, whatever happens. This is clearer in this version:

    def map_all_leaves(func, data):
        """Apply func to all leaves"""
        try:
            return {k: map_all_leaves(func, v) for k, v in data.items()}
        except AttributeError:
            return func(data)

    def map_at(func, path_pattern, data):
        def matches(pattern, value):
            try:
                return pattern == '*' or value == pattern or value in pattern
            except TypeError:  # EDIT: avoid breaking the dict comprehension if pattern is not a list.
                return False

        if path_pattern:
            head, *tail = path_pattern
            try:  # try to consume head for each key of data
                return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k, v in data.items()}
            except AttributeError:  # fail: terminal data but path_pattern is not consumed
                return data
        else:
            # the result must be returned, otherwise matched subtrees become None
            return map_all_leaves(func, data)

EDIT
If you want to handle lists, you can try the following:
    def map_at(func, path_pattern, data):
        def matches(pattern, value):
            try:
                return pattern == '*' or value == pattern or value in pattern
            except TypeError:  # EDIT: avoid breaking the dict comprehension if pattern is not a list.
                return False

        def get_items(data):
            # Strings are iterable, so enumerate() would recurse into them forever;
            # treat them as leaves instead.
            if isinstance(data, (str, bytes)):
                raise TypeError("strings are leaves")
            try:
                return data.items()
            except AttributeError:
                return enumerate(data)  # raises TypeError if data is not iterable

        if path_pattern:
            head, *tail = path_pattern
            try:  # try to consume head for each key of data
                return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k, v in get_items(data)}
            except TypeError:  # fail: terminal data but path_pattern was not consumed
                return data
        else:  # success: path_pattern is empty.
            try:  # not a leaf: map every leaf of every path
                return {k: map_at(func, [], v) for k, v in get_items(data)}
            except TypeError:  # a leaf: map it
                return func(data)

The idea is simple: enumerate is the list equivalent of dict.items:

    >>> list(enumerate(['a', 'b']))
    [(0, 'a'), (1, 'b')]
    >>> list({0:'a', 1:'b'}.items())
    [(0, 'a'), (1, 'b')]

Hence, get_items is just a wrapper that returns the dict items, the list items (index, value), or raises an error.
The flaw is that lists are converted to dicts in the process.
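The list-to-dict conversion can be seen in a short sketch. It repeats a list-aware `map_at` along the lines described above (with strings treated as leaves, which is an assumption of this sketch); the `sample` data and the `times_ten` function are made up for illustration.

```python
def map_at(func, path_pattern, data):
    def matches(pattern, value):
        try:
            return pattern == '*' or value == pattern or value in pattern
        except TypeError:
            return False

    def get_items(data):
        # Assumption of this sketch: strings are leaves, not sequences.
        if isinstance(data, (str, bytes)):
            raise TypeError("strings are leaves")
        try:
            return data.items()      # dict: (key, value) pairs
        except AttributeError:
            return enumerate(data)   # list: (index, value) pairs; TypeError for scalars

    if path_pattern:
        head, *tail = path_pattern
        try:
            return {k: map_at(func, tail if matches(head, k) else path_pattern, v)
                    for k, v in get_items(data)}
        except TypeError:  # leaf reached before the pattern was consumed
            return data
    try:
        return {k: map_at(func, [], v) for k, v in get_items(data)}
    except TypeError:  # a leaf: map it
        return func(data)


def times_ten(x):  # hypothetical function to map
    return x * 10


sample = {'a': [{'b': 1}, {'b': 2}], 'c': 3}  # hypothetical data containing a list
converted = map_at(times_ten, ['a', '*', 'b'], sample)
print(converted)
# {'a': {0: {'b': 10}, 1: {'b': 20}}, 'c': 3}
```

The list under `'a'` comes back as a dict keyed by index. If the original container types matter, the comprehension would need to rebuild a list when the input was a list.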
EDIT
Since you are looking for something like XPath for JSON, you could try https://pypi.org/project/jsonpath/ or https://pypi.org/project/jsonpath-rw/. (I have not tested these libraries.)
Answer 1 (score: 0)
This is not very simple, and not very efficient either, but it should work:
    def map_at(f,kp,d): return map_at0(f,kp,d,0)

    def slice_contains(s,i): # no negative-index support
        a=s.start or 0
        # slice objects expose .stop (there is no .end attribute)
        return i>=a and (s.stop is None or i<s.stop) and\
            not (i-a)%(s.step or 1)

    def map_at0(f,kp,d,i):
        if i==len(kp): return f(d)               # whole key path matched
        if not isinstance(d,dict): return d      # no such path here
        ret={}
        p=kp[i]
        if isinstance(p,str) and p!='*': p=p,    # wrap a single key in a tuple
        for j,(k,v) in enumerate(sorted(d.items())):  # slices index the sorted keys
            if p=='*' or (slice_contains(p,j) if isinstance(p,slice) else k in p):
                v=map_at0(f,kp,v,i+1)
            ret[k]=v
        return ret
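Note that this copies every dictionary it expands (because it matches the key path, even if no further keys match and f is never applied), but returns unmatched subdictionaries by reference. Note also that '*' can be "quoted" by putting it in a list.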
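The slice and quoting behaviour can be tried with a small sketch. It restates the approach with the recursion index as a default argument, which is a simplification made here rather than the answer's own layout; `increment` and the sample inputs are illustrative assumptions.

```python
def slice_contains(s, i):
    """True if position i falls inside slice s (no negative-index support)."""
    start = s.start or 0
    return (i >= start
            and (s.stop is None or i < s.stop)
            and (i - start) % (s.step or 1) == 0)


def map_at(f, kp, d, i=0):
    if i == len(kp):              # whole key path matched: apply f here
        return f(d)
    if not isinstance(d, dict):   # path continues but there is nothing to descend into
        return d
    p = kp[i]
    if isinstance(p, str) and p != '*':
        p = (p,)                  # a single key behaves like a one-element list
    ret = {}
    for j, (k, v) in enumerate(sorted(d.items())):  # slices index the sorted keys
        if p == '*' or (slice_contains(p, j) if isinstance(p, slice) else k in p):
            v = map_at(f, kp, v, i + 1)
        ret[k] = v
    return ret


def increment(x):  # hypothetical function to map
    return x + 1


data = {0: {'a': 1, 'b': 2},
        1: {'a': 10, 'c': 13},
        2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
        3: {'a': 30, 'b': 31, 'c': {'d': 300}}}

# The [0:3, a] path from the question, written with a slice object.
sliced = map_at(increment, [slice(0, 3), 'a'], data)
print([sliced[k]['a'] for k in sorted(sliced)])  # [2, 11, 21, 30]

# '*' as a wildcard versus '*' quoted as a literal key.
starred = {'*': 1, 'x': 2}
print(map_at(increment, ['*'], starred))    # {'*': 2, 'x': 3}
print(map_at(increment, [['*']], starred))  # {'*': 2, 'x': 2}
```

Unlike the first answer, `f` is applied to whatever sits at the end of the path, including a dict, so `f` has to accept that.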
Answer 2 (score: 0)
I think you might enjoy this refreshing generator implementation -

    def select(sel = [], d = {}, res = []):
        # (base case: no selector)
        if not sel:
            yield (res, d)
        # (inductive: a selector) non-dict
        elif not isinstance(d, dict):
            return
        # (inductive: a selector, a dict) wildcard selector
        elif sel[0] == '*':
            for (k, v) in d.items():
                yield from select \
                    ( sel[1:]
                    , v
                    , [*res, k]
                    )
        # (inductive: a selector, a dict) list selector
        elif isinstance(sel[0], list):
            for s in sel[0]:
                yield from select \
                    ( [s, *sel[1:]]
                    , d
                    , res
                    )
        # (inductive: a selector, a dict) single selector
        elif sel[0] in d:
            yield from select \
                ( sel[1:]
                , d[sel[0]]
                , [*res, sel[0]]
                )
        # (inductive: single selector not in dict) no match
        else:
            return

It works like this -

    data = \
        { 0: { 'a': 1, 'b': 2 }
        , 1: { 'a': 10, 'c': 13 }
        , 2: { 'a': 20, 'b': { 'd': 100, 'e': 101 }, 'c': 23 }
        , 3: { 'a': 30, 'b': 31, 'c': { 'd': 300 } }
        }

    for (path, v) in select(['*',['b','c'],'d'], data):
        print(path, v)

    # [2, 'b', 'd'] 100
    # [3, 'c', 'd'] 300

Because select returns an iterable, you can use the regular map function on it.
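As a rough illustration of combining `select` with `map`, here is a minimal sketch. It is not the answer's own snippet: `select` is restated compactly (using tuples for the accumulated path), and `double` and `doubled` are hypothetical names.

```python
def select(sel, d, res=()):
    """Yield (path, value) for every path in d matched by the selector list sel."""
    if not sel:                              # selector exhausted: a match
        yield (list(res), d)
    elif not isinstance(d, dict):            # selector left over, nothing to descend into
        return
    elif sel[0] == '*':                      # wildcard: try every key
        for k, v in d.items():
            yield from select(sel[1:], v, (*res, k))
    elif isinstance(sel[0], list):           # list of keys: try each alternative
        for s in sel[0]:
            yield from select([s, *sel[1:]], d, res)
    elif sel[0] in d:                        # single key present in this dict
        yield from select(sel[1:], d[sel[0]], (*res, sel[0]))


data = {0: {'a': 1, 'b': 2},
        1: {'a': 10, 'c': 13},
        2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
        3: {'a': 30, 'b': 31, 'c': {'d': 300}}}


def double(x):  # hypothetical function to map
    return x * 2


# map() consumes the generator lazily; list() forces it for printing.
doubled = list(map(lambda pv: (pv[0], double(pv[1])),
                   select(['*', ['b', 'c'], 'd'], data)))
print(doubled)
# [([2, 'b', 'd'], 200), ([3, 'c', 'd'], 600)]
```

This yields the matched paths with transformed values rather than a rebuilt dictionary; writing the results back into a copy of `data` would be a separate step.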