Question

I currently have a dictionary like this (assume there are many countries, states, and cities):

{
    'USA': {
        'Texas': {
            'Austin': {
                '2017-01-01': 169,
                '2017-02-01': 231
            },
            'Houston': {
                '2017-01-01': 265,
                '2017-02-01': 310
            }
        }
    }
}

I want to create a new dictionary that is "grouped by" country and date only, filtered to a given state, so the result would be:

{
    'USA': {
        '2017-01-01': 434,
        '2017-02-01': 541
    }
}

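I can do this by looping through each level of the dict, but the result is hard to read. Is there a way to do it with lambda / map functions?

Also, for other reasons we can't use pandas DataFrames, so I can't use their groupby function.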
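For comparison, the explicit loop the asker describes can stay fairly readable if the state filter is a single dictionary lookup. This is a minimal sketch, assuming the country -> state -> city -> date nesting shown above; the names data and totals_for_state are hypothetical, and skipping countries that lack the requested state is an assumption about the desired behavior:

```python
# Hypothetical input mirroring the question's nesting:
# country -> state -> city -> date -> value
data = {
    'USA': {
        'Texas': {
            'Austin': {'2017-01-01': 169, '2017-02-01': 231},
            'Houston': {'2017-01-01': 265, '2017-02-01': 310},
        }
    }
}

def totals_for_state(data, state):
    """Sum values per date across all cities of `state`, grouped by country."""
    result = {}
    for country, states in data.items():
        cities = states.get(state)
        if cities is None:
            continue  # assumption: countries without this state are skipped
        per_date = {}
        for city_values in cities.values():
            for date, value in city_values.items():
                per_date[date] = per_date.get(date, 0) + value
        result[country] = per_date
    return result

print(totals_for_state(data, 'Texas'))
# {'USA': {'2017-01-01': 434, '2017-02-01': 541}}
```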

Answer 1

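If you only want to extract the lowest-level values of a nested dictionary, you can do it with a generator.

The following generator is a slightly modified version of one written by @Richard.

You can then combine it with collections.defaultdict to get the result you want.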

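from collections import defaultdict

def NestedDictValues(d):
    # Walk the nested dict depth-first and yield every (key, leaf value) pair.
    for k, v in d.items():
        if isinstance(v, dict):
            yield from NestedDictValues(v)
        else:
            yield (k, v)

def sumvals(lst):
    # Add up the values that share the same key (here: the date).
    c = defaultdict(int)
    for i, j in lst:
        c[i] += j
    return dict(c)

# s is the source dictionary from the question; each country is summed separately
d = {country: sumvals(NestedDictValues(states)) for country, states in s.items()}

# {'USA': {'2017-01-01': 434, '2017-02-01': 541}}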
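As written, this sums every state of a country. A minimal sketch of adding the state filter the question asks for, assuming the same nesting as in the question; state is a hypothetical parameter, and countries without that state are skipped:

```python
from collections import defaultdict

def NestedDictValues(d):
    # Yield (key, value) for every non-dict leaf, depth-first.
    for k, v in d.items():
        if isinstance(v, dict):
            yield from NestedDictValues(v)
        else:
            yield (k, v)

def sumvals(pairs):
    # Sum the values that share the same key (the date).
    c = defaultdict(int)
    for k, v in pairs:
        c[k] += v
    return dict(c)

s = {
    'USA': {
        'Texas': {
            'Austin': {'2017-01-01': 169, '2017-02-01': 231},
            'Houston': {'2017-01-01': 265, '2017-02-01': 310},
        }
    }
}

state = 'Texas'  # hypothetical: the state to filter on
# Only descend into the chosen state of each country that has it.
d = {country: sumvals(NestedDictValues(states[state]))
     for country, states in s.items() if state in states}

print(d)
# {'USA': {'2017-01-01': 434, '2017-02-01': 541}}
```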

Answer 2

I believe that in this case, recursion is much clearer than map or reduce functions:

import re
import itertools

s = {
    'USA': {
        'Texas': {
            'Austin': {
                '2017-01-01': 169,
                '2017-02-01': 231
            },
            'Houston': {
                '2017-01-01': 265,
                '2017-02-01': 310
            }
        }
    }
}

def get_dates(d):
    # Keep (date, value) pairs at the leaves; recurse into nested dicts.
    val = [(a, b) if isinstance(b, int) and re.findall(r'\d+-\d+-\d+', a) else get_dates(b)
           for a, b in d.items()]
    # Flatten one level if any entry came back from a recursive call.
    return [i for b in val for i in b] if not all(isinstance(i, tuple) for i in val) else val

# Sort by date so groupby sees equal dates next to each other, then sum each group.
last_data = {
    a: {c: sum(g for _, g in h)
        for c, h in itertools.groupby(sorted(get_dates(b), key=lambda x: x[0]),
                                      key=lambda x: x[0])}
    for a, b in s.items()
}

Output:

{'USA': {'2017-01-01': 434, '2017-02-01': 541}}
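The sorted(...) call inside last_data is not optional: itertools.groupby only merges runs of adjacent equal keys. A small illustration on hypothetical flattened (date, value) pairs, shaped like the ones get_dates returns:

```python
import itertools

pairs = [('2017-01-01', 169), ('2017-02-01', 231),
         ('2017-01-01', 265), ('2017-02-01', 310)]

# Without sorting, groupby only merges adjacent equal keys,
# so each date shows up as two separate groups here.
unsorted_groups = [(k, sum(v for _, v in g))
                   for k, g in itertools.groupby(pairs, key=lambda x: x[0])]
print(unsorted_groups)
# [('2017-01-01', 169), ('2017-02-01', 231), ('2017-01-01', 265), ('2017-02-01', 310)]

# After sorting by date, equal keys are adjacent and each date is summed once.
sorted_groups = {k: sum(v for _, v in g)
                 for k, g in itertools.groupby(sorted(pairs, key=lambda x: x[0]),
                                               key=lambda x: x[0])}
print(sorted_groups)
# {'2017-01-01': 434, '2017-02-01': 541}
```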

Answer 3

Here's how you can simplify your existing code with collections.Counter. Assuming your source dictionary is named d:

from collections import Counter
my_state='Texas'
mapped = {
    country: [Counter(d[country][my_state][city]) for city in d[country][my_state]]
    for country in d
}
print(mapped)
#{'USA': [Counter({'2017-02-01': 231, '2017-01-01': 169}),
#  Counter({'2017-02-01': 310, '2017-01-01': 265})]}

This maps the original dictionary to the form {country: list_of_counters}.

Now you can reduce this list with operator.add():

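from functools import reduce  # reduce is not a builtin in Python 3
from operator import add
for country in mapped:
    # Counter + Counter adds the counts key by key
    print("{country}: {sums}".format(country=country, sums=reduce(add, mapped[country])))
#USA: Counter({'2017-02-01': 541, '2017-01-01': 434})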
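One caveat: reduce(add, ...) raises TypeError on an empty list, for example if a state has no cities. A sketch of an alternative, assuming the same list-of-Counters shape as mapped above (the counters list here is hypothetical sample data): sum() with an explicit Counter() start value performs the same left-to-right addition and just returns an empty Counter when there is nothing to add.

```python
from collections import Counter

counters = [Counter({'2017-01-01': 169, '2017-02-01': 231}),
            Counter({'2017-01-01': 265, '2017-02-01': 310})]

# Start from an empty Counter so sum() adds Counters instead of starting at 0.
total = sum(counters, Counter())
print(dict(total))
# {'2017-01-01': 434, '2017-02-01': 541}

# An empty list just yields an empty Counter, whereas reduce(add, []) raises TypeError.
print(sum([], Counter()))
# Counter()
```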

Or with map/reduce:

list(map(lambda country: {country: reduce(add, mapped[country])}, mapped))
# [{'USA': Counter({'2017-02-01': 541, '2017-01-01': 434})}]

If you want plain dicts instead of Counters:

list(map(lambda country: {country: dict(reduce(add, mapped[country]))}, mapped))
#[{'USA': {'2017-01-01': 434, '2017-02-01': 541}}]
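Another caveat: Counter's + operator drops any key whose total is zero or negative, so a date could silently vanish if the values can be zero or negative. The sketch below uses hypothetical values a and b to show the effect, and uses Counter.update(), which keeps non-positive totals, as a workaround:

```python
from collections import Counter
from functools import reduce
from operator import add

# Hypothetical per-city values that cancel out or are zero.
a = Counter({'2017-01-01': 5, '2017-02-01': 0})
b = Counter({'2017-01-01': -5, '2017-02-01': 0})

# + keeps only keys with a positive sum, so both dates disappear here.
print(reduce(add, [a, b]))
# Counter()

# update() adds counts in place and keeps zero or negative totals.
total = Counter()
for c in (a, b):
    total.update(c)
print(dict(total))
# {'2017-01-01': 0, '2017-02-01': 0}
```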

Using map and lambda to process nested dicts
