I want to put a DataFrame like the one below into a dictionary.
Input:
>>> import pandas as pd
>>> df = pd.read_csv('file.csv')
>>> print(df)
    Market  Rep  Name  Date  Amount
0       A1   B1    C1    D1       1
1       A1   B1    C1    D1       2
2       A1   B1    C1    D2       3
3       A1   B1    C1    D2       4
4       A1   B1    C2    D1       5
5       A1   B1    C2    D1       6
6       A1   B1    C2    D2       7
7       A1   B1    C2    D2       8
8       A1   B2    C3    D1       9
9       A1   B2    C3    D1      10
10      A1   B2    C3    D2      11
11      A1   B2    C3    D2      12
12      A2   B3    C4    D1      13
13      A2   B3    C4    D1      14
Desired output:
>>> print(associated_data)
{'A1': {'B1': {'C1': {'D1': [1 + 2],
                      'D2': [3 + 4]},
               'C2': {'D1': [5 + 6],
                      'D2': [7 + 8]}},
        'B2': {'C3': {'D1': [9 + 10],
                      'D2': [11 + 12]}}},
 'A2': {'B3': {'C4': {'D1': [13 + 14]}}}}
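This may not be the best way to organize and sort the data, so I am open to suggestions.
I tried an approach that I hoped would work with a lot of for loops, shown below: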
# Main function
for market in df['Market'].unique():
    market_data = df.loc[df['Market'] == market]
    associated_reps = market_data['Rep'].unique()
    # Repeat
    for rep in associated_reps:
        rep_data = market_data.loc[market_data['Rep'] == rep]
        associated_names = rep_data['Name'].unique()
        # Repeat
        for name in associated_names:
            name_data = rep_data.loc[rep_data['Name'] == name]
            associated_dates = name_data['Date'].unique()
            # Repeat
            for date in associated_dates:
                date_data = name_data.loc[name_data['Date'] == date]
                associated_amount = sum(date_data['Amount'].tolist())
                # Attempted solution code (total fail)
                breakdown[market][rep][name][date] = associated_amount
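This does split all the data apart, and the last line then tries to put it all back together. I was hoping you could build a deeply nested dictionary like this, but it failed completely (it turns out that, unfortunately, this is not how dictionaries work, lmao).
How can I produce the desired output (ideally with shorter code)?
Thanks!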
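For context, the final assignment in the loop most likely fails because a plain dict raises KeyError as soon as an intermediate key such as `breakdown[market]` does not exist yet. The snippet below is only a minimal sketch using the standard library; `nested_dict`, `plain`, and `missing_key` are hypothetical names, not part of the original code.

```python
from collections import defaultdict


def nested_dict():
    """Return a defaultdict that creates another nested_dict for any missing key."""
    return defaultdict(nested_dict)


# A plain dict cannot be assigned into through keys that do not exist yet.
plain = {}
missing_key = None
try:
    plain['A1']['B1'] = 3  # looking up plain['A1'] raises KeyError
except KeyError as exc:
    missing_key = exc.args[0]

# A recursive defaultdict creates the intermediate levels on first access.
breakdown = nested_dict()
breakdown['A1']['B1']['C1']['D1'] = 3
```

With `breakdown` defined this way, the original nested loops could assign into it unchanged, although the intermediate levels are then defaultdict objects rather than plain dicts.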
Answer 0 (score: 2)
Similar questions have been posted before (see here, for example), but the solution below works.
import pprint

def make_dict(ind_vals, d, v):
    """Accumulate index entries as keys in a dict."""
    p = d
    # Walk down to the last-but-one dict level, creating nested dicts
    # where they are not present yet.
    for ix in ind_vals[:-1]:
        # Replace {} with collections.OrderedDict() if necessary.
        p = p.setdefault(ix, {})
    # Set the actual value of interest.
    p[ind_vals[-1]] = v

# Set indices correctly.
df = df.set_index(['Market', 'Rep', 'Name', 'Date'])
# Group values so we don't have duplicate indices.
df = df.groupby(level=df.index.names).sum()
dct = {}  # Replace with collections.OrderedDict() if necessary.
for idx, val in df.iterrows():
    # int() keeps plain Python integers in the result.
    make_dict(idx, dct, int(val.Amount))
pprint.pprint(dct)
# {'A1': {'B1': {'C1': {'D1': 3, 'D2': 7}, 'C2': {'D1': 11, 'D2': 15}},
#         'B2': {'C3': {'D1': 19, 'D2': 23}}},
#  'A2': {'B3': {'C4': {'D1': 27}}}}
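Since the question also asks for shorter code, here is a possible compact variant of the same idea: group on the four key columns, sum `Amount`, then unpack each index tuple while building the nested dict. It is a sketch under assumptions: the DataFrame is rebuilt inline from the sample data in the question instead of being read from `file.csv`, and `keys`, `totals`, and `result` are illustrative names.

```python
import pandas as pd

# Sample data reconstructed from the question (stands in for file.csv).
df = pd.DataFrame({
    'Market': ['A1'] * 12 + ['A2'] * 2,
    'Rep': ['B1'] * 8 + ['B2'] * 4 + ['B3'] * 2,
    'Name': ['C1'] * 4 + ['C2'] * 4 + ['C3'] * 4 + ['C4'] * 2,
    'Date': ['D1', 'D1', 'D2', 'D2'] * 3 + ['D1', 'D1'],
    'Amount': list(range(1, 15)),
})

keys = ['Market', 'Rep', 'Name', 'Date']
# One summed Amount per unique (Market, Rep, Name, Date) combination.
totals = df.groupby(keys)['Amount'].sum()

result = {}
for (market, rep, name, date), amount in totals.items():
    # setdefault creates each missing level and returns the inner dict.
    level = result.setdefault(market, {}).setdefault(rep, {}).setdefault(name, {})
    level[date] = int(amount)  # convert the NumPy integer to a plain int
```

The chained `setdefault` calls hard-code four levels, so the `make_dict` helper remains the better fit when the number of key columns can vary.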
Answer 1 (score: 0)
Iterating over the rows and their values should work.
dict_values = {}
for idx, row in df.iterrows():
    # Unpack the five columns: Market, Rep, Name, Date, Amount.
    A, B, C, D, Amount = row
    # Create each nesting level the first time its key is seen.
    if A not in dict_values:
        dict_values[A] = {}
    if B not in dict_values[A]:
        dict_values[A][B] = {}
    if C not in dict_values[A][B]:
        dict_values[A][B][C] = {}
    # Collect the amounts for each date in a list.
    if D not in dict_values[A][B][C]:
        dict_values[A][B][C][D] = [Amount]
    else:
        dict_values[A][B][C][D].append(Amount)
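Note that this loop stores a list of amounts per date (for example `[1, 2]`), whereas the question asks for their sum. One way to bridge that, sketched here with a hypothetical helper named `sum_leaves` and a small hand-written sample of what the loop would produce, is to walk the nested dict afterwards and sum every leaf list.

```python
def sum_leaves(node):
    """Return a copy of a nested dict in which every leaf list is replaced by its sum."""
    if isinstance(node, dict):
        return {key: sum_leaves(value) for key, value in node.items()}
    # Anything that is not a dict is treated as a list of amounts.
    return sum(node)


# Hand-written sample shaped like the output of the loop above.
dict_values = {
    'A1': {'B1': {'C1': {'D1': [1, 2], 'D2': [3, 4]}}},
    'A2': {'B3': {'C4': {'D1': [13, 14]}}},
}

summed = sum_leaves(dict_values)
```

The recursion does not depend on the nesting depth, so it should work the same way if more key columns are added later.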