I'm trying to implement something similar to "GroupBy results to dictionary of lists", using the following data:
Column1  Column2  Column3
0        23       1
1        5        2
1        2        3
1        19       5
2        56       1
2        22       2
3        2        4
3        14       5
4        59       1
5        44       1
5        1        2
5        87       3
sdf.groupby('Column1')['Column3'].apply(list).to_dict()
This works perfectly.
However, I need a list of tuples built from multiple columns, for example:
sdf.groupby('Column1')['Column2', 'Column3'].apply(list).to_dict()
to get output like this:
{0: [(23, 1)],
1: [(5,2), (2,3), (19,5)],
...}
But this returns the column headers instead of the values.
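A likely reason for the headers: iterating over a DataFrame yields its column labels, so calling `list` on each group's DataFrame produces `['Column2', 'Column3']` rather than the row values. (Separately, pandas 2.x rejects the tuple-style selection `['Column2', 'Column3']` on a groupby; a double-bracketed list `[['Column2', 'Column3']]` is required.) The sketch below demonstrates the header behavior and builds the desired dictionary with a plain dict comprehension over the groups; it assumes the DataFrame is named `df` and rebuilds the sample data from the question.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

# Iterating a DataFrame yields its column labels, which is what apply(list) collected
print(list(df[["Column2", "Column3"]]))  # ['Column2', 'Column3']

# One list of (Column2, Column3) tuples per Column1 key, in original row order
result = {
    key: list(zip(group["Column2"], group["Column3"]))
    for key, group in df.groupby("Column1")
}
print(result)
```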
Here is my workaround (in my opinion, far too much work to get this result):
from collections import defaultdict

import pandas as pd


def get_dict_of_set_from_df(df: pd.DataFrame, key_cols: list, val_cols: list) -> dict:
    """
    Generic method to create Dict[key_cols] = set(val_cols)
    :param df: source DataFrame
    :param key_cols: columns that form the dictionary key
    :param val_cols: columns whose values are collected into each set
    :return: defaultdict mapping each key to a set of values
    """
    # df.groupby(key_cols)[val_cols].apply(set).to_dict()
    cols = key_cols + val_cols
    len_key = len(key_cols)
    len_val = len(val_cols)
    # get all relevant columns (key_cols and val_cols) from the dataframe
    l_ = df[cols].values.tolist()
    dc = defaultdict(set)
    for c in l_:
        # if key or val is a singleton, then do not put into tuple
        k = tuple(c[:len_key]) if len_key > 1 else c[:len_key][0]
        v = tuple(c[len_key:]) if len_val > 1 else c[len_key:][0]
        dc[k].add(v)
    return dc
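A shorter sketch of the same helper, built on `groupby` instead of a manual row loop, is below. The name `dict_of_sets` is hypothetical; like the workaround, it assumes a single key or value column should give plain scalars rather than 1-tuples, and that set semantics (duplicates dropped, order lost) are what you want. One behavioral difference: `groupby` drops rows whose key is NaN by default, whereas the loop keeps them (passing `dropna=False` to `groupby` should match the loop).

```python
import pandas as pd


def dict_of_sets(df: pd.DataFrame, key_cols: list, val_cols: list) -> dict:
    """Hypothetical helper: map each key (scalar or tuple) to a set of values (scalar or tuple)."""
    # Group by a bare label when there is one key column, so keys are scalars, not 1-tuples
    by = key_cols[0] if len(key_cols) == 1 else key_cols
    result = {}
    for key, group in df.groupby(by):
        if len(val_cols) == 1:
            # Single value column: collect plain scalars
            result[key] = set(group[val_cols[0]])
        else:
            # Several value columns: collect one plain tuple per row
            result[key] = set(group[val_cols].itertuples(index=False, name=None))
    return result


df = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})
print(dict_of_sets(df, ["Column1"], ["Column2", "Column3"]))
```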
Answer:
You can do it like this:
import pandas as pd
data = [[0, 23, 1],
[1, 5, 2],
[1, 2, 3],
[1, 19, 5],
[2, 56, 1],
[2, 22, 2],
[3, 2, 4],
[3, 14, 5],
[4, 59, 1],
[5, 44, 1],
[5, 1, 2],
[5, 87, 3]]
df = pd.DataFrame(data=data, columns=['c1', 'c2', 'c3'])
def to_list(x):
    return list(zip(x.c2, x.c3))
groups = df.groupby('c1')[['c2', 'c3']].apply(to_list)
result = {k: values for k, values in zip(groups.index, groups)}
print(result)
Output:
{0: [(23, 1)], 1: [(5, 2), (2, 3), (19, 5)], 2: [(56, 1), (22, 2)], 3: [(2, 4), (14, 5)], 4: [(59, 1)], 5: [(44, 1), (1, 2), (87, 3)]}
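The final comprehension `{k: values for k, values in zip(groups.index, groups)}` does the same thing as `groups.to_dict()`. A variant sketch is below: instead of naming `x.c2` and `x.c3` inside the helper, it uses `itertuples(index=False, name=None)`, which yields each row as a plain tuple, so only the `value_cols` list (a name introduced here for illustration) needs to change to include more columns.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[0, 23, 1], [1, 5, 2], [1, 2, 3], [1, 19, 5], [2, 56, 1], [2, 22, 2],
          [3, 2, 4], [3, 14, 5], [4, 59, 1], [5, 44, 1], [5, 1, 2], [5, 87, 3]],
    columns=["c1", "c2", "c3"],
)

value_cols = ["c2", "c3"]  # hypothetical name: list more columns here for longer tuples

# Each group is a DataFrame of the value columns; itertuples turns its rows into plain tuples
groups = df.groupby("c1")[value_cols].apply(
    lambda g: list(g.itertuples(index=False, name=None))
)

# groups is a Series indexed by c1, so it converts straight to a dict
result = groups.to_dict()
print(result)
```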