GroupBy results to dictionary of lists (with multiple columns)

Time: 2018-11-30 18:11:34

Tags: python dataframe apply

I am trying to achieve something similar to "GroupBy results to dictionary of lists".

Column1 Column2 Column3
0       23      1
1       5       2
1       2       3
1       19      5
2       56      1
2       22      2
3       2       4
3       14      5
4       59      1
5       44      1
5       1       2
5       87      3

sdf.groupby('Column1')['Column3'].apply(list).to_dict() 

works perfectly.

However, I need a list of tuples built from several columns, for example:

sdf.groupby('Column1')['Column2', 'Column3'].apply(list).to_dict() 

to get output like this:

{0: [(23, 1)],
1: [(5,2), (2,3), (19,5)],
...}

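Instead, this returns the column headers rather than the values.

Below is my workaround (which seems like far too much work for this result):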
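A likely explanation, sketched on a small made-up frame (the three rows below are an assumption, copied from the start of the sample data): iterating over a DataFrame yields its column labels, whereas iterating over a Series yields its values. So `list` applied to a multi-column group gives the labels, while the single-column case gives the data.

```python
import pandas as pd

# Hypothetical miniature of the question's data.
sdf = pd.DataFrame({"Column1": [0, 1, 1],
                    "Column2": [23, 5, 2],
                    "Column3": [1, 2, 3]})

# One group (Column1 == 1), restricted to the two value columns.
group = sdf[sdf["Column1"] == 1][["Column2", "Column3"]]

# Iterating a DataFrame walks over its column labels, not its rows.
labels = list(group)

# Iterating a Series walks over its values, which is why the
# single-column version of the groupby call behaves as expected.
values = list(group["Column3"])

print(labels)
print(values)
```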

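from collections import defaultdict

import pandas as pd


def get_dict_of_set_from_df(df: pd.DataFrame, key_cols: list, val_cols: list) -> dict:
    """
    Generic method to create Dict[key_cols] = set(val_cols)
    :param df: source DataFrame
    :param key_cols: columns whose values form the dictionary keys
    :param val_cols: columns whose values are collected into each key's set
    :return: dict mapping each key (a tuple if there are several key columns) to a set of values
    """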

    # df.groupby(key_cols)[val_cols].apply(set).to_dict()

    cols = key_cols + val_cols
    len_key = len(key_cols)
    len_val = len(val_cols)

    # get all relevant columns (key_cols and val_cols) from the dataframe
    l_ = df[cols].values.tolist()
    dc = defaultdict(set)
    for c in l_:
        # if key or val is a singleton, then do not put into tuple
        k = tuple(c[:len_key]) if len_key > 1 else c[:len_key][0]
        v = tuple(c[len_key:]) if len_val > 1 else c[len_key:][0]
        dc[k].add(v)
    return dc

1 Answer:

Answer 0: (score: 0)

You can do it like this:

import pandas as pd

data = [[0, 23, 1],
        [1, 5, 2],
        [1, 2, 3],
        [1, 19, 5],
        [2, 56, 1],
        [2, 22, 2],
        [3, 2, 4],
        [3, 14, 5],
        [4, 59, 1],
        [5, 44, 1],
        [5, 1, 2],
        [5, 87, 3]]

df = pd.DataFrame(data=data, columns=['c1', 'c2', 'c3'])


def to_list(x):
    return list(zip(x.c2, x.c3))


groups = df.groupby('c1')[['c2', 'c3']].apply(to_list)
result = {k: values for k, values in zip(groups.index, groups)}
print(result)

Output

{0: [(23, 1)], 1: [(5, 2), (2, 3), (19, 5)], 2: [(56, 1), (22, 2)], 3: [(2, 4), (14, 5)], 4: [(59, 1)], 5: [(44, 1), (1, 2), (87, 3)]}
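
For what it's worth, a variant that does not hard-code the column names inside the helper might look like the sketch below. It assumes the same `c1`/`c2`/`c3` naming as above, uses a shortened three-row frame, and the helper name `rows_as_tuples` is made up for illustration. `itertuples(index=False, name=None)` yields each row as a plain tuple, so the helper works for any number of selected columns.

```python
import pandas as pd

# Shortened, hypothetical version of the data used above.
df = pd.DataFrame({"c1": [0, 1, 1],
                   "c2": [23, 5, 2],
                   "c3": [1, 2, 3]})


def rows_as_tuples(group):
    # index=False drops the row index; name=None returns plain tuples
    # instead of namedtuples.
    return list(group.itertuples(index=False, name=None))


# Select the value columns with a list, then turn each group into a
# list of row tuples and convert the resulting Series to a dict.
result = df.groupby("c1")[["c2", "c3"]].apply(rows_as_tuples).to_dict()
print(result)
```

Replacing `list` with `set` inside the helper should give the set-valued dictionary that the workaround in the question builds, though that variant is not tested here.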