Question

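I have a pd.DataFrame in which each row represents a group of people. Each group has an id (my real DataFrame has several columns, but they are summarized here by the "id" column of my example DataFrame). Each group represents several people (the "size" column).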

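I am trying to split these groups into smaller groups of at most max_size people. For example, with max_size = 5, a row with "id" = "foo" and "size" = 13 should be replaced by three rows, all with "id" = "foo" and with "size" = 5, "size" = 5 and "size" = 3 respectively.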

I have written a function that works, but I am looking for a more pandas-idiomatic way of doing this, if one exists.

My function is:

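import pandas as pd

def custom_duplicating_function(df):
    def aux_custom_duplicating_function(row, max_size=5):
        row = row.to_dict()
        size = row["size"]
        # One copy of the row per chunk: the full chunks plus one for the remainder.
        # Note: if size is an exact multiple of max_size, the last chunk has size 0.
        L = [row.copy() for i in range(size // max_size + 1)]
        for i in range(len(L) - 1):
            L[i]["size"] = max_size
        L[-1]["size"] = size % max_size
        return pd.DataFrame.from_dict(L)

    # apply yields one small DataFrame per input row; concatenate them.
    temp = df.apply(aux_custom_duplicating_function, axis=1)
    result = pd.concat([temp[i] for i in range(len(temp.index))])
    return result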

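The following DataFrame

test = pd.DataFrame.from_dict([{"id":"foo", "size":13},
                               {"id":"bar", "size":17},
                               {"id":"baz", "size":3}])

    id  size
0  foo    13
1  bar    17
2  baz     3

should be transformed so that foo becomes three rows (sizes 5, 5 and 3), bar becomes four rows (sizes 5, 5, 5 and 2), and baz stays a single row (size 3).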

Answer 1

Use explode, available in pandas >= 0.25:

test['size'] = test['size'].apply(lambda x:[5]*(x//5)+[(x%5)])

test.explode('size')
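
The snippet above hard-codes 5, overwrites test['size'] in place, and produces a trailing 0 when a size is an exact multiple of 5. Below is a minimal sketch of the same explode idea that avoids those three points. The names split_sizes and split_groups are hypothetical helpers introduced only for illustration, and the sketch assumes every size is a positive integer.

```python
import pandas as pd


def split_sizes(size, max_size=5):
    """Return chunk sizes of at most max_size that sum to size."""
    full, rest = divmod(size, max_size)
    # Append the remainder only when it is non-zero, so 10 -> [5, 5], not [5, 5, 0].
    return [max_size] * full + ([rest] if rest else [])


def split_groups(df, max_size=5):
    """Hypothetical helper: explode each row into chunks of at most max_size."""
    # assign returns a new frame, so the caller's DataFrame is left untouched.
    out = df.assign(size=df["size"].apply(split_sizes, max_size=max_size))
    # explode (pandas >= 0.25) yields an object column, so cast back to int.
    return out.explode("size").astype({"size": int}).reset_index(drop=True)


test = pd.DataFrame({"id": ["foo", "bar", "baz"], "size": [13, 17, 3]})
result = split_groups(test)
print(result)
```

A size of 0 would give an empty list, which explode turns into NaN, so that case would need separate handling before the integer cast.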

Answer 2

We can nest these items with apply and then unnest them, for example using the code from this answer.

import pandas as pd
max_size=5
test = pd.DataFrame.from_dict([{"id":"foo", "size":13},
                     {"id":"bar", "size":17},
                     {"id":"baz", "size":3}])

test['size'] = test['size'].apply(lambda x: [max_size]*(x//max_size)+[x%max_size])
test2 = test.apply(lambda x: pd.Series(x['size']),axis=1).stack().reset_index(level=1, drop=True)
test2.name = 'size'
test.drop('size', axis=1).join(test2)

    id  size
0  foo   5.0
0  foo   5.0
0  foo   3.0
1  bar   5.0
1  bar   5.0
1  bar   5.0
1  bar   2.0
2  baz   3.0
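
The sizes in the output above come out as floats (5.0), most likely because pd.Series pads the shorter rows with NaN before stack removes them again. As a sketch of an alternative that stays in integers, the rows can be repeated by index label and the chunk sizes computed afterwards. The name split_groups_repeat is hypothetical, and the sketch assumes NumPy is available, that the index is unique, and that every size is a positive integer.

```python
import numpy as np
import pandas as pd


def split_groups_repeat(df, max_size=5):
    """Hypothetical helper: split rows into chunks without building lists."""
    sizes = df["size"].to_numpy()
    # Number of chunks per row: ceiling division of size by max_size.
    n_chunks = -(-sizes // max_size)
    # Repeat each row once per chunk; all other columns are carried along.
    out = df.loc[df.index.repeat(n_chunks)].copy()
    # Position of each chunk within its original row: 0, 1, 2, ...
    pos = out.groupby(level=0).cumcount().to_numpy()
    # Every chunk holds max_size people except the last, which holds the rest.
    remaining = out["size"].to_numpy() - pos * max_size
    out["size"] = np.minimum(remaining, max_size)
    return out.reset_index(drop=True)


test = pd.DataFrame({"id": ["foo", "bar", "baz"], "size": [13, 17, 3]})
result_repeat = split_groups_repeat(test)
print(result_repeat)
```

Because the remainder is derived from the chunk position rather than from size % max_size, an exact multiple such as 10 yields two rows of 5 and no zero-sized row.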

Pandas-idiomatic way to perform this custom row-duplicating function?
