Question

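I have a df with many columns. I want to group by id and transform some of the columns while leaving the rest unchanged. What is the best way to do this? Specifically, I have a df with a bunch of ids, and I want to z-score columns a and b within each id. Column c should stay unchanged. In my real problem, I have many more columns.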

The best approach I could think of was to pass a dict {col_name: function_name} to transform. For some reason, this raises a TypeError.

MWE:

import pandas as pd
import numpy as np
np.random.seed(123) #reproducible ex
df = pd.DataFrame(data = {"a": np.arange(10), "b": np.arange(10)[::-1], "c": np.random.choice(a = np.arange(10), size = 10)}, index = pd.Index(data = np.random.choice(a = [1,2,3], size = 10), name = "id"))

#create a dict for all columns other than "c" and the function to do the transform
fmap = {k: lambda x: (x - x.mean()) / x.std() for k in df.columns if k != "c"}
df.groupby("id").transform(fmap) #yields error that "dict" is unhashable

It turns out this is a known bug: https://github.com/pandas-dev/pandas/issues/17309

Answer 1

One possible solution is to first select the column names with difference, because transform does not accept a dict:

cols = df.columns.difference(['c'])
print (cols)
Index(['a', 'b'], dtype='object')

fmap = lambda x: (x - x.mean()) / x.std()
df[cols] = df.groupby("id")[cols].transform(fmap) 
print (df)
           a         b  c
id                       
3  -1.000000  1.000000  2
2  -1.091089  1.091089  2
1  -1.134975  1.134975  6
3   0.000000  0.000000  1
1  -0.529655  0.529655  3
2   0.218218 -0.218218  9
3   1.000000 -1.000000  6
2   0.872872 -0.872872  1
1   0.680985 -0.680985  0
1   0.983645 -0.983645  1
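
If you really do need a different function per column, as with the {col_name: function_name} dict in the question, one workaround is to loop over the dict yourself and call transform once per column. The following is only a sketch: the helper name zscore is mine, and the id and c values are typed in by hand from the printed output above instead of being drawn at random. It returns a new frame through assign and leaves df untouched.

```python
import numpy as np
import pandas as pd

# Same shape as the question's data; ids and "c" are hard-coded here
# (copied from the printed output) so the example does not depend on the RNG.
df = pd.DataFrame(
    {"a": np.arange(10), "b": np.arange(10)[::-1], "c": [2, 2, 6, 1, 3, 9, 6, 1, 0, 1]},
    index=pd.Index([3, 2, 1, 3, 1, 2, 3, 2, 1, 1], name="id"),
)

def zscore(x):
    # Hypothetical helper: standardize within a group (pandas std uses ddof=1).
    return (x - x.mean()) / x.std()

# Column -> function mapping; each column could get its own function.
fmap = {"a": zscore, "b": zscore}

grouped = df.groupby("id")
# One transform call per column; columns not in fmap (here "c") pass through as-is.
result = df.assign(**{col: grouped[col].transform(fn) for col, fn in fmap.items()})
print(result)
```

Note that a group with a single row has an undefined standard deviation, so its z-scores come out as NaN with either approach.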

Apply a function to a subset of columns in a pandas groupby
