Question

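When using groupby + apply to call a function, I want to go from a DataFrame to a Series groupby object, apply a function to each group that takes a Series as input and returns a Series as output, and then assign the output of the groupby + apply call to a field in the DataFrame. The default behavior is for the output of groupby + apply to be indexed by the grouping fields, which prevents me from assigning it back to the DataFrame cleanly. I'd prefer the function I call with apply to take a Series as input and return a Series as output; I think that is a bit cleaner than DataFrame to DataFrame. (This isn't the best way to get the result for this example; the real application is quite different.)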

import pandas as pd
df = pd.DataFrame({
 'A': [999, 999, 111, 111],
 'B': [1, 2, 3, 4],
 'C': [1, 3, 1, 3]
})
def less_than_two(series):
  # Intended for series of length 1 in this case
  # But not intended for many-to-one generally
  return series.iloc[0] < 2
output = df.groupby(['A', 'B'])['C'].apply(less_than_two)

I'd like the index on output to be the same as the one on df; otherwise I can't assign it to df (cleanly):

df['Less_Than_Two'] = output

Something like output.index = df.index seems too ugly, and using the group_keys argument doesn't seem to work:

output = df.groupby(['A', 'B'], group_keys = False)['C'].apply(less_than_two)
df['Less_Than_Two'] = output
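
To make the mismatch concrete, here is a minimal sketch (reusing the df and less_than_two defined above, nothing else assumed) that inspects the two indexes. Because the function returns one scalar per group, the result is labelled by the grouping fields even with group_keys=False:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [999, 999, 111, 111],
    'B': [1, 2, 3, 4],
    'C': [1, 3, 1, 3],
})

def less_than_two(series):
    # Same function as in the question: looks only at the first element
    return series.iloc[0] < 2

output = df.groupby(['A', 'B'], group_keys=False)['C'].apply(less_than_two)

# The result is indexed by the grouping fields ('A', 'B'), not by df's own index
print(output.index.names)
print(df.index)
```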

Answer 1

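transform returns the result on the original index, just as you asked. It broadcasts the same result across all elements of a group. Caveat: the dtype may be inferred as something else, so you may have to cast it yourself.

In this case, to add the extra column, I'd use assign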

df.assign(
    Less_Than_Two=df.groupby(['A', 'B'])['C'].transform(less_than_two).astype(bool))

     A  B  C Less_Than_Two
0  999  1  1          True
1  999  2  3         False
2  111  3  1          True
3  111  4  3         False
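
In the example above every group has exactly one row, so the broadcasting is not visible. As a rough sketch of what happens with multi-row groups, the snippet below groups by 'A' alone and applies the same less_than_two function to column 'B'; this choice of columns is only an illustration and is not part of the original question:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [999, 999, 111, 111],
    'B': [1, 2, 3, 4],
    'C': [1, 3, 1, 3],
})

def less_than_two(series):
    # Returns a single value per group, based on the group's first element
    return series.iloc[0] < 2

# Each group of 'A' has two rows; transform repeats the group's single
# result on every row of that group and keeps df's original index.
broadcast = df.groupby('A')['B'].transform(less_than_two).astype(bool)

print(broadcast)
```

The explicit astype(bool) follows the caveat above: the dtype that transform infers is not guaranteed to be bool.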

Answer 2

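Assuming you do need the groupby (and the resulting groupby object has fewer rows than your DataFrame, which is not the case with the example data), assigning the Series to the 'Is.Even' column will produce NaN values (because the index of output will be shorter than the index of df).

Instead, based on the example data, the simplest approach is to merge output, as a DataFrame, with df, like so: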

# is_even is the aggregation function being applied; it is not defined in this post
output = df.groupby(['A', 'B'])['C'].agg(is_even).reset_index()  # reset_index restores 'A' and 'B' from index levels to columns
output.columns = ['A', 'B', 'Is_Even']  # rename the target column prior to merging
df.merge(output, how='left', on=['A', 'B'])  # this supports a many-to-one relationship between combinations of 'A' & 'B' and 'Is_Even'
# and will thus properly map aggregated values to unaggregated values
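
For a case where the groups really are many-to-one, here is a small self-contained sketch of the same merge pattern. The function first_is_even is a hypothetical stand-in (the real aggregation function is not shown in this thread), and grouping by 'C' while aggregating 'B' is chosen only so that each group has two rows:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [999, 999, 111, 111],
    'B': [1, 2, 3, 4],
    'C': [1, 3, 1, 3],
})

def first_is_even(series):
    # Hypothetical aggregation: is the first value of the group even?
    return bool(series.iloc[0] % 2 == 0)

# One row per group: columns 'C' and 'Is_Even'
per_group = (
    df.groupby('C')['B']
      .agg(first_is_even)
      .astype(bool)          # cast explicitly; the inferred dtype may differ
      .rename('Is_Even')
      .reset_index()
)

# Left merge maps each group's value back onto every row of that group
merged = df.merge(per_group, how='left', on='C')

print(merged)
```

Note that merging on columns gives the result a fresh default index, so if df carried a meaningful custom index it would have to be restored separately.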

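As an aside, I should note that underscores are better than dots in variable names. Unlike in R, for example, the dot in Python is the operator for accessing object attributes, so using dots in names can break functionality or cause confusion.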

Preserving the DataFrame's index when using groupby apply to produce a Series
