I have the following DataFrame, table_1:
Sample method value
3 sample1 method_0 1
3 sample1 method_1 2
3 sample1 method_2 3
3 sample1 method_3 4
3 sample2 method_0 5
3 sample2 method_1 6
3 sample2 method_2 7
3 sample2 method_3 8
grouped = table_1.groupby('method')
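I want to group by 'method' and then, for each group, divide that group's 'value' column by another Series that has the same number of entries as each group. I achieved this with: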
table_2 = grouped.apply(lambda x: x['value'].div(series_of_two_elements))
But now I want to merge table_2 back into each group of table_1. When I try:
table_1['normalized'] = table_2
I get:
TypeError: 'DataFrameGroupBy' object does not support item assignment
How do I convert table_1 back to a DataFrame so that I can assign these new normalized values to each group? Can I use df.transform with a lambda expression?
Answer 0 (score: 2)
I think you need GroupBy.transform, and you need to convert the Series to a NumPy array with .values to avoid index alignment:
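import pandas as pd

series_of_two_elements = pd.Series([1, 2])
grouped = table_1.groupby('method')
# .values passes a plain NumPy array, so the division is positional within each group
table_2 = grouped['value'].transform(lambda x: x.div(series_of_two_elements.values))
table_1['normalized'] = table_2
print(table_1)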
Sample method value normalized
3 sample1 method_0 1 1.0
3 sample1 method_1 2 2.0
3 sample1 method_2 3 3.0
3 sample1 method_3 4 4.0
3 sample2 method_0 5 2.5
3 sample2 method_1 6 3.0
3 sample2 method_2 7 3.5
3 sample2 method_3 8 4.0
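To make the role of .values concrete, here is a minimal self-contained sketch. It rebuilds table_1 with a default RangeIndex (an assumption; the index shown in the question is different) and compares dividing one group by the Series itself with dividing by its underlying array. The names one_group, aligned and positional are only illustrative.

```python
import pandas as pd

# Assumption: table_1 is rebuilt here with a default 0..7 index for illustration.
table_1 = pd.DataFrame({
    "Sample": ["sample1"] * 4 + ["sample2"] * 4,
    "method": ["method_0", "method_1", "method_2", "method_3"] * 2,
    "value": [1, 2, 3, 4, 5, 6, 7, 8],
})
series_of_two_elements = pd.Series([1, 2])

# One group on its own: its rows keep their original index labels (0 and 4).
one_group = table_1.loc[table_1["method"] == "method_0", "value"]

# Dividing by the Series aligns on index labels, so labels present on only
# one side produce NaN.
aligned = one_group.div(series_of_two_elements)

# Dividing by the NumPy array is purely positional: first by 1, second by 2.
positional = one_group.div(series_of_two_elements.values)

# The same positional division applied to every group via transform, which
# returns a result indexed like table_1, so plain assignment works.
table_1["normalized"] = table_1.groupby("method")["value"].transform(
    lambda x: x.div(series_of_two_elements.values)
)
print(table_1)
```

Because transform returns a Series with the same index as the original frame, the result can be assigned directly as a new column without any merge step.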
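Another possible solution is to create a MultiIndex whose second level comes from cumcount, and then use div on that second level (the Series named series_of_two_elements must have the same index values as the second level within each group):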
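series_of_two_elements = pd.Series([1, 2])
# cumcount numbers the rows inside each 'method' group 0, 1, ... and becomes index level 1
table_1 = table_1.set_index(['method', table_1.groupby('method').cumcount()])
# level=1 matches the index of series_of_two_elements against that second level
table_1['normalized'] = table_1['value'].div(series_of_two_elements, level=1)
print(table_1)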
             Sample  value  normalized
method
method_0 0  sample1      1         1.0
method_1 0  sample1      2         2.0
method_2 0  sample1      3         3.0
method_3 0  sample1      4         4.0
method_0 1  sample2      5         2.5
method_1 1  sample2      6         3.0
method_2 1  sample2      7         3.5
method_3 1  sample2      8         4.0
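For completeness, here is a self-contained sketch of the MultiIndex approach that also flattens the result back to ordinary columns afterwards. As before, table_1 is rebuilt with a default RangeIndex (an assumption), and the names position, indexed and flat are only illustrative; the final reset_index step is not part of the answer above.

```python
import pandas as pd

# Assumption: table_1 is rebuilt here with a default 0..7 index for illustration.
table_1 = pd.DataFrame({
    "Sample": ["sample1"] * 4 + ["sample2"] * 4,
    "method": ["method_0", "method_1", "method_2", "method_3"] * 2,
    "value": [1, 2, 3, 4, 5, 6, 7, 8],
})
series_of_two_elements = pd.Series([1, 2])

# Position of each row inside its 'method' group: 0 for the first row, 1 for the second.
position = table_1.groupby("method").cumcount()

# Index by (method, position); the position becomes the second index level.
indexed = table_1.set_index(["method", position])

# level=1 matches the divisor's index (0, 1) against the second index level.
indexed["normalized"] = indexed["value"].div(series_of_two_elements, level=1)

# Optional: drop the helper level and turn 'method' back into a regular column.
flat = indexed.reset_index(level=1, drop=True).reset_index()
print(flat)
```

This variant relies on label alignment rather than avoiding it, so the divisor's index must contain exactly the positions that cumcount produces.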