Pandas GroupBy and replace values with normalized counts
Sample DF:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0,20,size=(10,3)),columns=["c1","c2","c3"])
df["r1"]=["Apple","Mango","Apple","Mango","Mango","Mango","Apple","Mango","Apple","Apple"]
df["r2"]=["Orange","lemon","lemon","Orange","lemon","Orange","lemon","lemon","Orange","lemon"]
df["date"] = ["2002-01-01","2002-01-01","2002-01-01","2002-01-01","2002-01-01",
"2002-01-01","2002-02-01","2002-02-01","2002-02-01","2002-02-01"]
df["date"] = pd.to_datetime(df["date"])
df
DF:
   c1  c2  c3     r1      r2       date
0  10   2   0  Apple  Orange 2002-01-01
1  10  10  13  Mango   lemon 2002-01-01
2   0  12   0  Apple   lemon 2002-01-01
3   1  13   8  Mango  Orange 2002-01-01
4   6   5   9  Mango   lemon 2002-01-01
5   3  18  13  Mango  Orange 2002-01-01
6   2   6   7  Apple   lemon 2002-02-01
7   0   4   7  Mango   lemon 2002-02-01
8   1  10  19  Apple  Orange 2002-02-01
9  11  18   2  Apple   lemon 2002-02-01
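I am trying to group by the date column and replace the values in selected columns with their normalized counts.
For example: in the group 2002-01-01, the value Apple in column r1 would be replaced by about 0.33, because the group has 6 records and Apple appears in 2 of them (2/6). Likewise, Mango would be replaced by 4/6, about 0.67.
Pandas solution:
df.groupby("date")[["r1","r2"]].apply(lambda x: x.map(x.value_counts()))
Error:
AttributeError: 'DataFrame' object has no attribute 'map'
Is there a pandas way to do this that replaces an iterative solution?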
Answer 0 (score: 3)
We can combine value_counts with normalize=True, then reindex the result on each row's (date, r1) pair:
df['New']=df.groupby(['date']).r1.value_counts(normalize=True).reindex(pd.MultiIndex.from_frame(df[['date','r1']])).values
df
   c1  c2  c3     r1      r2       date       New
0   1   8   2  Apple  Orange 2002-01-01  0.333333
1   8   1   7  Mango   lemon 2002-01-01  0.666667
2   0  14   8  Apple   lemon 2002-01-01  0.333333
3  11  13  10  Mango  Orange 2002-01-01  0.666667
4  15   4  15  Mango   lemon 2002-01-01  0.666667
5  13   7   7  Mango  Orange 2002-01-01  0.666667
6   7   0  14  Apple   lemon 2002-02-01  0.750000
7  13   5  11  Mango   lemon 2002-02-01  0.250000
8  19  17  11  Apple  Orange 2002-02-01  0.750000
9   8   1   9  Apple   lemon 2002-02-01  0.750000
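The question asks to replace several selected columns, while the line above adds a single New column for r1. The following is a minimal sketch of how the same value_counts(normalize=True) plus reindex idea might be wrapped for more than one column. The helper name normalized_counts and the out frame are hypothetical, and the random c1..c3 columns are left out so the result is reproducible.

```python
import pandas as pd

# Deterministic copy of the question's r1, r2 and date columns
# (c1..c3 are random in the question and are left out here).
df = pd.DataFrame({
    "r1": ["Apple", "Mango", "Apple", "Mango", "Mango",
           "Mango", "Apple", "Mango", "Apple", "Apple"],
    "r2": ["Orange", "lemon", "lemon", "Orange", "lemon",
           "Orange", "lemon", "lemon", "Orange", "lemon"],
    "date": pd.to_datetime(["2002-01-01"] * 6 + ["2002-02-01"] * 4),
})


def normalized_counts(frame, group_col, value_col):
    """Hypothetical helper: share of each value within its group, one entry per row."""
    # Series indexed by (group, value) holding count / group size.
    shares = frame.groupby(group_col)[value_col].value_counts(normalize=True)
    # Look up every row's (group, value) pair in that Series.
    keys = pd.MultiIndex.from_frame(frame[[group_col, value_col]])
    return pd.Series(shares.reindex(keys).to_numpy(), index=frame.index)


out = df.copy()
for col in ["r1", "r2"]:  # loops over columns only, never over rows
    out[col] = normalized_counts(df, "date", col)

print(out)
```

The loop runs once per selected column, so the per-row work stays vectorized inside pandas.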
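Answer 1 (score: 2)
You can use the transform method to get the size of each group and assign that value to every row of the original DataFrame. Dividing the size of each (date, r1) group by the size of its date group gives the normalized count.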
In [11]: df.groupby(['date', 'r1'])['c1'].transform(len)/df.groupby(['date'])['c1'].transform(len)
Out[11]:
0 0.333333
1 0.666667
2 0.333333
3 0.666667
4 0.666667
5 0.666667
6 0.750000
7 0.250000
8 0.750000
9 0.750000
Name: c1, dtype: float64
If you need rounded values, just use the round method.
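As a sketch of the transform approach with rounding applied, the snippet below uses transform("size") in place of transform(len). That swap is an assumption made here for illustration, not part of the answer; it gives the same group sizes without needing a numeric column such as c1. The names pair_size, group_size and share are hypothetical.

```python
import pandas as pd

# Deterministic copy of the question's r1 and date columns.
df = pd.DataFrame({
    "r1": ["Apple", "Mango", "Apple", "Mango", "Mango",
           "Mango", "Apple", "Mango", "Apple", "Apple"],
    "date": pd.to_datetime(["2002-01-01"] * 6 + ["2002-02-01"] * 4),
})

# Rows sharing the same (date, r1) pair, broadcast back to every row.
pair_size = df.groupby(["date", "r1"])["r1"].transform("size")
# Rows sharing the same date, broadcast back to every row.
group_size = df.groupby("date")["r1"].transform("size")

# Normalized count per row, rounded to two decimals.
share = (pair_size / group_size).round(2)
print(share)
```

Rounding is best left as the last step, so the division itself is done at full precision.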