Suppose I have a pandas DataFrame like this:
import numpy as np
import pandas as pd

df = pd.DataFrame()
df["person"] = ["p1", "p2", "p1", "p3", "p3", "p2", "p2", "p1", "p3", "p1",
                "p1", "p2", "p2", "p1", "p3"]
df["type"] = ["a", "a", "a", "a", "b", "a", "a", "b", "b", "b", "a", "a",
              "b", "a", "b"]
df["value"] = np.random.random(15)
bins = [0, 0.25, 0.5, 0.75, 1]
labels = [f"{float(i)}-{float(j)}" for i, j in zip(bins[:-1], bins[1:])]
# right=False makes the bins left-closed: [0, 0.25), [0.25, 0.5), ...
df["bin"] = pd.cut(df["value"], bins=bins, labels=labels, right=False)
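I want to insert a new column, "counter", that holds the number of rows for each combination of "person" and "type". Browsing SO, I found that the following line works, but only if I leave out the last column, "bin". How can I add the "counter" column to a DataFrame that also includes "bin"? Thanks in advance!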
df["counter"] = df.groupby(["person", "type"], as_index = False).transform("count")
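Why "bin" makes the difference, as a minimal sketch on small hypothetical data (fixed values in place of np.random.random and made-up bin labels "low"/"high", so the output is reproducible): without a column selection, transform("count") runs on every column that is not a grouping key and returns a DataFrame with one column per such column. With only "value" left, that DataFrame has a single column, which pandas accepts for df["counter"]; with "bin" added it has two, and the assignment raises a ValueError.

```python
import pandas as pd

# Hypothetical fixed data standing in for the random frame in the question.
df = pd.DataFrame({
    "person": ["p1", "p2", "p1", "p1"],
    "type": ["a", "a", "a", "b"],
    "value": [0.1, 0.6, 0.3, 0.9],
})
df["bin"] = pd.cut(df["value"], bins=[0, 0.5, 1], labels=["low", "high"], right=False)

keys = ["person", "type"]

# No column selected: transform runs on every non-grouping column
# ("value" and "bin") and returns a DataFrame with one column each.
all_counts = df.groupby(keys).transform("count")
print(all_counts.columns.tolist())  # ['value', 'bin']

failed = False
try:
    df["counter"] = all_counts  # two columns cannot fill one column
except ValueError as exc:
    failed = True
    print("assignment failed:", exc)

# Selecting a single column first makes transform return a Series.
df["counter"] = df.groupby(keys)["value"].transform("count")
print(df["counter"].tolist())  # [2, 1, 2, 1]
```

The exact error message differs between pandas versions, but the cause is the same: a two-column result is being assigned to one column.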
Answer (score: 1)
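Select a single column before calling transform, so that it returns one Series instead of a DataFrame. Just change the line to: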
df["counter"] = df.groupby(["person", "type"])["value"].transform("count")
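and you get the following (the "value" and "bin" columns differ from run to run, since "value" is random):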
    person  type     value       bin  counter
0       p1     a  0.134629  0.0-0.25        4
1       p2     a  0.997557  0.75-1.0        4
2       p1     a  0.911967  0.75-1.0        4
3       p3     a  0.278438  0.25-0.5        1
4       p3     b  0.539296  0.5-0.75        3
5       p2     a  0.722150  0.5-0.75        4
6       p2     a  0.724028  0.5-0.75        4
7       p1     b  0.989627  0.75-1.0        2
8       p3     b  0.978790  0.75-1.0        3
9       p1     b  0.197428  0.0-0.25        2
10      p1     a  0.330113  0.25-0.5        4
11      p2     a  0.806856  0.75-1.0        4
12      p2     b  0.430026  0.25-0.5        1
13      p1     a  0.265003  0.25-0.5        4
14      p3     b  0.037202  0.0-0.25        3
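Because "counter" depends only on the "person" and "type" columns, the counts in this output stay the same between runs even though "value" is random. A minimal cross-check sketch, swapping in groupby(...).size() with a left merge (an alternative technique, not the answer's transform call), rebuilds the same numbers from the question's person/type lists and compares them with the answer's line on a fresh random "value" column:

```python
import numpy as np
import pandas as pd

# person/type lists copied from the question.
df = pd.DataFrame({
    "person": ["p1", "p2", "p1", "p3", "p3", "p2", "p2", "p1", "p3", "p1",
               "p1", "p2", "p2", "p1", "p3"],
    "type": ["a", "a", "a", "a", "b", "a", "a", "b", "b", "b",
             "a", "a", "b", "a", "b"],
})
df["value"] = np.random.random(len(df))

keys = ["person", "type"]

# One row per (person, type) pair together with its row count.
sizes = df.groupby(keys).size().rename("counter").reset_index()
# A left merge attaches each pair's count to the original rows, keeping their order.
checked = df.merge(sizes, on=keys, how="left")
print(checked["counter"].tolist())
# [4, 4, 4, 1, 3, 4, 4, 2, 3, 2, 4, 4, 1, 4, 3]

# The answer's transform-based counter gives the same numbers.
answer = df.groupby(keys)["value"].transform("count")
print((checked["counter"] == answer).all())  # True
```

Note that size counts rows, while the answer's "count" counts non-missing entries of "value". The two agree here because np.random.random never produces NaN; if "value" could contain missing entries, transform("size") would be the choice for a pure row count.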