Let's say my data is:
  doc_id  label
0      a  Apple
1      b   Book
2      c   Book
3      a   Book
4      b    Cat
5      a  Apple
6      c   Book
The data above, as code:
import pandas as pd

df = pd.DataFrame({
    "doc_id": ["a", "b", "c", "a", "b", "a", "c"],
    "label": ["Apple", "Book", "Book", "Book", "Cat", "Apple", "Book"],
})
The output I want is:
label   Apple  Book  Cat
doc_id
a         2.0   1.0  NaN
b         NaN   1.0  1.0
c         NaN   2.0  NaN
What I have so far:
import numpy as np
df["count"] = np.ones(len(df))  # temporary helper column of ones, summed per cell below
new_df = df.pivot_table(index="doc_id", columns="label", values="count", aggfunc="sum")
But creating a temporary count column feels redundant. What is the right way to do this?
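For reference, one possible approach that avoids the helper column is to count rows per (doc_id, label) pair with groupby(...).size() and then unstack the label level into columns. This is only a sketch against the sample data above, not necessarily the one "right" way. The variable names counts and wide are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "doc_id": ["a", "b", "c", "a", "b", "a", "c"],
    "label": ["Apple", "Book", "Book", "Book", "Cat", "Apple", "Book"],
})

# Count rows for each (doc_id, label) pair, then pivot the label level
# into columns. Pairs that never occur become NaN, so the counts are floats.
counts = df.groupby(["doc_id", "label"]).size().unstack()

# Alternative: pd.crosstab builds the same table but fills missing pairs
# with 0 and keeps integer counts.
wide = pd.crosstab(df["doc_id"], df["label"])

print(counts)
print(wide)
```

The groupby version leaves NaN where a pair does not occur, as in the desired output, while crosstab is handy when zeros are preferred.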