I have the following DataFrame, which records whether a user was present in a given week of a given year:
User Year Week
0 John 2020 1
1 John 2020 2
2 Steve 2020 1
3 Fred 2020 3
4 George 2020 2
5 George 2020 3
... ... ...
200 John 2021 2
201 John 2021 4
202 Steve 2021 2
203 Fred 2021 2
204 George 2021 1
205 George 2021 4
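I would like to get a DataFrame grouped by User, in which each column indicates whether the user was present in a given week of a given year. Each column should be either boolean or integer, with possible values 0 or 1.
It would look like this: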
2020_1 2020_2 2020_3 ... 2021_1 2021_2 2021_3 2021_4
John 1 1 0 ... 0 1 0 1
Steve 1 0 0 ... 0 1 0 0
Fred 0 0 1 ... 0 1 0 0
George 0 1 1 ... 1 0 0 1
Is there a way to do this without iterating over the DataFrame?
Thanks.
Answer 0 (score: 3)
Build a combined Year_Week key and pass it to pd.crosstab:
pd.crosstab(df['User'],
df[['Year','Week']].astype(str).apply('_'.join, axis=1)
)
Output:
col_0 2020_1 2020_2 2020_3 2021_1 2021_2 2021_4
User
Fred 0 0 1 0 1 0
George 0 1 1 1 0 1
John 1 1 0 0 1 1
Steve 1 0 0 0 1 0
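Note that the output above has no 2021_3 column, because no user was present that week, whereas the desired table in the question does include it. A minimal sketch of one way to add such empty weeks is to reindex the columns against an explicit list. The sample rows are the ones visible in the question (the elided rows are left out), and the rule used to build `all_cols` (every week from 1 up to the largest week seen, for each year) is an assumption for illustration only.

```python
import pandas as pd

# Rows visible in the question; the elided middle rows are omitted.
df = pd.DataFrame({
    "User": ["John", "John", "Steve", "Fred", "George", "George",
             "John", "John", "Steve", "Fred", "George", "George"],
    "Year": [2020] * 6 + [2021] * 6,
    "Week": [1, 2, 1, 3, 2, 3, 2, 4, 2, 2, 1, 4],
})

# Combined "Year_Week" key, as in the answer above.
key = df["Year"].astype(str) + "_" + df["Week"].astype(str)
table = pd.crosstab(df["User"], key)

# Assumption: the wanted columns are weeks 1..max(Week) for every year.
max_week = int(df["Week"].max())
all_cols = [f"{year}_{week}"
            for year in sorted(df["Year"].unique())
            for week in range(1, max_week + 1)]

# Weeks in which nobody appeared become all-zero columns.
table = table.reindex(columns=all_cols, fill_value=0)
print(table)
```

Reindexing also fixes the column order: the key is a string, so crosstab sorts it lexicographically and would place 2020_10 before 2020_2.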
Answer 1 (score: 3)
Here is one way to do it:
import pandas as pd
df = pd.DataFrame({
"User" : ["John","John","Steve","Fred","George","George"],
"Year" : [2020,2020,2020,2020,2020,2020],
"Week": [1,2,1,3,2,3]})
# add a helper column for year_week
df["year_week"] = df["Year"].map(str) + "_" + df["Week"].map(str)
# group by User and year_week, then unstack and fill NaN with 0
df.groupby(["User","year_week"]).size().unstack(fill_value = 0)
Result:
| User | 2020_1 | 2020_2 | 2020_3 |
|:-------|---------:|---------:|---------:|
| Fred | 0 | 0 | 1 |
| George | 0 | 1 | 1 |
| John | 1 | 1 | 0 |
| Steve | 1 | 0 | 0 |
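One caveat: size() counts rows, so if the same (User, Year, Week) combination appears more than once, the cell will hold a value greater than 1 rather than a 0/1 flag. A minimal sketch of turning the counts into flags follows; the duplicated row in the sample data is hypothetical and is there only to show the difference.

```python
import pandas as pd

df = pd.DataFrame({
    "User": ["John", "John", "John", "Steve"],
    "Year": [2020, 2020, 2020, 2020],
    "Week": [1, 1, 2, 1],  # John / 2020 / week 1 is duplicated on purpose
})

# Same helper column and reshaping as in the answer above.
df["year_week"] = df["Year"].map(str) + "_" + df["Week"].map(str)
counts = df.groupby(["User", "year_week"]).size().unstack(fill_value=0)

# Any positive count becomes 1; use .gt(0) alone to keep booleans instead.
flags = counts.gt(0).astype(int)
print(flags)
```

If the data is known to contain no duplicate rows, the counts are already 0 or 1 and this step is unnecessary.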
Answer 2 (score: 2)
pd.crosstab(df.User, df['Year'].astype(str)+"_"+df['Week'].astype(str))
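Because the concatenated key has no name, the column header comes out as col_0, as in the first answer's output. A small sketch, assuming you would rather label it, is to pass crosstab's colnames argument; the label "year_week" is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "User": ["John", "John", "Steve", "Fred", "George", "George"],
    "Year": [2020, 2020, 2020, 2020, 2020, 2020],
    "Week": [1, 2, 1, 3, 2, 3],
})

# Same one-liner, with an explicit name for the column index.
table = pd.crosstab(
    df["User"],
    df["Year"].astype(str) + "_" + df["Week"].astype(str),
    colnames=["year_week"],  # hypothetical label, replaces the default "col_0"
)
print(table)
```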