I have a DataFrame with the following structure:
ID | Name | Role
1 | John | Owner
1 | Bob | Driver
2 | Jake | Owner
2 | Tom | Driver
2 | Sally | Owner
3 | Mary | Owner
3 | Sue | Driver
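I want to pivot the Role column and use the Name column as the values, but because some IDs (the index in this case) have more than one person in the Owner role while others don't, the pivot_table function doesn't work. Is there a way to create a new column for each additional owner a particular ID may have? Some IDs may have 2, 3, 4+ owners. Thanks!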
Example output:
ID | Owner_1 | Owner_2 | Driver
1 | John | NaN | Bob
2 | Jake | Sally | Tom
3 | Mary | NaN | Sue
This is what I tried:
pd.pivot_table(df,values='Name',index='ID',columns='Role')
DataError: No numeric types to aggregate
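In the pandas version used here, `pivot_table` defaults to `aggfunc='mean'`, which cannot average strings, hence the `DataError` (newer pandas versions may raise a different exception type). The underlying problem is that `(ID, Role)` is not a unique key, so the frame cannot be reshaped without either aggregating or adding an extra key. A minimal sketch that shows this, rebuilding the sample frame so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "ID":   [1, 1, 2, 2, 2, 3, 3],
    "Name": ["John", "Bob", "Jake", "Tom", "Sally", "Mary", "Sue"],
    "Role": ["Owner", "Driver", "Owner", "Driver", "Owner", "Owner", "Driver"],
})

# ID 2 has two rows with Role == "Owner", so (ID, Role) is not unique
dupes = df[df.duplicated(["ID", "Role"], keep=False)]
print(dupes)

# A plain pivot (no aggregation) therefore refuses to reshape the frame
pivot_failed = False
try:
    df.pivot(index="ID", columns="Role", values="Name")
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)
```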
Answer 0 (score: 2)
You can use cumcount to create an additional key for the duplicates within each ID, then simply use pivot:
# Number repeated roles within each ID: Owner_1, Owner_2, ...
df['Role'] = df['Role'] + '_' + df.groupby(['ID', 'Role']).cumcount().add(1).astype(str)
df.pivot(index='ID', columns='Role', values='Name')
Out[432]:
Role Driver_1 Owner_1 Owner_2
ID
1 Bob John None
2 Tom Jake Sally
3 Sue Mary None
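The pivot above yields `Driver_1` rather than `Driver` and puts the columns in alphabetical order. Here is a sketch that applies the same cumcount idea without overwriting `df` and then tidies the columns to match the requested output. It assumes exactly one driver per ID (so `Driver_1` can simply be renamed); the names `keyed`, `wide` and `owner_cols` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ID":   [1, 1, 2, 2, 2, 3, 3],
    "Name": ["John", "Bob", "Jake", "Tom", "Sally", "Mary", "Sue"],
    "Role": ["Owner", "Driver", "Owner", "Driver", "Owner", "Owner", "Driver"],
})

# Extra key: number repeated roles within each ID (Owner_1, Owner_2, ...)
n = df.groupby(["ID", "Role"]).cumcount().add(1).astype(str)
keyed = df.assign(Role=df["Role"] + "_" + n)

# (ID, Role) is now unique, so a plain pivot works
wide = keyed.pivot(index="ID", columns="Role", values="Name")

# Owner columns first, sorted numerically so Owner_10 follows Owner_9
owner_cols = sorted(
    (c for c in wide.columns if c.startswith("Owner_")),
    key=lambda c: int(c.split("_")[1]),
)
# Assumes one driver per ID, so Driver_1 is the only driver column
wide = wide[owner_cols + ["Driver_1"]].rename(columns={"Driver_1": "Driver"})
wide.columns.name = None
print(wide)
```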
Answer 1 (score: 0)
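You need to change the default aggregation function from mean, which only works on numbers, to one that combines strings. Joining with a space works; plain sum also concatenates strings, but with no separator (JakeSally), which the split step below cannot break apart:
pivoted = pd.pivot_table(df, values='Name',
                         index='ID', columns='Role', aggfunc=' '.join)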
#Role Driver Owner
#ID
#1 Bob John
#2 Tom Jake Sally
#3 Sue Mary
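A quick check of why the separator matters: summing object-dtype strings concatenates them directly, so the later split step only has something to break on if the names were joined with a space. This sketch looks at ID 2's owners in isolation (the variable names are illustrative):

```python
import pandas as pd

owners_id2 = pd.Series(["Jake", "Sally"])  # the two owners of ID 2

# Summing object-dtype strings concatenates them with no separator...
summed = owners_id2.sum()
print(summed, summed.split())    # JakeSally ['JakeSally']

# ...while joining with a space leaves something for str.split() to break on
joined = " ".join(owners_id2)
print(joined, joined.split())    # Jake Sally ['Jake', 'Sally']
```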
Now some Owner cells hold several space-separated names. Split them into individual columns:
result = pivoted.join(pivoted['Owner'].str.split().apply(pd.Series))\
.drop("Owner", axis=1)
# Driver 0 1
#ID
#1 Bob John NaN
#2 Tom Jake Sally
#3 Sue Mary NaN
result.columns = "Driver", "Owner_1", "Owner_2"
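The last line hardcodes exactly two owner columns. A sketch that instead names the columns from however many owners the largest group has, using `str.split(expand=True)` in place of `apply(pd.Series)`. It assumes names contain no spaces (otherwise the split would break a single name apart) and one driver per ID; the `" ".join` aggregation is a substitution for `sum`.

```python
import pandas as pd

df = pd.DataFrame({
    "ID":   [1, 1, 2, 2, 2, 3, 3],
    "Name": ["John", "Bob", "Jake", "Tom", "Sally", "Mary", "Sue"],
    "Role": ["Owner", "Driver", "Owner", "Driver", "Owner", "Owner", "Driver"],
})

# Combine names per (ID, Role) with a space (assumes names contain no spaces)
pivoted = pd.pivot_table(df, values="Name", index="ID", columns="Role",
                         aggfunc=" ".join)

# Expand owners into as many columns as the largest group needs
owners = pivoted["Owner"].str.split(expand=True)
owners.columns = [f"Owner_{i + 1}" for i in range(owners.shape[1])]

# Append the (single) driver column
result = owners.join(pivoted["Driver"])
print(result)
```

With 3 or 4 owners on some ID, `owners` simply grows extra `Owner_3`, `Owner_4` columns, padded with missing values for IDs that have fewer.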