I have a DataFrame that looks like this:
import pandas as pd
data = [
    {
        "userId": 1,
        "binary_vote": 0,
        "genres": [
            "Adventure",
            "Comedy"
        ]
    },
    {
        "userId": 1,
        "binary_vote": 1,
        "genres": [
            "Adventure",
            "Drama"
        ]
    },
    {
        "userId": 2,
        "binary_vote": 0,
        "genres": [
            "Comedy",
            "Drama"
        ]
    },
    {
        "userId": 2,
        "binary_vote": 1,
        "genres": [
            "Adventure",
            "Drama"
        ]
    },
]
df = pd.DataFrame(data)
print(df)
   userId  binary_vote               genres
0       1            0  [Adventure, Comedy]
1       1            1   [Adventure, Drama]
2       2            0      [Comedy, Drama]
3       2            1   [Adventure, Drama]
I want to create one column for each value of binary_vote. This is the expected output:
   userId        binary_vote_0       binary_vote_1
0       1  [Adventure, Comedy]  [Adventure, Drama]
1       2      [Comedy, Drama]  [Adventure, Drama]
I tried something like this, but it raises an error:
pd.pivot_table(df, columns=['binary_vote'], values='genres')
This is the error:
DataError: No numeric types to aggregate
Any ideas? Thanks in advance.
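Answer 0 (score: 3)
We have to supply our own aggfunc, which in this case is simple.
Your attempt fails because pivot_table tries to take the mean, which is the default aggregation function. That obviously cannot work on your lists.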
piv = (
df.pivot_table(index='userId', columns='binary_vote', values='genres', aggfunc=lambda x: x)
.add_prefix('binary_vote_')
.reset_index()
.rename_axis(None, axis=1)
)
print(piv)
   userId        binary_vote_0       binary_vote_1
0       1  [Adventure, Comedy]  [Adventure, Drama]
1       2      [Comedy, Drama]  [Adventure, Drama]
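If each (userId, binary_vote) pair occurs only once, as in the sample data, the built-in 'first' aggregation might be used in place of the lambda. A minimal sketch under that assumption (the name piv_first is only illustrative, and the sample frame is rebuilt so the snippet runs on its own):

```python
import pandas as pd

# Sample data from the question, rebuilt in column form.
df = pd.DataFrame({
    "userId": [1, 1, 2, 2],
    "binary_vote": [0, 1, 0, 1],
    "genres": [
        ["Adventure", "Comedy"],
        ["Adventure", "Drama"],
        ["Comedy", "Drama"],
        ["Adventure", "Drama"],
    ],
})

# 'first' keeps the first list in each (userId, binary_vote) group
# instead of trying to average the lists.
piv_first = (
    df.pivot_table(index="userId", columns="binary_vote",
                   values="genres", aggfunc="first")
    .add_prefix("binary_vote_")
    .reset_index()
    .rename_axis(None, axis=1)
)
print(piv_first)
```

With repeated pairs, 'first' would silently keep only the first list per group, so this variant fits only when that is acceptable.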
Answer 1 (score: 1)
Another approach, using set_index() and unstack():
m = (df.set_index(['userId', 'binary_vote']).unstack()
       .add_prefix('binary_vote_').droplevel(level=0, axis=1))
m.reset_index().rename_axis(None, axis=1)
   userId        binary_vote_0       binary_vote_1
0       1  [Adventure, Comedy]  [Adventure, Drama]
1       2      [Comedy, Drama]  [Adventure, Drama]
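Note that unstack() needs every (userId, binary_vote) pair to be unique. A rough sketch, assuming the sample data from the question, that checks this first and selects the genres column before unstacking so that no droplevel is needed (the name wide is illustrative):

```python
import pandas as pd

# Sample data from the question, rebuilt in column form.
df = pd.DataFrame({
    "userId": [1, 1, 2, 2],
    "binary_vote": [0, 1, 0, 1],
    "genres": [
        ["Adventure", "Comedy"],
        ["Adventure", "Drama"],
        ["Comedy", "Drama"],
        ["Adventure", "Drama"],
    ],
})

# unstack() cannot reshape duplicate index entries, so fail early
# with a clearer message if any (userId, binary_vote) pair repeats.
if df.duplicated(subset=["userId", "binary_vote"]).any():
    raise ValueError("duplicate (userId, binary_vote) pairs; aggregate first")

wide = (
    df.set_index(["userId", "binary_vote"])["genres"]  # Series with a 2-level index
    .unstack()                                         # binary_vote values become columns
    .add_prefix("binary_vote_")
    .reset_index()
    .rename_axis(None, axis=1)
)
print(wide)
```

If duplicates are possible, a pivot_table with an explicit aggfunc is probably the safer route.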