My rows look like this:
zipcode room_type
2011 bed
2012 sofa
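Each row is one Airbnb listing. I want to aggregate the data so that every unique value is counted: each unique room type gets its own column, and the data is grouped by zip code. The result would look like this: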
zipcode bed sofa ground
1011 200 36 20
1012 720 45 89
How can I get this result with pandas?
Answer 0 (score: 1)
I did this with indexing and reshaping:
from pandas import DataFrame, Series
df = DataFrame({'zipcode':[20110,20110,20111,20111,20111], 'room_type': ['bed','sofa', 'bed','bed','sofa']})
df.set_index(['zipcode', 'room_type'], inplace=True)
df
zipcode room_type
20110 bed
sofa
20111 bed
bed
sofa
# count the values and generate a new dataframe
df2 = DataFrame(df.index.value_counts(), columns=['count'])
df2.reset_index(inplace=True)
df2
index count
0 (20111, bed) 2
1 (20110, bed) 1
2 (20111, sofa) 1
3 (20110, sofa) 1
# split the tuple into new columns
df2[['zipcode', 'room_type']] = df2['index'].apply(Series)
df2.drop('index', axis=1, inplace=True)
# reshape
df2.pivot(index='zipcode', columns='room_type', values='count')
room_type bed sofa
zipcode
20110 1 1
20111 2 1
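A minimal sketch of a shorter route to the same table, assuming pandas 1.1 or later: `DataFrame.value_counts` counts the (zipcode, room_type) pairs directly and returns a Series with a MultiIndex rather than tuples, so the tuple-splitting and `pivot` steps are replaced by a single `unstack`. This swaps in a different technique than the answer above; the sample data is the same.

```python
import pandas as pd

# Same sample data as the answer: one row per listing
df = pd.DataFrame({'zipcode': [20110, 20110, 20111, 20111, 20111],
                   'room_type': ['bed', 'sofa', 'bed', 'bed', 'sofa']})

# Count each (zipcode, room_type) pair; the result is indexed by a MultiIndex
counts = df.value_counts(['zipcode', 'room_type'])

# Move room_type into the columns; absent combinations are filled with 0
result = counts.unstack(fill_value=0)
print(result)
```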
Answer 1 (score: 0)
First, use groupby on the 'zipcode' and 'room_type' columns to get the count for each combination:
In [4]: df = df.groupby(['zipcode','room_type'])['room_type'].agg(['count']).reset_index()
In [5]: df
Out[5]:
zipcode room_type count
0 20110 bed 1
1 20110 sofa 1
2 20111 bed 2
3 20111 sofa 1
Now use pivot_table to get the desired layout:
In [6]: df = df.pivot_table(values='count', columns='room_type', index='zipcode')
In [7]: df
Out[7]:
room_type bed sofa
zipcode
20110 1 1
20111 2 1
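In real listing data some zip codes will have no listings of a given room type, and `pivot_table` leaves those cells as NaN unless `fill_value` is given. A minimal sketch, assuming a hypothetical extra zip code 20112 with no sofa listing; it uses `groupby(...).size()` as an equivalent way to get the counts and passes `aggfunc='sum'` to state the intent explicitly (the default is `'mean'`, which gives the same numbers here because each cell holds exactly one count).

```python
import pandas as pd

# Hypothetical data: zipcode 20112 has a 'bed' listing but no 'sofa' listing
df = pd.DataFrame({'zipcode': [20110, 20110, 20111, 20111, 20111, 20112],
                   'room_type': ['bed', 'sofa', 'bed', 'bed', 'sofa', 'bed']})

# One row per (zipcode, room_type) pair with its count
counts = df.groupby(['zipcode', 'room_type']).size().reset_index(name='count')

# fill_value=0 reports the missing (20112, sofa) combination as 0 instead of NaN
wide = counts.pivot_table(values='count', index='zipcode',
                          columns='room_type', aggfunc='sum', fill_value=0)
print(wide)
```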
Remove the columns name ('room_type'):
In [8]: df.columns.name = None
In [9]: df
Out[9]:
bed sofa
zipcode
20110 1 1
20111 2 1
Finally, reset the index:
In [10]: df = df.reset_index()
In [11]: df
Out[11]:
zipcode bed sofa
0 20110 1 1
1 20111 2 1
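The four steps above can be collapsed into one method chain. This is a sketch of an alternative, not the answer's own code: `size()` and `unstack()` stand in for the `agg(['count'])` plus `pivot_table` pair, and `rename_axis(columns=None)` has the same effect as `df.columns.name = None`.

```python
import pandas as pd

df = pd.DataFrame({'zipcode': [20110, 20110, 20111, 20111, 20111],
                   'room_type': ['bed', 'sofa', 'bed', 'bed', 'sofa']})

out = (df.groupby(['zipcode', 'room_type'])
         .size()                      # count rows per (zipcode, room_type)
         .unstack(fill_value=0)       # room_type values become columns
         .rename_axis(columns=None)   # drop the 'room_type' columns name
         .reset_index())              # zipcode back to a regular column
print(out)
```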
Answer 2 (score: 0)
I find the crosstab approach the easiest to implement; this one line does it:
pd.crosstab(df.zipcode, df.room_type).reset_index()
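A runnable version of the crosstab one-liner, using the sample data from the first answer. `pd.crosstab` counts co-occurrences of the two columns and fills combinations that never occur with 0; the `rename_axis(columns=None)` step is an optional addition (not part of the answer) that removes the leftover 'room_type' columns name so the header matches the layout asked for in the question.

```python
import pandas as pd

df = pd.DataFrame({'zipcode': [20110, 20110, 20111, 20111, 20111],
                   'room_type': ['bed', 'sofa', 'bed', 'bed', 'sofa']})

# Rows: zipcode, columns: room_type, cells: number of listings
table = pd.crosstab(df['zipcode'], df['room_type'])

# Optional tidy-up: drop the columns name and turn zipcode into a column
table = table.rename_axis(columns=None).reset_index()
print(table)
```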