I have a DataFrame like this:
df_detail =
car_brand car_type
0 Toyota Sedan
1 Toyota Truck
2 Honda Truck
3 Mazda Sedan
4 Mazda Convertible
I want to create a summary DataFrame that looks like this:
df_summary=
ID car_brand count_Sedan count_Truck count_Convertible
0 Toyota 1 1 0
1 Honda 0 1 0
2 Mazda 1 0 1
Is there a way to create the count_ columns using pandas? I tried the following:
import pandas as pd
d = {'car_brand':['Toyota','Toyota','Honda','Mazda','Mazda'],'car_type':['Sedan','Truck','Truck','Sedan','Convertible']}
df_detail = pd.DataFrame(data=d)
df_summary = pd.DataFrame({'car_brand':[]})
df_summary['car_brand'] = df_detail['car_brand'].unique()
df_summary['count_Sedan']=df_detail[((df_detail['car_brand']==df_summary['car_brand']) &
(df_detail['car_type']=='Sedan'))].count()
I get the error:
ValueError: Can only compare identically-labeled Series objects
Answer:
df_detail.set_index('car_brand')['car_type'].str.get_dummies().groupby(level=0, sort=False).sum().add_prefix('count_').reset_index()
Answer 0: (score: 2)
Try using .str.get_dummies:
df_detail.set_index('car_brand')['car_type'].str.get_dummies()\
.groupby(level=0, sort=False).sum().add_prefix('count_')
Output:
count_Convertible count_Sedan count_Truck
car_brand
Toyota 0 1 1
Honda 0 0 1
Mazda 1 1 0
And add .reset_index to get an integer index:
df_detail.set_index('car_brand')['car_type'].str.get_dummies()\
.groupby(level=0, sort=False).sum().add_prefix('count_').reset_index()
Output:
car_brand count_Convertible count_Sedan count_Truck
0 Toyota 0 1 1
1 Honda 0 0 1
2 Mazda 1 1 0
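For reference, here is a minimal end-to-end sketch of this approach using the sample data from the question. Two things in it are assumptions rather than part of the answer: it sums per brand with groupby(level=0, sort=False).sum(), since the level argument of sum was removed in pandas 2.0, and it reorders the count_ columns by first appearance of each car_type so that they match the column order requested in the question. The name type_order is only illustrative.

```python
import pandas as pd

d = {'car_brand': ['Toyota', 'Toyota', 'Honda', 'Mazda', 'Mazda'],
     'car_type': ['Sedan', 'Truck', 'Truck', 'Sedan', 'Convertible']}
df_detail = pd.DataFrame(data=d)

# One 0/1 indicator column per car_type, indexed by car_brand
dummies = df_detail.set_index('car_brand')['car_type'].str.get_dummies()

# Sum the indicators per brand; sort=False keeps brands in order of first appearance
counts = dummies.groupby(level=0, sort=False).sum()

# Assumption: order the type columns by first appearance (Sedan, Truck, Convertible)
type_order = list(df_detail['car_type'].unique())
df_summary = counts[type_order].add_prefix('count_').reset_index()

print(df_summary)
```

Without the reordering step, get_dummies returns the type columns in alphabetical order, which is what the outputs above show.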
Answer 1: (score: 0)
df_summary = (
df_detail.groupby(['car_brand', 'car_type']).size()
.unstack(fill_value=0)
)
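As written, the groupby/unstack result has car_brand as its index and plain type names (Sedan, Truck, ...) as columns. The sketch below shows one possible way to finish it so that it resembles the layout asked for in the question: add the count_ prefix, drop the leftover car_type label on the column axis, and move car_brand back into a regular column. These extra steps are an assumption about the desired result, not part of the answer itself.

```python
import pandas as pd

d = {'car_brand': ['Toyota', 'Toyota', 'Honda', 'Mazda', 'Mazda'],
     'car_type': ['Sedan', 'Truck', 'Truck', 'Sedan', 'Convertible']}
df_detail = pd.DataFrame(data=d)

# Count rows per (brand, type) pair, then pivot the types into columns
counts = (
    df_detail.groupby(['car_brand', 'car_type']).size()
    .unstack(fill_value=0)
)

# Assumed finishing steps to approximate the requested layout
df_summary = (
    counts.add_prefix('count_')                            # Sedan -> count_Sedan, etc.
    .rename_axis(index='car_brand', columns=None)          # drop the 'car_type' axis label
    .reset_index()                                         # car_brand becomes a column
)

print(df_summary)
```

Note that groupby sorts its keys by default, so the brands come out in alphabetical order (Honda, Mazda, Toyota) rather than in order of appearance.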