Question

I have a dataframe as shown below.

ID  Ownwer_ID   Building   Nationality  Age   Sector
1   2           Villa      India        24    SE1
2   2           Villa      India        28    SE1
3   4           Apartment  USA          82    SE2
4   4           Apartment  USA          68    SE2
5   7           Villa      UK           32    SE2
6   7           Villa      UK           28    SE2
7   7           Villa      UK            4    SE2
8   8           LabourCamp Pakistan     27    SE3
9   2           Villa      India        1     SE1
10  10          LabourCamp India        23    SE2
11  11          Apartment  Germany      34    SE3

In the data above, ID is unique and represents one person.

From the dataframe above, I want to prepare the following dataframe:

Sector   #Age_0-12  #Agemore70   #Asians  #Europe  #USA  #Asians_LabourCamp #USA_Apartment
SE1      1          0            3        0        0     0                  0
SE2      1          1            1        3        2     1                  2
SE3      0          0            1        1        0     1                  0

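I count Asians as people whose nationality is India or Pakistan, and Europe as nationality UK or Germany.

#Age_0-12 = number of people aged between 0 and 12 (inclusive)

#Agemore70 = number of people aged 70 or older

Similarly, each of the remaining columns is the number of people described by its name (for example, #Asians_LabourCamp is the number of Asians living in a LabourCamp).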
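A minimal sketch of these definitions in pandas, assuming the sample table above as input: build one boolean column per requested count, then sum per Sector (each True counts as 1). The helper names `groups`, `region`, `flags` and `result` are illustrative, not from the question.

```python
import pandas as pd

# Sample data from the question's table (only the columns needed here)
df = pd.DataFrame({
    'Building': ['Villa', 'Villa', 'Apartment', 'Apartment', 'Villa', 'Villa',
                 'Villa', 'LabourCamp', 'Villa', 'LabourCamp', 'Apartment'],
    'Nationality': ['India', 'India', 'USA', 'USA', 'UK', 'UK', 'UK',
                    'Pakistan', 'India', 'India', 'Germany'],
    'Age': [24, 28, 82, 68, 32, 28, 4, 27, 1, 23, 34],
    'Sector': ['SE1', 'SE1', 'SE2', 'SE2', 'SE2', 'SE2', 'SE2',
               'SE3', 'SE1', 'SE2', 'SE3'],
})

# Region groups as defined in the question
groups = {'India': 'Asians', 'Pakistan': 'Asians',
          'UK': 'Europe', 'Germany': 'Europe', 'USA': 'USA'}
region = df['Nationality'].map(groups)

# One boolean column per requested count
flags = pd.DataFrame({
    'Sector': df['Sector'],
    '#Age_0-12': df['Age'].between(0, 12),   # inclusive on both ends
    '#Agemore70': df['Age'] >= 70,
    '#Asians': region == 'Asians',
    '#Europe': region == 'Europe',
    '#USA': region == 'USA',
    '#Asians_LabourCamp': (region == 'Asians') & (df['Building'] == 'LabourCamp'),
    '#USA_Apartment': (region == 'USA') & (df['Building'] == 'Apartment'),
})

# Summing booleans per Sector counts the matching people
result = flags.groupby('Sector').sum().astype(int)
print(result)
```

Because every count is just a boolean condition, new columns (another age band or region/building pair) are one extra line in `flags`.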

I tried the following code:

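import pandas as pd

# Map each nationality to its region group
d = {'India': 'Asians', 'Pakistan': 'Asians', 'UK': 'Europe', 'Germany': 'Europe',
     'USA': 'USA'}
df['Nationality_Group'] = df['Nationality'].map(d)

# pd.cut bins are right-closed by default, so the -1 edge puts age 0 in (-1, 12]
bins = [-1, 12, 21, 50, 100]
df['binned_age'] = pd.cut(df['Age'], bins)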
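pd.cut builds right-closed intervals by default, so each edge is the largest value of its bin. The sketch below uses a few hypothetical boundary ages (not the question's data) to compare bins like those in the attempt with one possible choice, [-1, 12, 69, inf], whose bins line up with #Age_0-12 and #Agemore70, assuming ages are whole numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical boundary ages chosen to probe the bin edges
ages = pd.Series([0, 12, 13, 69, 70, 82])

# Bins like the attempt: (-1, 12], (12, 21], (21, 50], (50, 100]
# Age 70 shares (50, 100] with ages 51-69, so "70 or older" cannot be read off
attempt = pd.cut(ages, [-1, 12, 21, 50, 100])

# Edges aligned with the question (integer ages assumed):
# (-1, 12] -> 0..12, (12, 69] -> 13..69, (69, inf] -> 70 and older
aligned = pd.cut(ages, [-1, 12, 69, np.inf],
                 labels=['Age_0-12', 'Age_13-69', 'Agemore70'])

print(pd.DataFrame({'age': ages, 'attempt': attempt, 'aligned': aligned}))
```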

After this I'm stuck. If you have a solution, could you help me?

Answer 1

Let's try this: use pd.cut to get the age groups, then use pd.get_dummies with groupby to count each value in the selected columns:

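import numpy as np
import pandas as pd

# (-1, 12], (12, 69], (69, inf]: ages 0-12, 13-69 and 70+ for integer ages
df['Age Group'] = pd.cut(df['Age'], [-1, 12, 69, np.inf],
                         labels=['Age_0-12', 'Age_12-70', 'Agemore70'])

# One-hot encode the selected columns, then sum the indicator columns per Sector
df_out = pd.get_dummies(df[['Sector', 'Building', 'Age Group', 'Nationality']],
                        columns=['Age Group', 'Building', 'Nationality'],
                        prefix='#', prefix_sep='').groupby('Sector').sum()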

Output:

       #Age_0-12  #Age_12-70  #Agemore70  #Apartment  #LabourCamp  #Villa  \
Sector                                                                       
SE1             1           2           0           0            0       3   
SE2             1           4           1           2            1       3   
SE3             0           2           0           1            1       0   

        #Germany  #India  #Pakistan  #UK  #USA  
Sector                                          
SE1            0       3          0    0     0  
SE2            0       1          0    3     2  
SE3            1       0          1    0     0
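The output above counts individual nationalities and building types rather than the grouped columns the question asks for. One way to extend the same get_dummies + groupby idea (a sketch; the helper columns `Region` and `Region_Building` are hypothetical names): map nationalities to the question's regions, add a combined region/building key so pairs such as Asians_LabourCamp get their own dummy column, then keep only the requested columns.

```python
import numpy as np
import pandas as pd

# Sample data from the question (only the columns needed here)
df = pd.DataFrame({
    'Building': ['Villa', 'Villa', 'Apartment', 'Apartment', 'Villa', 'Villa',
                 'Villa', 'LabourCamp', 'Villa', 'LabourCamp', 'Apartment'],
    'Nationality': ['India', 'India', 'USA', 'USA', 'UK', 'UK', 'UK',
                    'Pakistan', 'India', 'India', 'Germany'],
    'Age': [24, 28, 82, 68, 32, 28, 4, 27, 1, 23, 34],
    'Sector': ['SE1', 'SE1', 'SE2', 'SE2', 'SE2', 'SE2', 'SE2',
               'SE3', 'SE1', 'SE2', 'SE3'],
})

groups = {'India': 'Asians', 'Pakistan': 'Asians',
          'UK': 'Europe', 'Germany': 'Europe', 'USA': 'USA'}
df['Region'] = df['Nationality'].map(groups)
# Combined key such as 'Asians_LabourCamp', so each pair becomes a dummy column
df['Region_Building'] = df['Region'] + '_' + df['Building']
# (-1, 12] -> 0..12, (12, 69] -> 13..69, (69, inf] -> 70+ (integer ages assumed)
df['Age Group'] = pd.cut(df['Age'], [-1, 12, 69, np.inf],
                         labels=['Age_0-12', 'Age_13-69', 'Agemore70'])

dummies = pd.get_dummies(df[['Sector', 'Age Group', 'Region', 'Region_Building']],
                         columns=['Age Group', 'Region', 'Region_Building'],
                         prefix='#', prefix_sep='')

wanted = ['#Age_0-12', '#Agemore70', '#Asians', '#Europe', '#USA',
          '#Asians_LabourCamp', '#USA_Apartment']
# reindex keeps only the requested columns; any that never occur become zeros
summary = (dummies.groupby('Sector').sum()
           .reindex(columns=wanted, fill_value=0)
           .astype(int))
print(summary)
```

Using `reindex(..., fill_value=0)` keeps the column layout stable even if some requested combination (say, USA_Apartment) never appears in the data.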
