Pandas: creating a DataFrame with value_counts

Time: 2016-07-28 10:51:04

Tags: python pandas

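I have this data:

age 32 16 39 39 23 36 29 26 43 34 35 50 29 29 31 42 53

I need to get something like this (image). I can get the raw counts with

df.age.value_counts()

but how do I combine them into one table and name the columns?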

1 answer:

Answer 0: (score: 1)

You can use cut together with agg:

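import numpy as np
import pandas as pd

# sample ages from the question
df = pd.DataFrame({'age': [32, 16, 39, 39, 23, 36, 29, 26, 43, 34, 35, 50, 29, 29, 31, 42, 53]})

# helper DataFrame with min and max ages; the last row carries the 'Total' label
df1 = pd.DataFrame({'G':['14 yo and younger','15-19','20-24','25-29','30-34',
                         '35-39','40-44','45-49','50-54','55-59','60-64','65+','Total'],
                     'Min':[0, 15,20,25,30,35,40,45,50,55,60,65,np.nan],
                     'Max':[14,19,24,29,34,39,44,49,54,59,64,120, np.nan]})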

print (df1)
                    G    Max   Min
0   14 yo and younger   14.0   0.0
1               15-19   19.0  15.0
2               20-24   24.0  20.0
3               25-29   29.0  25.0
4               30-34   34.0  30.0
5               35-39   39.0  35.0
6               40-44   44.0  40.0
7               45-49   49.0  45.0
8               50-54   54.0  50.0
9               55-59   59.0  55.0
10              60-64   64.0  60.0
11                65+  120.0  65.0
12              Total    NaN   NaN
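# bin edges: the lowest Min followed by every finite Max
# (the NaN of the 'Total' row is dropped, since pd.cut expects increasing edges)
cutoff = np.hstack([df1.Min.iloc[0], df1.Max.dropna().values])
# one label per bin; 'Total' has no bin of its own
labels = df1.G.values[:-1]

# intervals are closed on the right, and the first one also includes its lower edge
df['Groups'] = pd.cut(df.age, bins=cutoff, labels=labels, right=True, include_lowest=True)
# register 'Total' as a category so the summary row can be added later
df['Groups'] = df['Groups'].cat.add_categories('Total')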
print (df)
    age Groups
0    32  30-34
1    16  15-19
2    39  35-39
3    39  35-39
4    23  20-24
5    36  35-39
6    29  25-29
7    26  25-29
8    43  40-44
9    34  30-34
10   35  35-39
11   50  50-54
12   29  25-29
13   29  25-29
14   31  30-34
15   42  40-44
16   53  50-54
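
The binning step can be checked in isolation. The following is a minimal sketch, assuming a shortened, hypothetical set of edges and labels (not the full table from the answer), that shows how right=True and include_lowest=True treat ages falling exactly on a boundary:

```python
import pandas as pd

# Hypothetical, shortened edges and labels, for illustration only
edges = [0, 14, 19, 24]
labels = ['14 yo and younger', '15-19', '20-24']

ages = pd.Series([0, 14, 15, 19, 20, 24])

# right=True makes each interval closed on the right: (14, 19], (19, 24];
# include_lowest=True additionally closes the first interval on the left: [0, 14]
groups = pd.cut(ages, bins=edges, labels=labels, right=True, include_lowest=True)

# map each age to the label it received
result = dict(zip(ages.tolist(), groups.astype(str).tolist()))
print(result)
```

An age equal to an upper edge (14, 19, 24) stays in the lower group, which is why the edges are built from the Max column.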
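n = df.shape[0]  # total number of rows, taken before df is overwritten
# observed=False keeps the empty groups; named aggregation replaces the removed dict-renaming form
df = df.groupby('Groups', observed=False)['Groups'].agg(N='size')
df['%'] = df['N'] / n * 100
# two-level header: ('Total', 'N') and ('Total', '%')
df.columns = pd.MultiIndex.from_product([['Total'], df.columns])

# last Total row (.ix was removed from pandas, so .loc is used)
df.loc['Total'] = df.sum()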

print (df)    
                 Total            
                      N           %
Groups                             
14 yo and younger   0.0    0.000000
15-19               1.0    5.882353
20-24               1.0    5.882353
25-29               4.0   23.529412
30-34               3.0   17.647059
35-39               4.0   23.529412
40-44               2.0   11.764706
45-49               0.0    0.000000
50-54               2.0   11.764706
55-59               0.0    0.000000
60-64               0.0    0.000000
65+                 0.0    0.000000
Total              17.0  100.000000

EDIT1:

A better solution with size:

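# here df is the frame that still has the 'Groups' column (before the aggregation above)
df1 = df.groupby('Groups', observed=False).size().to_frame()
df1.columns = pd.MultiIndex.from_tuples([('Total', 'N')])
df1[('Total', '%')] = 100 * df1[('Total', 'N')] / df.shape[0]
# .loc replaces the removed .ix indexer
df1.loc['Total'] = df1.sum()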
print (df1)
                  Total            
                      N           %
Groups                             
14 yo and younger   0.0    0.000000
15-19               1.0    5.882353
20-24               1.0    5.882353
25-29               4.0   23.529412
30-34               3.0   17.647059
35-39               4.0   23.529412
40-44               2.0   11.764706
45-49               0.0    0.000000
50-54               2.0   11.764706
55-59               0.0    0.000000
60-64               0.0    0.000000
65+                 0.0    0.000000
Total              17.0  100.000000
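
Since the question started from value_counts, the same table can also be sketched with it directly. This is a minimal sketch, not the method from the answer: the names ages, edges, counts and table are illustrative, and flat column names N and % stand in for the two-level header.

```python
import pandas as pd

# ages from the question
ages = pd.Series([32, 16, 39, 39, 23, 36, 29, 26, 43, 34, 35, 50, 29, 29, 31, 42, 53],
                 name='age')

# bin edges and labels matching the groups used in the answer
edges = [0, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 120]
labels = ['14 yo and younger', '15-19', '20-24', '25-29', '30-34', '35-39',
          '40-44', '45-49', '50-54', '55-59', '60-64', '65+']

groups = pd.cut(ages, bins=edges, labels=labels, include_lowest=True)

# After the cast to str, value_counts only reports groups that occur,
# so reindex restores the empty ones with a count of 0 and fixes the row order
counts = groups.astype(str).value_counts().reindex(labels, fill_value=0)

# combine counts and percentages under explicit column names
table = pd.DataFrame({'N': counts, '%': counts / counts.sum() * 100})
table.index.name = 'Groups'

# summary row appended last
table.loc['Total'] = table.sum()

print(table)
```

Casting the groups to plain strings avoids having to register 'Total' as a category before appending the summary row, at the cost of losing the categorical ordering, which reindex then restores explicitly.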