Why does pandas groupby with the get_letter_type function leave column 'B' out of the 'consonant' group?

Time: 2016-05-20 17:46:21

Tags: python pandas

The pandas documentation (http://pandas.pydata.org/pandas-docs/stable/groupby.html) has an example that uses groupby with the get_letter_type function below. Why does the result of describe() not include column 'B'?

In [5]: def get_letter_type(letter):
   ...:     if letter.lower() in 'aeiou':
   ...:         return 'vowel'
   ...:     else:
   ...:         return 'consonant'
   ...: 
In [6]: grouped = df.groupby(get_letter_type, axis=1)
In [7]: grouped.describe()

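The result shown here has no column B. Can anyone explain why? As far as I can tell, B should belong to the 'consonant' group. Is there something I am missing?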

1 answer:

Answer 0 (score: 1)

For me it works if the DataFrame has only columns A and B:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                          'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three',
                          'two', 'two', 'one', 'three']})

def get_letter_type(letter):
    if letter.lower() in 'aeiou':
        return 'vowel'
    else:
        return 'consonant'


grouped = df.groupby(get_letter_type, axis=1)
for i, g in (grouped):
    print (i)
    print (g)

consonant
       B
0    one
1    one
2    two
3  three
4    two
5    two
6    one
7  three

vowel
     A
0  foo
1  bar
2  foo
3  bar
4  foo
5  bar
6  foo
7  foo    

print (grouped.describe())    
       consonant vowel
               B     A
count          8     8
unique         3     2
top          one   foo
freq           3     5

I think this is the automatic exclusion of nuisance columns, which applies when a group, such as consonant, contains both numeric and string columns:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                          'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three',
                          'two', 'two', 'one', 'three'],
                   'C' : np.random.randn(8),
                   'D' : np.random.randn(8)})

def get_letter_type(letter):
    if letter.lower() in 'aeiou':
        return 'vowel'
    else:
        return 'consonant'


grouped = df.groupby(get_letter_type, axis=1)
    
for i, g in (grouped):
    print (i)
    print (g)

consonant
       B         C         D
0    one  0.322759  0.348806
1    one -0.122110 -1.566801
2    two  1.846408 -0.830144
3  three -0.509248  0.486773
4    two -1.061608 -0.069366
5    two  1.083728  0.429543
6    one -0.664480 -0.702906
7  three  0.587159  0.978647
vowel
     A
0  foo
1  bar
2  foo
3  bar
4  foo
5  bar
6  foo
7  foo

print (grouped.describe())    
       consonant           vowel
               C         D     A
25%    -0.548056 -0.734716   NaN
50%     0.100325  0.139720   NaN
75%     0.711301  0.443851   NaN
count   8.000000  8.000000     8
freq         NaN       NaN     5
max     1.846408  0.978647   NaN
mean    0.185326 -0.115681   NaN
min    -1.061608 -1.566801   NaN
std     0.971055  0.848251   NaN
top          NaN       NaN   foo
unique       NaN       NaN     2
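
To make the effect easier to see in isolation, here is a minimal sketch that splits the columns by hand and calls DataFrame.describe() on each piece. It assumes that describing each group behaves like describing the corresponding sub-DataFrame, which matches the output above but is my reading rather than something the documentation states in these terms. The names groups, summaries and summaries_all are made up for this illustration, and the sketch deliberately avoids groupby(..., axis=1) so that it does not depend on that argument being available.

```python
import numpy as np
import pandas as pd


def get_letter_type(letter):
    # Same classification rule as in the question.
    return 'vowel' if letter.lower() in 'aeiou' else 'consonant'


df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': np.random.randn(8),
    'D': np.random.randn(8),
})

# Hypothetical helper structure: map each group name to its column labels.
groups = {}
for col in df.columns:
    groups.setdefault(get_letter_type(col), []).append(col)

# Default describe(): a frame that mixes numeric and string columns
# is summarized on its numeric columns only.
summaries = {name: df[cols].describe() for name, cols in groups.items()}

# include='all' asks describe() to keep every column, whatever its dtype.
summaries_all = {name: df[cols].describe(include='all')
                 for name, cols in groups.items()}

print(list(summaries['consonant'].columns))
print(list(summaries_all['consonant'].columns))
print(list(summaries['vowel'].columns))
```

In the consonant piece, B sits next to the numeric columns C and D, so the default describe() leaves it out. The vowel piece contains only the string column A, so there A is summarized with count, unique, top and freq. Passing include='all' brings B back, with NaN in the statistics that do not apply to strings.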