The pandas documentation (http://pandas.pydata.org/pandas-docs/stable/groupby.html) has an example that uses groupby with the get_letter_type function below. Why does the result of describe() not include column 'B'?
In [5]: def get_letter_type(letter):
   ...:     if letter.lower() in 'aeiou':
   ...:         return 'vowel'
   ...:     else:
   ...:         return 'consonant'
   ...:

In [6]: grouped = df.groupby(get_letter_type, axis=1)

In [7]: grouped.describe()
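The result shown there has no column B. Can anyone explain why? It seems to me that B should belong to the 'consonant' group. Is there something I am missing?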
Answer 0 (score: 1)
For me it works if the DataFrame has only columns A and B:
import pandas as pd

# Both columns hold strings, so every group contains string data only.
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three']})

def get_letter_type(letter):
    # Called with each column label because of axis=1 below.
    if letter.lower() in 'aeiou':
        return 'vowel'
    else:
        return 'consonant'

grouped = df.groupby(get_letter_type, axis=1)

# Print each group name followed by the columns assigned to it.
for i, g in grouped:
    print(i)
    print(g)
consonant
       B
0    one
1    one
2    two
3  three
4    two
5    two
6    one
7  three
vowel
     A
0  foo
1  bar
2  foo
3  bar
4  foo
5  bar
6  foo
7  foo
print(grouped.describe())

       consonant vowel
               B     A
count          8     8
unique         3     2
top          one   foo
freq           3     5
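The same summary can be checked without groupby at all. The snippet below is a minimal sketch that assumes only pandas is installed and calls describe() directly on a frame holding the two string columns. With string columns only, describe() reports count, unique, top and freq for every column, so B is kept.

```python
import pandas as pd

# Same two string columns as in the example above.
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three']})

# No numeric columns are present, so describe() summarises the string columns.
summary = df.describe()
print(summary)

# Both columns survive, each with the categorical statistics.
print(list(summary.columns))
print(list(summary.index))
```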
I think this is the automatic exclusion of nuisance columns. It applies when a group, for example consonant, contains both numeric and string columns:
import pandas as pd
import numpy as np

# A and B hold strings, C and D hold random floats,
# so the 'consonant' group (B, C, D) mixes string and numeric columns.
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})

def get_letter_type(letter):
    # Called with each column label because of axis=1 below.
    if letter.lower() in 'aeiou':
        return 'vowel'
    else:
        return 'consonant'

grouped = df.groupby(get_letter_type, axis=1)

# Print each group name followed by the columns assigned to it.
for i, g in grouped:
    print(i)
    print(g)
consonant
       B         C         D
0    one  0.322759  0.348806
1    one -0.122110 -1.566801
2    two  1.846408 -0.830144
3  three -0.509248  0.486773
4    two -1.061608 -0.069366
5    two  1.083728  0.429543
6    one -0.664480 -0.702906
7  three  0.587159  0.978647
vowel
     A
0  foo
1  bar
2  foo
3  bar
4  foo
5  bar
6  foo
7  foo
print(grouped.describe())

       consonant            vowel
               C         D      A
25%    -0.548056 -0.734716    NaN
50%     0.100325  0.139720    NaN
75%     0.711301  0.443851    NaN
count   8.000000  8.000000      8
freq         NaN       NaN      5
max     1.846408  0.978647    NaN
mean    0.185326 -0.115681    NaN
min    -1.061608 -1.566801    NaN
std     0.971055  0.848251    NaN
top          NaN       NaN    foo
unique       NaN       NaN      2
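The effect can be reproduced on the consonant columns alone, which is a rough way to see where B goes. The sketch below assumes the same df as above and picks the columns by hand with a list comprehension rather than with groupby(..., axis=1), partly because newer pandas releases deprecate axis=1 in groupby. The names consonant_cols, default_summary and full_summary are illustrative only. By default, describe() on a frame with mixed dtypes summarises only the numeric columns, and include='all' asks it to keep the string column as well.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})

def get_letter_type(letter):
    if letter.lower() in 'aeiou':
        return 'vowel'
    else:
        return 'consonant'

# Hand-pick the columns that the grouping function labels 'consonant'.
consonant_cols = [c for c in df.columns if get_letter_type(c) == 'consonant']
consonant = df[consonant_cols]

# Default: mixed dtypes, so only the numeric columns are summarised.
default_summary = consonant.describe()
print(list(default_summary.columns))

# include='all' keeps the string column B next to C and D.
full_summary = consonant.describe(include='all')
print(list(full_summary.columns))
```

If B is needed in the summary, passing include='all' to describe() on the relevant sub-frame is one option; whether the grouped describe() accepts the same argument may depend on the pandas version.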