Consider df:
import pandas as pd
import numpy as np
np.random.seed([3,1415])
df = pd.DataFrame(np.random.randn(100, 10), columns=list('ABCDEFGHIJ'))
describe computes useful summary statistics:
df.describe()
Introducing NaN
Now consider d1, which is df with roughly 20% of its values masked to NaN:
d1 = df.mask(np.random.choice([True, False], df.shape, p=[.2, .8]))
d1.describe()
The ['25%', '50%', '75%'] rows are not calculated for me.
How can I conveniently get these values using pre-existing functionality?
Answer 0 (score: 4)
A cleaner approach is to use the include parameter, for example:
d1.describe(include=['float64'])
Out[214]:
             A        B        C        D        E        F        G        H        I        J
count  70.0000  77.0000  81.0000  82.0000  78.0000  81.0000  80.0000  82.0000  75.0000  81.0000
mean    0.0572  -0.1383  -0.1550  -0.0658   0.0074  -0.0508  -0.0253  -0.0202  -0.1054   0.1019
std     0.9580   0.9447   1.0263   0.9393   0.8976   0.9207   0.9993   0.9474   1.0305   0.7382
min    -2.3045  -2.3190  -2.2027  -2.8470  -2.7149  -2.4345  -2.3619  -2.0283  -2.1609  -1.6739
25%    -0.5287  -0.6854  -0.9155  -0.8202  -0.5456  -0.6045  -0.6823  -0.6192  -0.9222  -0.3186
50%     0.0581  -0.2999  -0.1799  -0.0525   0.0181  -0.1502  -0.1421  -0.0458  -0.0108   0.1053
75%     0.5510   0.4997   0.5064   0.7505   0.5904   0.5217   0.6515   0.5790   0.6261   0.7041
max     2.6967   2.3198   2.5974   1.8385   2.2225   2.6081   2.4215   2.0045   2.1077   1.9469
You can also use the exclude parameter, but NaN values make it tricky. Passing 'bool' works:
d1.describe(exclude=['bool'])
Out[221]:
             A        B        C        D        E        F        G        H        I        J
count  70.0000  77.0000  81.0000  82.0000  78.0000  81.0000  80.0000  82.0000  75.0000  81.0000
mean    0.0572  -0.1383  -0.1550  -0.0658   0.0074  -0.0508  -0.0253  -0.0202  -0.1054   0.1019
std     0.9580   0.9447   1.0263   0.9393   0.8976   0.9207   0.9993   0.9474   1.0305   0.7382
min    -2.3045  -2.3190  -2.2027  -2.8470  -2.7149  -2.4345  -2.3619  -2.0283  -2.1609  -1.6739
25%    -0.5287  -0.6854  -0.9155  -0.8202  -0.5456  -0.6045  -0.6823  -0.6192  -0.9222  -0.3186
50%     0.0581  -0.2999  -0.1799  -0.0525   0.0181  -0.1502  -0.1421  -0.0458  -0.0108   0.1053
75%     0.5510   0.4997   0.5064   0.7505   0.5904   0.5217   0.6515   0.5790   0.6261   0.7041
max     2.6967   2.3198   2.5974   1.8385   2.2225   2.6081   2.4215   2.0045   2.1077   1.9469
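As a cross-check on either call, the quartiles can also be computed directly with DataFrame.quantile, which skips NaN values in each column. This is only a minimal sketch: it rebuilds d1 the same way the question does, and the quartiles name and the relabelled index are my own additions for illustration, not something describe produces for you.

```python
import numpy as np
import pandas as pd

# Rebuild the frames from the question
np.random.seed([3, 1415])
df = pd.DataFrame(np.random.randn(100, 10), columns=list('ABCDEFGHIJ'))
d1 = df.mask(np.random.choice([True, False], df.shape, p=[.2, .8]))

# quantile() ignores NaN entries when computing each column's quantiles
quartiles = d1.quantile([.25, .5, .75])

# Relabel the index to match the row names used by describe()
quartiles.index = ['25%', '50%', '75%']
print(quartiles)
```

If the numbers here agree with the 25%, 50% and 75% rows of d1.describe(include=['float64']), the NaN values are being handled as expected.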
Answer 1 (score: 1)