Difference in the pandas nsmallest method when applied to a column versus a DataFrame

Date: 2017-10-04 13:46:13

Tags: python pandas dataframe

I have the following df:

         I_q_0_sub  I_q_1_sub   I_q_2_sub   I_q_3_sub   I_q_4_sub   I_q_5_sub
q
0.016513    1.0     1.086586    0.396789    0.030419    0.167913    0.626752
0.017082    1.0     1.088389    0.397858    0.029408    0.166246    0.629824
0.017651    1.0     1.088213    0.398661    0.028985    0.167011    0.628466
0.018221    1.0     1.085454    0.396699    0.027980    0.165895    0.627416
0.018790    1.0     1.078595    0.395192    0.026815    0.165361    0.625276
0.019360    1.0     1.076327    0.393727    0.026964    0.166564    0.624980
0.019929    1.0     1.076141    0.392881    0.026499    0.166089    0.624884
0.020498    1.0     1.074617    0.391246    0.026023    0.164293    0.625018
0.021068    1.0     1.074573    0.389804    0.025534    0.165650    0.623080
0.021637    1.0     1.074772    0.390498    0.025619    0.166404    0.622398
0.022207    1.0     1.072407    0.389034    0.025418    0.165267    0.620503
0.022776    1.0     1.068778    0.389453    0.025364    0.165631    0.621173
0.023345    1.0     1.069125    0.388866    0.025222    0.165374    0.622733
0.023915    1.0     1.067703    0.389035    0.024862    0.164636    0.621182
0.024484    1.0     1.063856    0.387513    0.025124    0.164992    0.619048
0.025054    1.0     1.063000    0.388187    0.025446    0.164981    0.617603
0.025623    1.0     1.063995    0.387414    0.025752    0.165825    0.617720
0.026192    1.0     1.061866    0.387479    0.025579    0.165128    0.618729
0.026762    1.0     1.060178    0.384343    0.025603    0.165227    0.616478
0.027331    1.0     1.057169    0.384075    0.025644    0.164989    0.617416
0.027900    1.0     1.054479    0.384566    0.026249    0.164863    0.615285
0.028470    1.0     1.054914    0.383443    0.026397    0.166146    0.616100
0.029039    1.0     1.054963    0.383084    0.026302    0.165473    0.617631
0.029609    1.0     1.052284    0.382753    0.026824    0.164973    0.614430
0.030178    1.0     1.053644    0.383991    0.027040    0.166437    0.615252
0.030747    1.0     1.051703    0.384502    0.027135    0.166372    0.614781
0.031317    1.0     1.048446    0.383240    0.027762    0.165991    0.614492
0.031886    1.0     1.050411    0.382216    0.027915    0.167335    0.613784
0.032455    1.0     1.052862    0.383122    0.028400    0.167722    0.615104
0.033025    1.0     1.048664    0.384156    0.029077    0.167987    0.614716
0.033594    1.0     1.045783    0.384269    0.029518    0.166930    0.614234
0.034163    1.0     1.049077    0.384258    0.030929    0.168138    0.614413
0.034733    1.0     1.047248    0.384060    0.031300    0.168228    0.613657
0.035302    1.0     1.044294    0.385330    0.031312    0.168637    0.612413
0.035872    1.0     1.045500    0.384630    0.031975    0.169539    0.613903
0.036441    1.0     1.047008    0.385461    0.032721    0.169195    0.614401
0.037010    1.0     1.046601    0.386015    0.033526    0.171218    0.615378
0.037580    1.0     1.039578    0.385855    0.034593    0.170812    0.611779
0.038149    1.0     1.042296    0.386241    0.035050    0.170443    0.611111
0.038718    1.0     1.041640    0.385285    0.035902    0.171739    0.611083
0.039288    1.0     1.046594    0.388858    0.037150    0.174225    0.613183
0.039857    1.0     1.045652    0.390708    0.038682    0.173627    0.613125
0.040426    1.0     1.046337    0.392301    0.039181    0.174176    0.612989
0.040996    1.0     1.041239    0.392167    0.039861    0.175146    0.612321
0.041565    1.0     1.040595    0.393418    0.040991    0.174704    0.613320

When I apply the nsmallest method to individual columns and to the whole df, I get different values. As you can see, for the individual columns I get this:

df["I_q_0_sub"].iloc[15:60].nsmallest(n = 5).mean()
1.0

df["I_q_1_sub"].iloc[15:60].nsmallest(n = 5).mean()
1.041069402080646

df["I_q_2_sub"].iloc[15:60].nsmallest(n = 5).mean()
0.3828830431385227

df["I_q_3_sub"].iloc[15:60].nsmallest(n = 5).mean()
0.025197931800085817

df["I_q_4_sub"].iloc[15:60].nsmallest(n = 5).mean()
0.16474921466342365

df["I_q_5_sub"].iloc[15:60].nsmallest(n = 5).mean()
0.61174148613757

And for the whole df:

df.iloc[15:60].nsmallest(n = 5, columns=formFactor_sub.columns).mean()
I_q_0_sub    1.000000
I_q_1_sub    1.041069
I_q_2_sub    0.388593
I_q_3_sub    0.037279
I_q_4_sub    0.172569
I_q_5_sub    0.611923

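So the values for the individual columns differ from the values for the whole DataFrame, which I did not expect. Any hint as to why this happens would be appreciated.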

1 Answer:

Answer 0: (score: 0)

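DataFrame.nsmallest works through its columns argument: it finds the rows that hold the smallest values of the specified column. If you pass every column with columns = formFactor_sub.columns, it returns the rows with the n smallest values of the first column. The remaining columns are used only to break ties, so their values are simply whatever those rows happen to contain. In your data I_q_0_sub is 1.0 everywhere, so the tie is decided by I_q_1_sub, which is why the means agree for those two columns and differ for the rest.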
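If the goal is the mean of the n smallest values of each column taken independently, one option is to apply Series.nsmallest column by column. The following is a minimal sketch on a small made-up frame; the name `demo` and its values are illustrative assumptions, not the data from the question.

```python
import pandas as pd

# Hypothetical toy data, chosen so the two approaches visibly disagree.
demo = pd.DataFrame({
    "x": [3.0, 1.0, 2.0, 5.0, 4.0],
    "y": [10.0, 50.0, 40.0, 20.0, 30.0],
})

# DataFrame.nsmallest selects whole rows ordered by the listed columns
# ("x" first, "y" only to break ties); the mean is then taken over
# those same rows for every column.
row_based = demo.nsmallest(2, columns=list(demo.columns)).mean()

# Series.nsmallest applied to each column separately picks the smallest
# values of that column, regardless of which rows they come from.
per_column = demo.apply(lambda s: s.nsmallest(2).mean())

print(row_based)
print(per_column)
```

On this toy frame the two results agree for `x` but not for `y`, which mirrors the pattern seen in the question.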

Consider this DataFrame:

    a   b   c
0   1   7   1
1   10  9   2
2   8   3   3
3   11  6   4
4   -1  1   5
5   5   8   6
6   15  2   7

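df.nsmallest(3, columns = df.columns)

is the same as

df.nsmallest(3, columns = 'a')

and both return

    a   b   c
4   -1  1   5
0   1   7   1
5   5   8   6

But that is different from what you get with

df['b'].nsmallest(3)

which looks only at column b and returns its three smallest values (1, 2 and 3, from rows 4, 6 and 2).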
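The comparison can be checked with a short script. The constructor call below is a reconstruction from the printed table, not the answer's original code, and the variable names are assumptions made for illustration.

```python
import pandas as pd

# Rebuilt from the table printed in the answer.
df = pd.DataFrame({
    "a": [1, 10, 8, 11, -1, 5, 15],
    "b": [7, 9, 3, 6, 1, 8, 2],
    "c": [1, 2, 3, 4, 5, 6, 7],
})

# Passing all columns orders rows by "a" first; "b" and "c" would only
# matter if "a" contained ties, which it does not here.
all_cols = df.nsmallest(3, columns=list(df.columns))
only_a = df.nsmallest(3, columns="a")

# Sorting by the same columns and taking the head selects the same rows.
by_sort = df.sort_values(list(df.columns)).head(3)

# Series.nsmallest looks at column "b" alone.
col_b = df["b"].nsmallest(3)

print(all_cols)
print(col_b)
```

Here `all_cols` and `only_a` select the same rows, while `col_b` draws from a different set of rows.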