Looping over a pandas DataFrame to apply a formula to each value

Time: 2016-09-26 14:03:34

Tags: python loops pandas dataframe

I have the following pandas DataFrame:

            PC1     PC2     PC3     PC4     PC5     PC6     PC7
ind                                                                       
NA06984 -0.0082 -0.0594 -0.0148 -0.0569 -0.1128 -0.0276 -0.0217 
NA06986 -0.0131 -0.0659 -0.0426  0.0654  0.0473  0.0603 -0.0454  
NA06989 -0.0073 -0.0551 -0.0457  0.0971 -0.0051 -0.0123  0.0035  
NA06994 -0.0051 -0.0599 -0.0239  0.0930  0.0765  0.0321  0.0392  
NA07000 -0.0046 -0.0362  0.0006 -0.0639 -0.0197 -0.0132  0.0631  
NA07037 -0.0132 -0.0600 -0.0252 -0.0381 -0.0091  0.0005  0.0235  
NA07048 -0.0128 -0.0653 -0.0234 -0.0417  0.0233  0.1034  0.0180  
NA07051 -0.0028 -0.0591 -0.0117 -0.0791 -0.0387  0.0102 -0.0840  
NA07056 -0.0121 -0.0389  0.0113 -0.0754  0.0226 -0.0304 -0.0490  
NA07347 -0.0192 -0.0441 -0.0588  0.1099 -0.0414  0.0505  0.0295
NA07357 -0.0100 -0.0360 -0.0268 -0.0621 -0.0737 -0.0090  0.0379

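I want to standardize the distribution of each column, i.e. apply the formula

(column_i[row_j] - column_i.mean()) / column_i.std()

to every value in each column, and replace the original values in the data frame with the results.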

So far I have come up with

for index, row in evec_pandas.iterrows():
    new_row = None
    evec_pandas.loc[index,'PC1'] = (row['PC1'] - evec_pandas['PC1'].mean()) / evec_pandas['PC1'].std()

print evec_pandas

But the result is

              PC1     PC2     PC3     PC4     PC5     PC6     PC7
ind                                                                         
NA06984  0.343471 -0.0594 -0.0148 -0.0569 -0.1128 -0.0276 -0.0217  
NA06986 -0.330077 -0.0659 -0.0426  0.0654  0.0473  0.0603 -0.0454 
NA06989 -0.003975 -0.0551 -0.0457  0.0971 -0.0051 -0.0123  0.0035  
NA06994  0.008607 -0.0599 -0.0239  0.0930  0.0765  0.0321  0.0392  
NA07000  0.003659 -0.0362  0.0006 -0.0639 -0.0197 -0.0132  0.0631  
NA07037 -0.058300 -0.0600 -0.0252 -0.0381 -0.0091  0.0005  0.0235 
NA07048 -0.028319 -0.0653 -0.0234 -0.0417  0.0233  0.1034  0.0180  
NA07051  0.046818 -0.0591 -0.0117 -0.0791 -0.0387  0.0102 -0.0840  
NA07056 -0.043817 -0.0389  0.0113 -0.0754  0.0226 -0.0304 -0.0490   
NA07347 -0.071195 -0.0441 -0.0588  0.1099 -0.0414  0.0505  0.0295 
NA07357  0.019495 -0.0360 -0.0268 -0.0621 -0.0737 -0.0090  0.0379  

The first value is correct (0.343471), but the remaining values in the PC1 column are not, and of course the other columns are unchanged. If I use:

for index, row in evec_pandas.iterrows():
    new_row = None
    new_row = (row['PC1'] - evec_pandas['PC1'].mean()) / evec_pandas['PC1'].std()
    print new_row

I do get the PC1 column, but as a standalone object rather than inside the data frame:

0.343471311655
-0.673732188246
0.530304607555
0.987008219756
1.09080449526
-0.694491443346
-0.611454422946
1.46447108706
-0.466139637246
-1.94004674935
-0.0301952801455

So I need to replace PC1 with these values and then do the same for every column; I was thinking of something like

for index, column in evec_pandas.iteritems():
    for index, row in evec_pandas.iterrows():
        new_row = None
        evec_pandas.loc[index,column] = (row[column] - evec_pandas[column].mean()) / evec_pandas[column].std()

But I know it won't work like this. Any ideas?

The desired output is:

                PC1        PC2          PC3        PC4        PC5        PC6         PC7
NA06984  0.34347131 -0.5760881  0.439607045 -0.6710009 -1.8594019 -1.0130591 -0.50633142
NA06986 -0.67373219 -1.1365003 -0.929352573  0.9013689  1.0906816  1.0794999 -1.02745500
NA06989  0.53030461 -0.2053539 -1.082006343  1.3089251  0.1251327 -0.6488253  0.04777466
NA06994  0.98700822 -0.6191967 -0.008505635  1.2562128  1.6287356  0.4081670  0.83275827
NA07000  1.09080450  1.4241525  1.197951582 -0.7609975 -0.1438943 -0.6702508  1.35827952
NA07037 -0.69449144 -0.6278185 -0.072521733 -0.4292956  0.0514267 -0.3441068  0.48754139
NA07048 -0.61145442 -1.0847700  0.016115941 -0.4755796  0.6484455  2.1055441  0.36660554
NA07051  1.46447109 -0.5502229  0.592260816 -0.9564188 -0.4939979 -0.1131873 -1.87620479
NA07056 -0.46613964  1.1913658  1.724853306 -0.9088491  0.6355469 -1.0797163 -1.10661301
NA07347 -1.94004675  0.7430361 -1.727091631  1.4734904 -0.5437494  0.8461998  0.61947141
NA07357 -0.03019528  1.4413959 -0.151310775 -0.7378555 -1.1389255 -0.5702651  0.80417343

1 answer:

Answer 0 (score: 3)

You can do the following:

In [19]:
(df - df.mean())/df.std()

Out[19]:
              PC1       PC2       PC3       PC4       PC5       PC6       PC7
ind                                                                          
NA06984  0.343471 -0.576088  0.439607 -0.671001 -1.859402 -1.013059 -0.506331
NA06986 -0.673732 -1.136500 -0.929353  0.901369  1.090682  1.079500 -1.027455
NA06989  0.530305 -0.205354 -1.082006  1.308925  0.125133 -0.648825  0.047775
NA06994  0.987008 -0.619197 -0.008506  1.256213  1.628736  0.408167  0.832758
NA07000  1.090804  1.424152  1.197952 -0.760998 -0.143894 -0.670251  1.358280
NA07037 -0.694491 -0.627818 -0.072522 -0.429296  0.051427 -0.344107  0.487541
NA07048 -0.611454 -1.084770  0.016116 -0.475580  0.648445  2.105544  0.366606
NA07051  1.464471 -0.550223  0.592261 -0.956419 -0.493998 -0.113187 -1.876205
NA07056 -0.466140  1.191366  1.724853 -0.908849  0.635547 -1.079716 -1.106613
NA07347 -1.940047  0.743036 -1.727092  1.473490 -0.543749  0.846200  0.619471
NA07357 -0.030195  1.441396 -0.151311 -0.737856 -1.138926 -0.570265  0.804173

This operates on the whole df, so there is no need to iterate over rows or columns.
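To write the standardized values back over the original data frame, as the question asks, the expression can be assigned to the same name. The following is a minimal sketch, not a definitive implementation: it reuses the name evec_pandas from the question, but the three-row, two-column frame is only an illustrative subset of the posted data, and col_mean and col_std are names introduced here.

```python
import pandas as pd

# Illustrative subset of the question's data (first three rows, two columns).
evec_pandas = pd.DataFrame(
    {"PC1": [-0.0082, -0.0131, -0.0073], "PC2": [-0.0594, -0.0659, -0.0551]},
    index=pd.Index(["NA06984", "NA06986", "NA06989"], name="ind"),
)

# Compute the per-column statistics once, from the untouched data.
col_mean = evec_pandas.mean()
col_std = evec_pandas.std()  # sample standard deviation (ddof=1) by default

# The arithmetic aligns on column labels, so each column is standardized
# with its own mean and standard deviation.
evec_pandas = (evec_pandas - col_mean) / col_std

print(evec_pandas)
```

This likely also explains the loop's wrong results: evec_pandas['PC1'].mean() and .std() were re-evaluated on every iteration, after earlier rows had already been overwritten. Only the first row was computed from the original statistics. Computing the mean and standard deviation once, before changing anything, avoids that drift.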