Efficiently iterating over a pandas DataFrame index

Time: 2015-07-20 15:28:42

Tags: python-2.7 pandas

import pandas as pd
from numpy.random import randn

oldn = pd.DataFrame(randn(10, 4), columns=['A', 'B', 'C', 'D'])

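I want to create a new DataFrame with rows 0..9 and a single column "avg", where row N = mean(oldn[N]['A'], oldn[N]['B'], ..., oldn[N]['D']).

I'm not very familiar with pandas, so all my ideas for doing this are brute-force loops and the like. What is an efficient way to create and populate the new table?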

2 answers:

Answer 0 (score: 1)

Call mean on your df with axis=1 to compute the mean across each row, then pass the result as data to the DataFrame constructor:

In [128]:

new_df = pd.DataFrame(data = oldn.mean(axis=1), columns=['avg'])
new_df
Out[128]:
        avg
0  0.541550
1  0.525518
2 -0.492634
3  0.163784
4  0.012363
5  0.514676
6 -0.468888
7  0.334473
8  0.669139
9  0.736748
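
A minimal sketch (assuming Python 3 and a reasonably recent pandas; the fixed seed is an assumption added only so the check is reproducible) that compares the row-wise mean with the explicit formula (A + B + C + D) / 4, and shows `Series.to_frame` as an equivalent way to turn the resulting Series into a one-column DataFrame:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: seeded so the check is reproducible
oldn = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])

# axis=1 averages across the columns of each row; the result is a
# Series that keeps oldn's index (0..9), so no explicit loop is needed.
row_means = oldn.mean(axis=1)

# The constructor call from the answer, and an equivalent alternative.
new_df = pd.DataFrame(data=row_means, columns=['avg'])
new_df_alt = row_means.to_frame('avg')

# Compare with the explicit per-row formula (A + B + C + D) / 4.
manual = (oldn['A'] + oldn['B'] + oldn['C'] + oldn['D']) / 4
print(np.allclose(new_df['avg'], manual))      # True
print(np.allclose(new_df_alt['avg'], manual))  # True
```

Because `mean(axis=1)` operates on whole columns at once, it avoids the Python-level row loop the question was worried about.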

Answer 1 (score: 0)

If you want the mean of specific columns only, use the following. Otherwise, you can use the answer provided by @EdChum.

# Average of columns A-D for each row (axis=1 passes each row to the lambda)
oldn['Avg'] = oldn.apply(lambda v: v[['A', 'B', 'C', 'D']].sum() / 4., axis=1)
print(oldn)
         A         B         C         D       Avg
0 -0.201468 -0.832845  0.100299  0.044853 -0.222290
1  1.510688 -0.955329  0.239836  0.767431  0.390657
2  0.780910  0.335267  0.423232 -0.678401  0.215252
3  0.780518  2.876386 -0.797032 -0.523407  0.584116
4  0.438313 -1.952162  0.909568 -0.465147 -0.267357
5  0.145152 -0.836300  0.352706 -0.794815 -0.283314
6 -0.375432 -1.354249  0.920052 -1.002142 -0.452943
7  0.663149 -0.064227  0.321164  0.779981  0.425017
8 -1.279022 -2.206743  0.534943  0.794929 -0.538973
9 -0.339976  0.636516 -0.530445 -0.832413 -0.266579
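
A sketch, assuming Python 3 and a recent pandas, of the same specific-column average without `apply`: select the columns first, then call `mean(axis=1)`, which is vectorized instead of calling a Python lambda once per row. The column subset `cols` and the injected NaN are hypothetical, added only for illustration. The NaN shows one behavioral difference: `Series.sum()` skips NaN by default, so `sum() / 4.` effectively counts a missing value as 0, while `mean()` divides by the number of values actually present.

```python
import numpy as np
import pandas as pd

np.random.seed(1)  # reproducible stand-in for the question's data
oldn = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])
cols = ['A', 'B', 'C']  # hypothetical subset; use whichever columns you need

# Vectorized: select the columns, then average across each row.
vectorized = oldn[cols].mean(axis=1)

# Row-by-row equivalent in the style of the apply-based answer.
via_apply = oldn.apply(lambda v: v[cols].sum() / float(len(cols)), axis=1)
print(np.allclose(vectorized, via_apply))  # True

# Hypothetical missing value to show where the two approaches diverge.
oldn.loc[0, 'A'] = np.nan
mean_row0 = oldn[cols].mean(axis=1).loc[0]  # (B + C) / 2, NaN skipped
sum_row0 = oldn.apply(lambda v: v[cols].sum() / float(len(cols)), axis=1).loc[0]  # (B + C) / 3
print(mean_row0, sum_row0)
```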
