A better way to do `df[m] = df[x] + df[y] + df[z]`

Time: 2016-09-30 15:22:21

Tags: python pandas apply nan

I want to get the sum of three columns. This is the approach I took:

In [14]:

a_pd = pd.DataFrame({'a': np.arange(3),
                     'b': [5, 7, np.nan],
                     'c': [2, 9, 0]})
a_pd
Out[14]:
   a    b  c
0  0  5.0  2
1  1  7.0  9
2  2  NaN  0

In [18]:

b_pd = a_pd['a'] + a_pd['b'] + a_pd['c']
b_pd
Out[18]:
0     7.0
1    17.0
2     NaN
dtype: float64

But as you can see, the NaN is not skipped: any row that contains a NaN sums to NaN. So I tried np.add(), but something went wrong:

In [19]:

b_pd = a_pd[['a', 'b', 'c']].apply(np.add, axis=1)
b_pd
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-f52f400573b4> in <module>()
----> 1 b_pd = a_pd[['a', 'b', 'c']].apply(np.add, axis=1)
      2 b_pd

F:\anaconda\lib\site-packages\pandas\core\frame.pyc in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   4045 
   4046         if isinstance(f, np.ufunc):
-> 4047             results = f(self.values)
   4048             return self._constructor(data=results, index=self.index,
   4049                                      columns=self.columns, copy=False)

ValueError: invalid number of arguments

So I would like to know how to fix this error.

2 answers:

Answer 0: (score: 5)

You can use the DataFrame's sum method, which skips NaN by default:

a_pd.sum(axis=1)
Out: 
0     7.0
1    17.0
2     2.0
dtype: float64

If you want to specify the columns:

a_pd[['a', 'b', 'c']].sum(axis=1)
Out: 
0     7.0
1    17.0
2     2.0
dtype: float64
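
As a small sketch of how this maps back to the `df[m] = df[x] + df[y] + df[z]` form in the question: `sum` skips NaN because `skipna=True` is its default, so the result can be assigned directly to a new column. The column name `m`, the `fillna(0)` variant, and the `min_count` variant are illustrative additions, not part of the answer above.

```python
import numpy as np
import pandas as pd

a_pd = pd.DataFrame({'a': np.arange(3),
                     'b': [5, 7, np.nan],
                     'c': [2, 9, 0]})

cols = ['a', 'b', 'c']

# Row-wise sum; NaN is skipped because skipna=True is the default.
a_pd['m'] = a_pd[cols].sum(axis=1)

# Same result with the + operator: replace NaN with 0 before adding.
filled = a_pd[cols].fillna(0)
plus_sum = filled['a'] + filled['b'] + filled['c']

# min_count=3 requires three non-NaN values per row,
# so the row containing NaN yields NaN again.
strict_sum = a_pd[cols].sum(axis=1, min_count=3)

print(a_pd)
```

Whether a missing value should count as zero or make the whole row missing depends on the data, which is why both the default and the `min_count` behavior are shown.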

Answer 1: (score: 2)

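np.add requires two inputs. As the traceback in the question shows, apply calls the ufunc with a single argument (`f(self.values)`), which is why it raises `ValueError: invalid number of arguments`.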
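
A hedged sketch of what this means in practice: `np.add` is a binary ufunc, so it works when given two arrays, while a row-wise reduction needs `np.add.reduce`. Using `np.nansum` to skip NaN is a suggestion added here, not something stated in the answer.

```python
import numpy as np
import pandas as pd

a_pd = pd.DataFrame({'a': np.arange(3),
                     'b': [5, 7, np.nan],
                     'c': [2, 9, 0]})

# np.add takes exactly two inputs, so adding two columns works.
two_cols = np.add(a_pd['a'], a_pd['c'])

values = a_pd[['a', 'b', 'c']].to_numpy()

# np.add.reduce folds np.add along an axis, but NaN still propagates.
reduced = np.add.reduce(values, axis=1)

# np.nansum treats NaN as zero, matching DataFrame.sum's default behavior.
nan_safe = pd.Series(np.nansum(values, axis=1), index=a_pd.index)

print(nan_safe)
```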
