Pandas element-wise min/max against a Series along one axis

Time: 2017-05-16 22:10:08

Tags: pandas dataframe max min elementwise-operations

I have a DataFrame:

df = 
             A    B    C    D
DATA_DATE
20170103   5.0  3.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   1.0  NaN  2.0  3.0

I have a Series:

s = 
DATA_DATE
20170103    4.0
20170104    0.0
20170105    2.2

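I want to run an element-wise max() of df against s, with s aligned on the index and applied to every column of df. In other words, I want to get: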

result = 
             A    B    C    D
DATA_DATE
20170103   5.0  4.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   2.2  NaN  2.2  3.0

What is the best way to do this? I have looked at single-column comparison and Series-to-Series comparison, but I have not found an efficient way to run a DataFrame against a Series.

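Bonus: I am not sure whether the answer is self-evident from the above, but how would I do this if I wanted to align s along the other axis of df instead (assuming the dimensions match)?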
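For anyone who wants to reproduce this, the two objects above can be rebuilt roughly as follows. This is only a sketch: the integer index labels are an assumption, since the question does not say whether DATA_DATE holds integers or strings.

```python
import numpy as np
import pandas as pd

# Index labels are assumed to be integers; the question does not state the dtype.
idx = pd.Index([20170103, 20170104, 20170105], name="DATA_DATE")

df = pd.DataFrame(
    {
        "A": [5.0, np.nan, 1.0],
        "B": [3.0, np.nan, np.nan],
        "C": [np.nan, np.nan, 2.0],
        "D": [np.nan, 1.0, 3.0],
    },
    index=idx,
)
s = pd.Series([4.0, 0.0, 2.2], index=idx)

print(df)
print(s)
```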

3 answers:

Answer 0: (score: 8)

Data:

In [135]: df
Out[135]:
             A    B    C    D
DATA_DATE
20170103   5.0  3.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   1.0  NaN  2.0  3.0

In [136]: s
Out[136]:
20170103    4.0
20170104    0.0
20170105    2.2
Name: DATA_DATE, dtype: float64

Solution:

In [66]: df.clip_lower(s, axis=0)
C:\Users\Max\Anaconda4\lib\site-packages\pandas\core\ops.py:1247: RuntimeWarning: invalid value encountered in greater_equal
  result = op(x, y)
Out[66]:
             A    B    C    D
DATA_DATE
20170103   5.0  4.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   2.2  NaN  2.2  3.0
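
Note that clip_lower was deprecated in pandas 0.24 and removed in 1.0. On newer versions the same operation can be written with DataFrame.clip, as in the sketch below. The data is rebuilt from the question so the block runs on its own; the integer index is an assumption.

```python
import numpy as np
import pandas as pd

idx = pd.Index([20170103, 20170104, 20170105], name="DATA_DATE")
df = pd.DataFrame(
    {
        "A": [5.0, np.nan, 1.0],
        "B": [3.0, np.nan, np.nan],
        "C": [np.nan, np.nan, 2.0],
        "D": [np.nan, 1.0, 3.0],
    },
    index=idx,
)
s = pd.Series([4.0, 0.0, 2.2], index=idx)

# axis=0 aligns s with the row index, so each row gets its own lower bound.
# Cells that are NaN in df stay NaN.
result = df.clip(lower=s, axis=0)
print(result)
```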

We can use the following hack to get rid of the RuntimeWarning:

In [134]: df.fillna(np.inf).clip_lower(s, axis=0).replace(np.inf, np.nan)
Out[134]:
             A    B    C    D
DATA_DATE
20170103   5.0  4.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   2.2  NaN  2.2  3.0
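
As for the bonus part of the question: assuming the Series is indexed by column labels rather than by date, the same idea with axis=1 should apply one lower bound per column. The sketch below uses DataFrame.clip; the Series t and its values are hypothetical and are not taken from the question.

```python
import numpy as np
import pandas as pd

idx = pd.Index([20170103, 20170104, 20170105], name="DATA_DATE")
df = pd.DataFrame(
    {
        "A": [5.0, np.nan, 1.0],
        "B": [3.0, np.nan, np.nan],
        "C": [np.nan, np.nan, 2.0],
        "D": [np.nan, 1.0, 3.0],
    },
    index=idx,
)

# Hypothetical Series with one bound per column (values invented for illustration).
t = pd.Series([2.0, 3.5, 0.0, 2.0], index=["A", "B", "C", "D"])

# axis=1 aligns t with the column labels instead of the row index.
result = df.clip(lower=t, axis=1)
print(result)
```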

Answer 1: (score: 7)

This is called broadcasting, and it can be done as follows:

import numpy as np
np.maximum(df, s[:, None])
Out: 
             A    B    C    D
DATA_DATE                    
20170103   5.0  4.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   2.2  NaN  2.2  3.0

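Here, s[:, None] adds a new axis to s; s[:, np.newaxis] does the same thing. After that, the two can be broadcast together because the shapes (3, 4) and (3, 1) are compatible: the first dimension matches, and the size-1 second dimension is stretched across the columns. Note that newer pandas versions no longer allow multi-dimensional indexing on a Series, so there s.values[:, None] (or s.to_numpy()[:, None]) is needed instead.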
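
To make the shape logic concrete, here is a small sketch that works on plain NumPy arrays and wraps the result back into a DataFrame. Positional broadcasting ignores index labels, so it assumes s is in the same row order as df; the reindex call is only a guard for that. The data is rebuilt from the question, with an assumed integer index.

```python
import numpy as np
import pandas as pd

idx = pd.Index([20170103, 20170104, 20170105], name="DATA_DATE")
df = pd.DataFrame(
    {
        "A": [5.0, np.nan, 1.0],
        "B": [3.0, np.nan, np.nan],
        "C": [np.nan, np.nan, 2.0],
        "D": [np.nan, 1.0, 3.0],
    },
    index=idx,
)
s = pd.Series([4.0, 0.0, 2.2], index=idx)

values = df.to_numpy()                            # 2-D array, one row per date
bounds = s.reindex(df.index).to_numpy()[:, None]  # column vector, one bound per row

# The size-1 axis of bounds is stretched across all columns of values.
# np.maximum propagates NaN, so NaN cells in df stay NaN.
result = pd.DataFrame(
    np.maximum(values, bounds), index=df.index, columns=df.columns
)
print(values.shape, bounds.shape)
print(result)
```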

Note the difference between s and s[:, None]:

s.values
Out: array([ 4. ,  0. ,  2.2])

s[:, None]
Out: 
array([[ 4. ],
       [ 0. ],
       [ 2.2]])

s.shape
Out: (3,)

s[:, None].shape
Out: (3, 1)

Another option is:

df.mask(df.le(s, axis=0), s, axis=0)

Out: 
             A    B    C    D
DATA_DATE                    
20170103   5.0  4.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   2.2  NaN  2.2  3.0

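In words: compare df with s row by row. Where df is less than or equal to s, take the value from s; otherwise keep the value from df.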
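
The sketch below shows why NaN cells survive this approach and what the element-wise minimum would look like. Comparisons involving NaN evaluate to False, so mask leaves those cells untouched; for the minimum, ge takes the place of le. The data is rebuilt from the question with an assumed integer index.

```python
import numpy as np
import pandas as pd

idx = pd.Index([20170103, 20170104, 20170105], name="DATA_DATE")
df = pd.DataFrame(
    {
        "A": [5.0, np.nan, 1.0],
        "B": [3.0, np.nan, np.nan],
        "C": [np.nan, np.nan, 2.0],
        "D": [np.nan, 1.0, 3.0],
    },
    index=idx,
)
s = pd.Series([4.0, 0.0, 2.2], index=idx)

# True where df <= s; always False where df is NaN.
cond = df.le(s, axis=0)

# Element-wise max: replace the cells where df <= s with the value from s.
elementwise_max = df.mask(cond, s, axis=0)

# Element-wise min: replace the cells where df >= s with the value from s.
elementwise_min = df.mask(df.ge(s, axis=0), s, axis=0)

print(elementwise_max)
print(elementwise_min)
```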

Answer 2: (score: 0)

There may be better solutions to your problem, but I believe this should do what you need:

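import pandas as pd
# Row-wise max of each column and s, aligned on the index (max skips NaN, so NaN cells in df take the value from s)
for c in df.columns:
    df[c] = pd.concat([df[c], s], axis=1).max(axis=1)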
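
One caveat worth checking: max(axis=1) skips NaN by default, so this loop fills the NaN cells of df with the value from s, which differs from the result shown in the question. The sketch below keeps the original NaN positions. It works on a copy instead of modifying df in place; the names filled and out are introduced here for illustration, and the data is rebuilt from the question with an assumed integer index.

```python
import numpy as np
import pandas as pd

idx = pd.Index([20170103, 20170104, 20170105], name="DATA_DATE")
df = pd.DataFrame(
    {
        "A": [5.0, np.nan, 1.0],
        "B": [3.0, np.nan, np.nan],
        "C": [np.nan, np.nan, 2.0],
        "D": [np.nan, 1.0, 3.0],
    },
    index=idx,
)
s = pd.Series([4.0, 0.0, 2.2], index=idx)

# Same loop as above, applied to a copy: NaN cells end up holding the value from s.
filled = df.copy()
for c in filled.columns:
    filled[c] = pd.concat([df[c], s], axis=1).max(axis=1)

# Put NaN back wherever the original df had NaN.
out = filled.where(df.notna())

print(filled)
print(out)
```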