What subtraction operation is being performed here?

Time: 2018-06-30 20:55:45

Tags: python pandas dataframe

I have a DataFrame with many columns, and a Series. Both have the same DatetimeIndex.

DataFrame

>>> print(df)

                 AAPL      GOOG       MSFT         AMZN        FB
2018-01-01        NaN       NaN        NaN          NaN       NaN
2018-01-02  -0.667375 -1.567656  -1.161474    -0.674142 -1.886490
2018-01-03   2.004473 -2.802214 -24.084166    -2.447172  2.346972
2018-01-04  -4.261619 -1.471697  -0.027939    -1.753661 -1.835053
2018-01-05  -1.008718 -2.816736  -1.524315    -1.001672  0.080345
2018-01-06 -30.325012 -1.056776  -1.190017  2319.212083 -1.847443
2018-01-07   0.497589  8.588272  -2.434537    -0.793424 -1.194649
2018-01-08  -1.650655 -0.583868 -10.141386     2.704900  7.449458
2018-01-09   1.821119 -6.742207  -0.710584    -0.003800 -1.535461
2018-01-10  -0.624853  0.030330   0.405643    -0.513841 -0.775323

Series

>>> print(ser)

2018-01-01           NaN
2018-01-02     -1.191427
2018-01-03     -4.996421
2018-01-04     -1.869994
2018-01-05     -1.254219
2018-01-06    456.958567
2018-01-07      0.932650
2018-01-08     -0.444310
2018-01-09     -1.434187
2018-01-10     -0.295609

If I try to subtract the Series from the DataFrame, I get the following result:

>>> df - ser

            2018-01-01 00:00:00  2018-01-02 00:00:00  2018-01-03 00:00:00  \
2018-01-01                  NaN                  NaN                  NaN   
2018-01-02                  NaN                  NaN                  NaN   
2018-01-03                  NaN                  NaN                  NaN   
2018-01-04                  NaN                  NaN                  NaN   
2018-01-05                  NaN                  NaN                  NaN   
2018-01-06                  NaN                  NaN                  NaN   
2018-01-07                  NaN                  NaN                  NaN   
2018-01-08                  NaN                  NaN                  NaN   
2018-01-09                  NaN                  NaN                  NaN   
2018-01-10                  NaN                  NaN                  NaN   

            2018-01-04 00:00:00  2018-01-05 00:00:00  2018-01-06 00:00:00  \
2018-01-01                  NaN                  NaN                  NaN   
2018-01-02                  NaN                  NaN                  NaN   
2018-01-03                  NaN                  NaN                  NaN   
2018-01-04                  NaN                  NaN                  NaN   
2018-01-05                  NaN                  NaN                  NaN   
2018-01-06                  NaN                  NaN                  NaN   
2018-01-07                  NaN                  NaN                  NaN   
2018-01-08                  NaN                  NaN                  NaN   
2018-01-09                  NaN                  NaN                  NaN   
2018-01-10                  NaN                  NaN                  NaN   

            2018-01-07 00:00:00  2018-01-08 00:00:00  2018-01-09 00:00:00  \
2018-01-01                  NaN                  NaN                  NaN   
2018-01-02                  NaN                  NaN                  NaN   
2018-01-03                  NaN                  NaN                  NaN   
2018-01-04                  NaN                  NaN                  NaN   
2018-01-05                  NaN                  NaN                  NaN   
2018-01-06                  NaN                  NaN                  NaN   
2018-01-07                  NaN                  NaN                  NaN   
2018-01-08                  NaN                  NaN                  NaN   
2018-01-09                  NaN                  NaN                  NaN   
2018-01-10                  NaN                  NaN                  NaN   

            2018-01-10 00:00:00  AAPL  GOOG  MSFT  AMZN  FB  
2018-01-01                  NaN   NaN   NaN   NaN   NaN NaN  
2018-01-02                  NaN   NaN   NaN   NaN   NaN NaN  
2018-01-03                  NaN   NaN   NaN   NaN   NaN NaN  
2018-01-04                  NaN   NaN   NaN   NaN   NaN NaN  
2018-01-05                  NaN   NaN   NaN   NaN   NaN NaN  
2018-01-06                  NaN   NaN   NaN   NaN   NaN NaN  
2018-01-07                  NaN   NaN   NaN   NaN   NaN NaN  
2018-01-08                  NaN   NaN   NaN   NaN   NaN NaN  
2018-01-09                  NaN   NaN   NaN   NaN   NaN NaN  
2018-01-10                  NaN   NaN   NaN   NaN   NaN NaN  

I also get the following warning:

RuntimeWarning: Cannot compare type 'Timestamp' with type 'str', sort order is
undefined for incomparable objects
  return this.join(other, how=how, return_indexers=return_indexers)

I know that I can use DataFrame.sub to do elementwise subtraction:

>>> res = df.sub(ser, axis=0)
>>> print(res)

                  AAPL        GOOG        MSFT         AMZN          FB
2018-01-01         NaN         NaN         NaN          NaN         NaN
2018-01-02    0.524052   -0.376229    0.029954     0.517286   -0.695062
2018-01-03    7.000894    2.194208  -19.087745     2.549249    7.343393
2018-01-04   -2.391625    0.398297    1.842054     0.116333    0.034941
2018-01-05    0.245501   -1.562517   -0.270096     0.252547    1.334565
2018-01-06 -487.283579 -458.015343 -458.148584  1862.253516 -458.806010
2018-01-07   -0.435061    7.655622   -3.367187    -1.726074   -2.127300
2018-01-08   -1.206344   -0.139558   -9.697076     3.149210    7.893768
2018-01-09    3.255306   -5.308020    0.723603     1.430386   -0.101274
2018-01-10   -0.329244    0.325939    0.701251    -0.218232   -0.479714

However, what I don't know is:

  • What operation is performed when using DataFrame.__sub__

  • Also, to me it seems very counterintuitive that subtracting a Series from a DataFrame with a matching Index type/contents does not do elementwise subtraction. What is the reason for not doing this?

1 Answer:

Answer 0: (score: 1)

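Reading the documentation, which you linked nicely, we find:

  axis : {0, 1, 'index', 'columns'}

  For Series input, axis to match Series index on

and the default is 'columns'. This gives us a hint about what happens when you do the subtraction, i.e.: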

df.sub(s) # by not specifying axis you are passing axis=1 / 'columns'

df - s
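To make the default behaviour concrete, here is a minimal sketch on made-up data (the small frame and Series below are assumptions for illustration, not the asker's data). With the default axis, the Series labels are matched against the column labels, so dates are compared with ticker names, nothing matches, and the result is the union of both label sets filled with NaN.

```python
import warnings

import pandas as pd

# Hypothetical toy data: dates on the index, ticker names on the columns.
idx = pd.date_range("2018-01-01", periods=3)
df = pd.DataFrame({"AAPL": [1.0, 2.0, 3.0], "GOOG": [4.0, 5.0, 6.0]}, index=idx)
s = pd.Series([10.0, 20.0, 30.0], index=idx)

with warnings.catch_warnings():
    # Mixed Timestamp/str labels cannot be sorted; some pandas versions warn here.
    warnings.simplefilter("ignore")
    # Default axis='columns': the Series labels (dates) are aligned against
    # the column labels (tickers), and nothing matches.
    result = df.sub(s)  # same alignment as df - s

print(result.shape)               # 2 ticker columns + 3 date columns
print(result.isna().all().all())  # every cell is NaN
```

This mirrors the wide all-NaN frame in the question, where the date labels show up as extra columns next to the tickers.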

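Neither of these works the way you want. Now let's look back at the quoted documentation. The other option is 'index', where we match on the index (which sounds exactly like what you expect), i.e.: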

df.sub(s, axis=0) # or 'index' <-- note that you pass a param here

(df.T - s).T  #swap columns and rows and swap back again (transpose)
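As a rough check that the two spellings agree, the sketch below runs both on made-up data (the frame and Series are assumptions for illustration) and compares the results.

```python
import pandas as pd

# Hypothetical toy data sharing one DatetimeIndex.
idx = pd.date_range("2018-01-01", periods=3)
df = pd.DataFrame({"AAPL": [1.0, 2.0, 3.0], "GOOG": [4.0, 5.0, 6.0]}, index=idx)
s = pd.Series([0.5, 1.0, 1.5], index=idx)

# Match the Series labels against the row index and subtract row by row.
by_index = df.sub(s, axis="index")

# Transpose so the dates become columns, subtract with the default
# alignment, then transpose back.
by_transpose = (df.T - s).T

print(by_index)
print(by_transpose)
```

Both frames should hold the same values; the .sub() form states the intent directly and avoids two transposes.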

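Why? It is a design question. The designers could just as well have made 'index' the default, but for reasons I don't know (possibly because it is used more often, and because numpy, the package underlying pandas, operates this way) they chose 'columns'. The numpy behaviour is easy to test: df.values - s.values matches the 1-D array against the columns (the last axis), whereas here you are interested in the rows (index).

Short reason: that is how numpy works.


The most elegant solution for this is to use .sub() and specify axis='index'. (Or 0, but 'index' is probably more readable in this case.)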
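The numpy behaviour mentioned above can be sketched on plain arrays (the values are made up for illustration). A 1-D array is broadcast along the last axis, so its length has to match the number of columns; a per-row array needs an explicit extra axis.

```python
import numpy as np

values = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # 3 rows, 2 columns
per_column = np.array([10.0, 20.0])         # one value per column
per_row = np.array([100.0, 200.0, 300.0])   # one value per row

# Works: the 1-D array lines up with the last axis (the columns).
col_result = values - per_column

# Shapes (3, 2) and (3,) do not broadcast, so this raises ValueError.
try:
    values - per_row
    row_broadcast_ok = True
except ValueError:
    row_broadcast_ok = False

# Adding an axis turns per_row into a (3, 1) column that broadcasts across rows.
row_result = values - per_row[:, np.newaxis]

print(col_result)
print(row_broadcast_ok)
print(row_result)
```

In pandas terms, the default axis='columns' corresponds to the first case, and axis='index' plays the role of the explicit extra axis.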