I have a DataFrame with many columns, and a Series. Both have the same DatetimeIndex.
DataFrame:
>>> print(df)
AAPL GOOG MSFT AMZN FB
2018-01-01 NaN NaN NaN NaN NaN
2018-01-02 -0.667375 -1.567656 -1.161474 -0.674142 -1.886490
2018-01-03 2.004473 -2.802214 -24.084166 -2.447172 2.346972
2018-01-04 -4.261619 -1.471697 -0.027939 -1.753661 -1.835053
2018-01-05 -1.008718 -2.816736 -1.524315 -1.001672 0.080345
2018-01-06 -30.325012 -1.056776 -1.190017 2319.212083 -1.847443
2018-01-07 0.497589 8.588272 -2.434537 -0.793424 -1.194649
2018-01-08 -1.650655 -0.583868 -10.141386 2.704900 7.449458
2018-01-09 1.821119 -6.742207 -0.710584 -0.003800 -1.535461
2018-01-10 -0.624853 0.030330 0.405643 -0.513841 -0.775323
Series:
>>> print(ser)
2018-01-01 NaN
2018-01-02 -1.191427
2018-01-03 -4.996421
2018-01-04 -1.869994
2018-01-05 -1.254219
2018-01-06 456.958567
2018-01-07 0.932650
2018-01-08 -0.444310
2018-01-09 -1.434187
2018-01-10 -0.295609
If I try to subtract the Series from the DataFrame, I get the following result:
>>> df - ser
2018-01-01 00:00:00 2018-01-02 00:00:00 2018-01-03 00:00:00 \
2018-01-01 NaN NaN NaN
2018-01-02 NaN NaN NaN
2018-01-03 NaN NaN NaN
2018-01-04 NaN NaN NaN
2018-01-05 NaN NaN NaN
2018-01-06 NaN NaN NaN
2018-01-07 NaN NaN NaN
2018-01-08 NaN NaN NaN
2018-01-09 NaN NaN NaN
2018-01-10 NaN NaN NaN
2018-01-04 00:00:00 2018-01-05 00:00:00 2018-01-06 00:00:00 \
2018-01-01 NaN NaN NaN
2018-01-02 NaN NaN NaN
2018-01-03 NaN NaN NaN
2018-01-04 NaN NaN NaN
2018-01-05 NaN NaN NaN
2018-01-06 NaN NaN NaN
2018-01-07 NaN NaN NaN
2018-01-08 NaN NaN NaN
2018-01-09 NaN NaN NaN
2018-01-10 NaN NaN NaN
2018-01-07 00:00:00 2018-01-08 00:00:00 2018-01-09 00:00:00 \
2018-01-01 NaN NaN NaN
2018-01-02 NaN NaN NaN
2018-01-03 NaN NaN NaN
2018-01-04 NaN NaN NaN
2018-01-05 NaN NaN NaN
2018-01-06 NaN NaN NaN
2018-01-07 NaN NaN NaN
2018-01-08 NaN NaN NaN
2018-01-09 NaN NaN NaN
2018-01-10 NaN NaN NaN
2018-01-10 00:00:00 AAPL GOOG MSFT AMZN FB
2018-01-01 NaN NaN NaN NaN NaN NaN
2018-01-02 NaN NaN NaN NaN NaN NaN
2018-01-03 NaN NaN NaN NaN NaN NaN
2018-01-04 NaN NaN NaN NaN NaN NaN
2018-01-05 NaN NaN NaN NaN NaN NaN
2018-01-06 NaN NaN NaN NaN NaN NaN
2018-01-07 NaN NaN NaN NaN NaN NaN
2018-01-08 NaN NaN NaN NaN NaN NaN
2018-01-09 NaN NaN NaN NaN NaN NaN
2018-01-10 NaN NaN NaN NaN NaN NaN
I also get the following warning:
RuntimeWarning: Cannot compare type 'Timestamp' with type 'str', sort order is undefined for incomparable objects
  return this.join(other, how=how, return_indexers=return_indexers)
I know I can use DataFrame.sub:
>>> res = df.sub(ser, axis=0)
>>> print(res)
AAPL GOOG MSFT AMZN FB
2018-01-01 NaN NaN NaN NaN NaN
2018-01-02 0.524052 -0.376229 0.029954 0.517286 -0.695062
2018-01-03 7.000894 2.194208 -19.087745 2.549249 7.343393
2018-01-04 -2.391625 0.398297 1.842054 0.116333 0.034941
2018-01-05 0.245501 -1.562517 -0.270096 0.252547 1.334565
2018-01-06 -487.283579 -458.015343 -458.148584 1862.253516 -458.806010
2018-01-07 -0.435061 7.655622 -3.367187 -1.726074 -2.127300
2018-01-08 -1.206344 -0.139558 -9.697076 3.149210 7.893768
2018-01-09 3.255306 -5.308020 0.723603 1.430386 -0.101274
2018-01-10 -0.329244 0.325939 0.701251 -0.218232 -0.479714
However, what I don't understand is:
What operation is performed when DataFrame.__sub__ is used?
Also, it seems very counterintuitive to me that subtracting a Series from a DataFrame whose Index matches in type and content does not perform element-wise subtraction. What is the reason for not doing so?
Answer 0: (score: 1)
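Reading the documentation you helpfully linked, we find:
axis : {0, 1, 'index', 'columns'}
For Series input, axis to match Series index on, default 'columns'.
This gives us a hint as to what happens when you do the subtraction, namely: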
df.sub(s) # by not specifying axis you are passing axis=1 / 'columns'
or
df - s
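To make the default alignment concrete, here is a minimal sketch on a made-up miniature of the question's data (the names `df_demo` and `s_demo` and all values are illustrative assumptions, not the original frame). It suggests that the Series index (dates) is matched against the column labels (tickers); since no label is shared, the result carries the union of both as columns and contains only NaN.

```python
import pandas as pd

# Hypothetical miniature of the question's data: 3 dates, 2 tickers.
idx = pd.date_range("2018-01-01", periods=3)
df_demo = pd.DataFrame(
    {"AAPL": [1.0, 2.0, 3.0], "GOOG": [4.0, 5.0, 6.0]}, index=idx
)
s_demo = pd.Series([10.0, 20.0, 30.0], index=idx)

# Default behaviour: the Series index is aligned with the DataFrame columns.
# Dates and ticker names never match, so every cell ends up as NaN.
default_result = df_demo - s_demo

print(default_result.columns.tolist())    # tickers and dates mixed together
print(default_result.isna().all().all())  # whether every cell is NaN
```

Depending on the pandas version, this may also emit the sort-order warning quoted in the question, because timestamps and strings cannot be ordered against each other.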
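Neither of these works the way you want. Now look again at the documentation excerpt quoted above. The other option is 'index', which matches on the index (this sounds like exactly what you expect), namely: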
df.sub(s, axis=0) # or 'index' <-- note that you pass a param here
or
(df.T - s).T #swap columns and rows and swap back again (transpose)
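As a quick check that the two forms agree, the following sketch compares them on a small made-up frame (`df_demo`, `s_demo` and their values are assumptions chosen for illustration only).

```python
import pandas as pd

# Hypothetical miniature of the question's data: 3 dates, 2 tickers.
idx = pd.date_range("2018-01-01", periods=3)
df_demo = pd.DataFrame(
    {"AAPL": [1.0, 2.0, 3.0], "GOOG": [4.0, 5.0, 6.0]}, index=idx
)
s_demo = pd.Series([10.0, 20.0, 30.0], index=idx)

# Align the Series with the row labels explicitly.
by_index = df_demo.sub(s_demo, axis="index")

# Transpose so the dates become columns, use the default column alignment,
# then transpose back.
by_transpose = (df_demo.T - s_demo).T

print(by_index)
print((by_index.values == by_transpose.values).all())
```

The `.sub(..., axis="index")` form states the intent directly and avoids two transposes, which is one reason to prefer it.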
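Why? It is a design decision. The designers could just as well have made 'index' the default, but for reasons I can only guess at (probably because it is the more common use, and because numpy, the package underlying pandas, behaves this way) they chose 'columns'. The numpy behaviour is easy to test by running: df.values - s.values
This tries to line the 1-D array up with the columns (the last axis), whereas you are interested in the rows (the index). With the shapes in the question, (10, 5) and (10,), it raises a broadcasting error instead of subtracting row by row.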
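The numpy side of this can be sketched on its own. The arrays below are made-up stand-ins (3 rows, 2 columns) rather than the question's data; the point is only that a 1-D array is broadcast against the last axis, so matching rows needs an explicit extra axis.

```python
import numpy as np

values = np.arange(6.0).reshape(3, 2)   # 3 rows (dates), 2 columns (tickers)
per_row = np.array([10.0, 20.0, 30.0])  # one value per row, like the Series
per_col = np.array([100.0, 200.0])      # one value per column

# A 1-D array is matched against the last axis, i.e. the columns.
col_wise = values - per_col

# Matching the rows requires reshaping (3,) to (3, 1) first.
row_wise = values - per_row[:, np.newaxis]

# Without the reshape, shapes (3, 2) and (3,) cannot be broadcast together.
try:
    values - per_row
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

print(col_wise)
print(row_wise)
print(broadcast_failed)
```

pandas aligns on labels rather than positions, but its default of matching a Series against the columns mirrors this last-axis convention.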
Short reason: that is how numpy works.
The most elegant solution is to use .sub() and specify axis='index'. (Or 0, but 'index' is probably more readable in this case.)