I have a dataframe with a date index and a price column, as shown below:
DATE       | price
01-01-2018 | 100
02-01-2018 | 101
03-01-2018 | 97
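I wrote a function that computes the difference between a row's price and the price 3 rows ("days") earlier. (I know other pandas methods can do this, but this function is a stub that I want to extend later.)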
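The asker mentions that pandas has other ways to do this. For comparison, here is a minimal vectorized sketch, assuming the column is named price and borrowing the seven-row sample from the answer below (with only three rows, a 3-row shift leaves nothing but NaN). The names diff3, same_as_diff and rising are illustrative only.

```python
import pandas as pd

# Seven-row sample (values from the larger table in the answer); DATE kept as a string index
df = pd.DataFrame(
    {"price": [100, 101, 97, 102, 100, 107, 38]},
    index=pd.Index(
        ["01-01-2018", "02-01-2018", "03-01-2018", "04-01-2018",
         "05-01-2018", "06-01-2018", "07-01-2018"],
        name="DATE",
    ),
)

# Price 3 rows earlier minus the current price (the sign convention of the question's stub).
# The first 3 rows have no earlier row to compare with, so they become NaN.
df["diff3"] = df["price"].shift(3) - df["price"]

# Series.diff(3) computes price - price.shift(3): the same values with the opposite sign
same_as_diff = df["diff3"].equals(-df["price"].diff(3))

# Keep only rows where the price rose versus 3 rows earlier; comparisons with NaN are False
rising = df["price"] > df["price"].shift(3)
print(df.loc[rising, "diff3"])
```

Series.diff(periods) is the built-in shortcut; the explicit shift form keeps the question's sign convention and is easy to extend with further conditions.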
import numpy as np

def case1(x):
    prevrow = x.shift(3)
    if np.isnan(prevrow['price']):
        pass
    else:
        if x['price'] > prevrow['price']:
            diff = prevrow['price'] - x['price']
            print('The diff is {}'.format(diff))
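However, when I run case1(df), I get this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
It seems to be triggered by the 3 NaN values that the shift at the start of the function produces, but adding a check for NaN values still results in the same error message.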
Does anyone know what I am doing wrong?
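To see why the NaN check does not help, here is a minimal reproduction (a sketch; the three-value Series stands in for the question's price column). np.isnan applied to a pandas Series works elementwise and returns another Series, so the check itself raises the error, not the NaN values.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 101.0, 97.0], name="price")
prev = prices.shift(3)  # 3 rows shifted by 3 positions: every value becomes NaN

# np.isnan is elementwise on a Series, so the result is a boolean Series, not a single bool
mask = np.isnan(prev)
print(type(mask).__name__, mask.tolist())

# Putting that Series directly in an `if` raises the ValueError from the question
error_message = None
try:
    if mask:
        pass
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)

# Reducing the mask to one bool with .all() (or .any()) makes the test unambiguous
if mask.all():
    print("every shifted value is NaN")
```

The same applies to x['price'] > prevrow['price']: comparing two Series yields a boolean Series, so that if statement would fail for the same reason even once the NaN check is fixed.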
Answer 0 (score: 1)
To make the behavior easier to see, let's use a larger dataframe:
DATE       | price
01-01-2018 | 100
02-01-2018 | 101
03-01-2018 | 97
04-01-2018 | 102
05-01-2018 | 100
06-01-2018 | 107
07-01-2018 | 38
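There are a few problems in your code. The main one is that you are using arrays (Series) in boolean tests where Python expects a single value: np.isnan(prevrow['price']) and x['price'] > prevrow['price'] both return a boolean Series, and an if statement cannot decide whether a whole Series is true, hence the ValueError. A solution: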
import numpy as np

def case1(x):
    # New df with a new column for shift prices
    df = x.assign(price_prevrow=x.shift(3)['price'])
    if np.isnan(df['price_prevrow']).all():  # Check ALL values
        pass
    else:
        # Slice df to get only rows with price greater than price_prevrow
        df = df.loc[df['price'] > df['price_prevrow']]
        # Calculate difference
        diff = df['price_prevrow'] - df['price']
        # Print all differences
        for d in diff:
            print('The diff is {}'.format(d))
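The code above creates a new dataframe with an extra column holding the shifted price, then slices it down to the rows whose price is greater than the shifted (previous) price. After that, computing the difference is straightforward.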
Output:
The diff is -2.0
The diff is -10.0
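Since the asker wants a stub to extend later, here is an alternative sketch that keeps the per-row, scalar-comparison structure of the original function instead of the answer's vectorized slicing. The function name case1_rows and its periods parameter are hypothetical. Iterating row by row is slower than the answer's approach on large frames, but each if compares two plain numbers, so no ambiguity can arise.

```python
import math

import pandas as pd


def case1_rows(df, periods=3):
    """Hypothetical row-wise variant of the question's stub.

    Returns the (previous - current) price diffs for rows whose price rose
    compared with `periods` rows earlier, printing each one as it goes.
    """
    prev_prices = df["price"].shift(periods)
    diffs = []
    # zip pairs each scalar price with the scalar price `periods` rows earlier
    for price, prev in zip(df["price"], prev_prices):
        if math.isnan(prev):  # scalar NaN check, so a plain `if` is unambiguous
            continue
        if price > prev:
            diff = prev - price
            print("The diff is {}".format(diff))
            diffs.append(diff)
    return diffs


# Seven-row sample from the answer; DATE kept as a string index
df = pd.DataFrame(
    {"price": [100, 101, 97, 102, 100, 107, 38]},
    index=pd.Index(
        ["01-01-2018", "02-01-2018", "03-01-2018", "04-01-2018",
         "05-01-2018", "06-01-2018", "07-01-2018"],
        name="DATE",
    ),
)
result = case1_rows(df)
```

On this sample it prints the same two lines as the answer and returns [-2.0, -10.0]; further per-row conditions can be added inside the loop.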