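Below is the code I wrote to calculate the relative change of the values in df.a and df.b, where df is a DataFrame. What has to be calculated is basically df["c"] = df.a/df.a.iloc[df.d].values
. If df.a/df.a.iloc[df.d].values
is greater or smaller than df.b/df.b.iloc[df.d].values * (1+ tolerance), df["d"] should be reset so that the following rows are measured against a new base row (the desired d values are given in a comment in the code below).
The problem is that the code currently fails with the following error: ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', u'occurred at index 2011-01-01 00:00:00')
I have absolutely no idea why...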
import pandas as pd
import numpy as np
import datetime
randn = np.random.randn
rng = pd.date_range('1/1/2011', periods=10, freq='D')
df = pd.DataFrame({'a': [1.1, 1.2, 2.3, 1.4, 1.5, 1.8, 0.7, 1.8, 1.9, 2.0], 'b': [1.1, 1.5, 1.3, 1.6, 1.5, 1.1, 1.5, 1.7, 2.1, 2.1],'c':[None] * 10},index=rng)
df["d"]= [0,0,0,0,0,0,0,0,0,0]
df["t"]= np.arange(len(df))
tolerance = 0.3
def set_t(x):
    if df.a/df.a.iloc[df.d].values < df.b/df.b.iloc[df.d].values * (1+tolerance):
        return df.iloc[df.index.get_loc(x.name) - 1]['d'] == df.t
    elif df.a/df.a.iloc[df.d].values > df.b/df.b.iloc[df.d].values * (1+tolerance):
        return df.iloc[df.index.get_loc(x.name) - 1]['d'] == df.t
    # The two conditions are identical, except that the first checks "smaller than" and the second "bigger than" df.b/df.b.iloc[df.d].values * (1+tolerance)
df['d'] = df.apply(set_t, axis =1)
#df["d"]= [0,0,0,3,3,3,6,7,7,7] this should be the outcome for d
df["c"] = df.a/df.a.iloc[df.d].values
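Using (df.a/df.a.iloc[df.d].values).all() < (df.b/df.b.iloc[df.d].values).all()
or .any()
does not give the desired result either, because it only checks whether the data as currently set is True or False; it does not set any new values.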
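Why neither `if` nor `.all()`/`.any()` helps can be seen on a tiny example. This is a minimal sketch with made-up values (not the question's data): comparing two Series gives a boolean Series, `if` needs one single True/False, and `.any()`/`.all()` collapse the whole Series to one bool, so no per-row values are left. Row-wise combinations of conditions use `|` and `&` instead of `or`/`and`.

```python
import pandas as pd

# Illustrative values only
left = pd.Series([1.5, 2.0, 0.5])
right = pd.Series([1.5, 1.5, 1.5])

mask = left > right  # element-wise comparison: one bool per row
print(mask.tolist())

error_message = ""
try:
    if mask:  # `if` needs a single bool; a Series has no single truth value
        pass
except ValueError as exc:
    error_message = str(exc)
print("ValueError:", error_message)

# .any() / .all() reduce the whole Series to ONE bool
print(mask.any(), mask.all())

# Row-by-row "greater OR smaller" test: use |, and parenthesize each comparison
either = (left > right * 1.3) | (left * 1.3 < right)
print(either.tolist())
```
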
The desired result is as follows:
a b c d t
2011-01-01 1.1 1.1 1.000000 0 0
2011-01-02 1.2 1.5 1.090909 0 1
2011-01-03 2.3 1.3 2.090909 0 2
2011-01-04 1.4 1.6 1.000000 3 3
2011-01-05 1.5 1.5 1.071429 3 4
2011-01-06 1.8 1.1 1.285714 3 5
2011-01-07 0.7 1.5 1.000000 6 6
2011-01-08 1.8 1.7 1.000000 7 7
2011-01-09 1.9 2.1 1.055556 7 8
2011-01-10 2.0 2.1 1.111111 7 9
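Any ideas how to fix this?
Answer 0 (score: 2)
This isn't a 100% solution, but it should at least put you on a better path and fix the main problem. The core problem I see, syntax-wise, is that you are mixing vectorized and non-vectorized code: inside set_t, a comparison such as df.a/df.a.iloc[df.d].values < ... compares whole columns and returns a boolean Series, and `if` cannot reduce a whole Series to a single True/False, hence the ValueError. You can do something more like this: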
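For reference, here is a minimal sketch of how column c in the desired table follows from a given d column via df.a / df.a.iloc[df.d].values. It hard-codes the desired d values from the table above instead of computing them. The `.values` matters: it strips the dates from the looked-up base values, so the division pairs rows by position rather than aligning on index labels.

```python
import pandas as pd

rng = pd.date_range('1/1/2011', periods=10, freq='D')
df = pd.DataFrame({'a': [1.1, 1.2, 2.3, 1.4, 1.5, 1.8, 0.7, 1.8, 1.9, 2.0]}, index=rng)

# Desired base positions copied from the expected output (not computed here)
df['d'] = [0, 0, 0, 3, 3, 3, 6, 7, 7, 7]

# a.iloc[d] picks each row's base value by position; .values drops its (date) index
# so row i is divided by base value i instead of being aligned on dates
base = df.a.iloc[df.d].values
df['c'] = df.a / base
print(df['c'].round(6).tolist())
```
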
>>> df['d1'] = df.a/df.a.iloc[df.d].values > df.b/df.b.iloc[df.d].values * (1+tolerance)
>>> df['d2'] = df.a/df.a.iloc[df.d].values * (1+tolerance) < df.b/df.b.iloc[df.d].values
>>> df['d'] = df['d1'] | df['d2']
>>> df
a b c d t d1 d2
2011-01-01 1.1 1.1 None False 0 False False
2011-01-02 1.2 1.5 None False 1 False False
2011-01-03 2.3 1.3 None True 2 True False
2011-01-04 1.4 1.6 None False 3 False False
2011-01-05 1.5 1.5 None False 4 False False
2011-01-06 1.8 1.1 None True 5 True False
2011-01-07 0.7 1.5 None True 6 False True
2011-01-08 1.8 1.7 None False 7 False False
2011-01-09 1.9 2.1 None False 8 False False
2011-01-10 2.0 2.1 None False 9 False False
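This isn't the answer you want, but hopefully it shows you what is happening in your code and how to fix it to get what you want (i.e. you don't need, or want, to use a function with apply here; just use standard vectorized pandas code).
If you can get that far, a cleaner approach would be np.where
(two of them, either applied in sequence or nested).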
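A sketch of the nested np.where idea, assuming (as in the snippet above) that every row is still compared against row 0 because d is all zeros. It folds the two conditions into one signed column: 1 where a's relative change exceeds b's by more than the tolerance, -1 where it falls short by more than the tolerance, 0 otherwise. The column name `signal` is made up for illustration.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2011', periods=10, freq='D')
df = pd.DataFrame({'a': [1.1, 1.2, 2.3, 1.4, 1.5, 1.8, 0.7, 1.8, 1.9, 2.0],
                   'b': [1.1, 1.5, 1.3, 1.6, 1.5, 1.1, 1.5, 1.7, 2.1, 2.1]}, index=rng)
df['d'] = 0
tolerance = 0.3

# Relative change of a and b against their base row (row 0 here)
ra = df.a / df.a.iloc[df.d].values
rb = df.b / df.b.iloc[df.d].values

# Nested np.where: 1 for the "bigger than" case, -1 for the "smaller than" case, else 0
df['signal'] = np.where(ra > rb * (1 + tolerance), 1,
                        np.where(ra * (1 + tolerance) < rb, -1, 0))
print(df['signal'].tolist())  # [0, 0, 1, 0, 0, 1, -1, 0, 0, 0]
```
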
Answer 1 (score: 2)
OK, I get the result you want, but this is still far too complicated and inefficient. I'd love to see an elegant solution:
import pandas as pd
import numpy as np
import datetime
randn = np.random.randn
rng = pd.date_range('1/1/2011', periods=10, freq='D')
df = pd.DataFrame({'a': [1.1, 1.2, 2.3, 1.4, 1.5, 1.8, 0.7, 1.8, 1.9, 2.0], 'b': [1.1, 1.5, 1.3, 1.6, 1.5, 1.1, 1.5, 1.7, 2.1, 2.1],'c':[None] * 10},index=rng)
df["d"]= [0,0,0,0,0,0,0,0,0,0]
df["t"]= np.arange(len(df))
tolerance = 0.3
df['d1'] = df.a/df.a.iloc[df.d].values > df.b/df.b.iloc[df.d].values * (1+tolerance)
df['d2'] = df.a/df.a.iloc[df.d].values * (1+tolerance) < df.b/df.b.iloc[df.d].values
df['e'] = df.d1*df.t
df['f'] = df.d2*df.t
df['g'] = df.e +df.f
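# .ix was removed in pandas 1.0; .loc does the same boolean selection here
df.loc[df.g > df.g.shift(1), "h"] = df.g * 1
df.h = df.h + 1
df.h = df.h.shift(1)
# set the first row without chained assignment (df['h'][0] = 0)
df.loc[df.index[0], 'h'] = 0
# fillna(method='ffill') is deprecated; ffill() does the same
df['h'] = df['h'].ffill()
# h is float after the NaN handling, but iloc needs integer positions
df["d"] = df.h.astype(int)
df["c"] = df.a/df.a.iloc[df.d].values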
This is the result:
a b c d t d1 d2 e f g h
2011-01-01 1.1 1.1 1.000000 0 0 False False 0 0 0 0
2011-01-02 1.2 1.5 1.090909 0 1 False False 0 0 0 0
2011-01-03 2.3 1.3 2.090909 0 2 True False 2 0 2 0
2011-01-04 1.4 1.6 1.000000 3 3 False False 0 0 0 3
2011-01-05 1.5 1.5 1.071429 3 4 False False 0 0 0 3
2011-01-06 1.8 1.1 1.285714 3 5 True False 5 0 5 3
2011-01-07 0.7 1.5 1.000000 6 6 False True 0 6 6 6
2011-01-08 1.8 1.7 1.000000 7 7 False False 0 0 0 7
2011-01-09 1.9 2.1 1.055556 7 8 False False 0 0 0 7
2011-01-10 2.0 2.1 1.111111 7 9 False False 0 0 0 7
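From here you can easily remove the helper columns, e.g. del df['g'], or all of them at once with df.drop(columns=['d1', 'd2', 'e', 'f', 'g', 'h']).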
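A possibly more compact sketch of the same d1/d2 → e/f/g → h logic from this answer. It is a rewrite under the same assumptions, not a different algorithm: the trigger flags are still evaluated against row 0, and the row right after each trigger becomes the new base. For this data it should give the same d as above. The names `ra`, `rb`, `triggered` and `new_base` are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2011', periods=10, freq='D')
df = pd.DataFrame({'a': [1.1, 1.2, 2.3, 1.4, 1.5, 1.8, 0.7, 1.8, 1.9, 2.0],
                   'b': [1.1, 1.5, 1.3, 1.6, 1.5, 1.1, 1.5, 1.7, 2.1, 2.1]}, index=rng)
df['d'] = 0
df['t'] = np.arange(len(df))
tolerance = 0.3

ra = df.a / df.a.iloc[df.d].values
rb = df.b / df.b.iloc[df.d].values
# Same two conditions as d1 and d2 above, combined into one flag per row
triggered = (ra > rb * (1 + tolerance)) | (ra * (1 + tolerance) < rb)

# The row after a trigger starts a new base: keep its t there (NaN elsewhere),
# carry it forward, and use 0 for the rows before the first trigger
new_base = df.t.where(triggered.shift(1, fill_value=False))
df['d'] = new_base.ffill().fillna(0).astype(int)
df['c'] = df.a / df.a.iloc[df.d].values
print(df[['c', 'd']])
```
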