I have a DataFrame, Df2. I am trying to check each of the last 10 rows of the column Lead_Lag below. If any of those rows holds a value other than null, I want the new column Position to equal 'Y':

def run_HG_AUDUSD_15M_Aggregate():
    Df1 = pd.read_csv(max(glob.iglob(r"C:\Users\cost9\OneDrive\Documents\PYTHON\Daily Tasks\Pairs Trading\HG_AUDUSD\CSV\15M\Lead_Lag\*.csv"), key=os.path.getctime))
    Df2 = Df1[['Date', 'Close_HG', 'Close_AUDUSD', 'Lead_Lag']]
    Df2['Position'] = ''
    for index,row in Df2.iterrows():
        if Df2.loc[Df2.index.shift(-10):index,"Lead_Lag"].isnull():
            continue
        else:
            Df2.loc[index, 'Position'] = "Y"

A sample of the data:

Date             Close_HG  Close_AUDUSD  Lead_Lag
7/19/2017 12:59  2.7       0.7956
7/19/2017 13:59  2.7       0.7955
7/19/2017 14:14  2.7       0.7954
7/20/2017 3:14   2.7       0.791
7/20/2017 5:44   2.7       0.791
7/20/2017 7:44   2.71      0.7925
7/20/2017 7:59   2.7       0.7924
7/20/2017 8:44   2.7       0.7953        Short_Both
7/20/2017 10:44  2.71      0.7964        Short_Both
7/20/2017 11:14  2.71      0.7963        Short_Both
7/20/2017 11:29  2.71      0.7967        Short_Both
7/20/2017 13:14  2.71      0.796         Short_Both
7/20/2017 13:29  2.71      0.7956        Short_Both
7/20/2017 14:29  2.71      0.7957        Short_Both

So in this case I want the last two values of the new column Position to be 'Y', because at least one of the last 10 rows of the Lead_Lag column contains a value. I want to apply this on a rolling basis: for example, the row 13 'Position' value would look at rows 12-3, the row 12 'Position' value would look at rows 11-2, and so on.

Instead I get the error:

NotImplementedError: Not supported for type RangeIndex

I tried several variations of the shift approach (defining it before the loop, etc.) and could not get it to work.

Edit: the solution I ended up using is posted as an answer below.

Answer 0 (score: 2)

Use numpy.where with a boolean mask built by chaining two conditions:

m = df["Lead_Lag"].notnull() & df.index.isin(df.index[-10:])

Or select by position with iloc and fill the remaining rows with False via reindex:

m = df["Lead_Lag"].iloc[-10:].notnull().reindex(df.index, fill_value=False)

df['new'] = np.where(m, 'Y', '')
print (df)
               Date  Close_HG  Close_AUDUSD    Lead_Lag new
0   7/19/2017 12:59      2.70        0.7956         NaN
1   7/19/2017 13:59      2.70        0.7955         NaN
2   7/19/2017 14:14      2.70        0.7954         NaN
3    7/20/2017 3:14      2.70        0.7910         NaN
4    7/20/2017 5:44      2.70        0.7910         NaN
5    7/20/2017 7:44      2.71        0.7925         NaN
6    7/20/2017 7:59      2.70        0.7924         NaN
7    7/20/2017 8:44      2.70        0.7953  Short_Both   Y
8   7/20/2017 10:44      2.71        0.7964  Short_Both   Y
9   7/20/2017 11:14      2.71        0.7963  Short_Both   Y
10  7/20/2017 11:29      2.71        0.7967  Short_Both   Y
11  7/20/2017 13:14      2.71        0.7960  Short_Both   Y
12  7/20/2017 13:29      2.71        0.7956  Short_Both   Y
13  7/20/2017 14:29      2.71        0.7957  Short_Both   Y
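
The mask above flags only non-null rows among the final 10 rows of the frame. For the rolling check described in the question (each row looks at the 10 rows before it), a minimal sketch could look like the following. The helper name `flag_recent_values` and the toy frame are illustrative assumptions, not part of the answer.

```python
import numpy as np
import pandas as pd


def flag_recent_values(series, window=10):
    """Return 'Y' where any of the previous `window` rows is non-null, else ''.

    Hypothetical helper for illustration. The current row is excluded,
    matching "row 13 looks at rows 12-3" in the question.
    """
    has_value = series.notna().astype(int)
    # shift(1) removes the current row from its own window
    previous = has_value.shift(1, fill_value=0)
    seen = previous.rolling(window, min_periods=1).max() > 0
    return np.where(seen, "Y", "")


# Same Lead_Lag pattern as the sample: 7 empty rows, then 7 filled rows
df = pd.DataFrame({"Lead_Lag": [np.nan] * 7 + ["Short_Both"] * 7})
df["Position"] = flag_recent_values(df["Lead_Lag"])
print(df)
```

To include the current row in its own window instead, drop the `shift(1, ...)` step.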
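
Answer 1 (score: 0)

This is what I ended up doing:

def run_HG_AUDUSD_15M_Aggregate():
    # Assumes Df2 already exists and that "N" marks an empty Lead_Lag value
    N = 10
    Df2['Position'] = ''
    for index,row in Df2.iterrows():
        # Flag the row if anything in the window differs from "N"
        if (Df2.loc[index-N:index,"Lead_Lag"] != "N").any():
            Df2.loc[index, 'Position'] = "Y"
        else:
            Df2.loc[index, 'Position'] = "N"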
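
One detail of the loop above is worth spelling out: label-based slicing with `.loc` includes both endpoints, so `Df2.loc[index-N:index]` covers up to N + 1 rows (the current row plus the N rows before it). A small sketch on a made-up frame (`toy` is illustrative data only):

```python
import pandas as pd

# Toy frame with a default RangeIndex 0..5 (illustrative data only)
toy = pd.DataFrame({"Lead_Lag": ["N", "N", "X", "N", "N", "N"]})

N = 3
index = 5

# .loc slices by label and includes both endpoints: labels 2, 3, 4, 5
window = toy.loc[index - N:index, "Lead_Lag"]
print(len(window))
print(list(window.index))

# .iloc slices by position and excludes the stop: positions 2, 3, 4
print(len(toy.iloc[index - N:index]))
```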

Answer 2 (score: 0)

Sample:

import numpy as np
import pandas as pd

np.random.seed(123)
M = 20
Df2 = pd.DataFrame({'Lead_Lag':np.random.choice([np.nan, 'N'], p=[.3,.7], size=M)})

Solution 1 - pandas:

Explanation: first compare the column for not equal with Series.ne to get a boolean Series, then use Series.rolling with Series.any to test the values in each window, and finally set N and Y with numpy.where:

N = 3
a = (Df2['Lead_Lag'].ne('N')
     .rolling(N, min_periods=1)
     .apply(lambda x: x.any(), raw=False))
Df2['Pos1'] = np.where(a, 'Y','N')

Another numpy solution with strides, with the first N values corrected to False:

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

x = np.concatenate([[False] * (N - 1), Df2['Lead_Lag'].ne('N').values])
arr = np.any(rolling_window(x, N), axis=1)
Df2['Pos2'] = np.where(arr, 'Y','N')

Compare output:

print (Df2)
   Lead_Lag Pos1 Pos2
0         N    N    N
1       nan    Y    Y
2       nan    Y    Y
3         N    Y    Y
4         N    Y    Y
5         N    N    N
6         N    N    N
7         N    N    N
8         N    N    N
9         N    N    N
10        N    N    N
11        N    N    N
12        N    N    N
13      nan    Y    Y
14        N    Y    Y
15        N    Y    Y
16      nan    Y    Y
17      nan    Y    Y
18        N    Y    Y
19        N    Y    Y

Details of the numpy solution:

Prepend False values so the first N - 1 rows can be tested:

print (np.concatenate([[False] * (N - 1), Df2['Lead_Lag'].ne('N').values]))
[False False False  True  True False False False False False False False
 False False False  True False False  True  True False False]

Strides return a 2d boolean array:

print (rolling_window(x, N))
[[False False False]
 [False False  True]
 [False  True  True]
 [ True  True False]
 [ True False False]
 [False False False]
 [False False False]
 [False False False]
 [False False False]
 [False False False]
 [False False False]
 [False False False]
 [False False False]
 [False False  True]
 [False  True False]
 [ True False False]
 [False False  True]
 [False  True  True]
 [ True  True False]
 [ True False False]]

Test for at least one True per row with numpy.any:

print (np.any(rolling_window(x, N), axis=1))
[False  True  True  True  True False False False False False False False
 False  True  True  True  True  True  True  True]

EDIT:

If you test against the iterrows solution, the output is different. The reason is that the iterrows solution tests a window of N + 1 rows, so to get the same output it is necessary to add 1 to N:

N = 3
Df2['Position'] = ''
for index,row in Df2.iterrows():
    #for check windows
    #print (Df2.loc[index-N:index,"Lead_Lag"])
    if (Df2.loc[index-N:index,"Lead_Lag"] != "N").any():
        Df2.loc[index, 'Position'] = "Y"
    else:
        Df2.loc[index, 'Position'] = "N"

a = (Df2['Lead_Lag'].ne('N')
     .rolling(N + 1, min_periods=1)
     .apply(lambda x: x.any(), raw=False) )
Df2['Pos1'] = np.where(a, 'Y','N')

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

x = np.concatenate([[False] * (N), Df2['Lead_Lag'].ne('N').values])
arr = np.any(rolling_window(x, N + 1), axis=1)
Df2['Pos2'] = np.where(arr, 'Y','N')
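
As a cross-check of the N + 1 point, the sketch below compares the loop with a rolling version on the same random sample. It swaps in `numpy.lib.stride_tricks.sliding_window_view` (assumed available, which requires NumPy 1.20 or later) for the hand-written `as_strided` helper, and a rolling `max` for the `apply` call. These are substituted alternatives, not the answer's own code.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

np.random.seed(123)
M = 20
N = 3
Df2 = pd.DataFrame({"Lead_Lag": np.random.choice([np.nan, "N"], p=[.3, .7], size=M)})

# True wherever the value is anything other than "N"
not_n = Df2["Lead_Lag"].ne("N")

# Loop version: .loc[i - N:i] spans up to N + 1 labels
loop = ["Y" if not_n.loc[i - N:i].any() else "N" for i in Df2.index]

# pandas rolling over N + 1 rows
rolled = not_n.astype(int).rolling(N + 1, min_periods=1).max().gt(0)
pos1 = np.where(rolled, "Y", "N")

# NumPy windows: pad N leading False values so every row gets a full window
padded = np.concatenate([np.zeros(N, dtype=bool), not_n.to_numpy()])
pos2 = np.where(sliding_window_view(padded, N + 1).any(axis=1), "Y", "N")

# Reports whether all three approaches agree
print(loop == pos1.tolist() == pos2.tolist())
```

Unlike `as_strided`, `sliding_window_view` checks the window shape and returns a read-only view, which makes out-of-bounds reads harder to trigger by accident.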