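I am trying to create a column that marks all the dates between each entry and its exit. Each entry is paired with exactly one exit.
The dates are ordered from the most recent date back to the first day.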
import pandas as pd
import numpy as np

df = pd.DataFrame({'Entry': [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1],
                   'Exit':  [1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0]},
                  index=pd.date_range('1/1/2019', periods=12))
# Reverse the rows so the most recent date comes first
df1 = df.iloc[::-1]
df1
The base table looks like this:
I want to create an additional column called windows that looks exactly like this:
Answer 0 (score: 2)
This will solve the problem (I know it is probably not the most pythonic way, but I hope you will give me points for readability):
# First sort ascending so we don't have to work backwards.
# sort_index() returns a new DataFrame, so we are not modifying a slice of df in place.
df1 = df1.sort_index()
# Create the Window column, then fill it iteratively
df1['Window'] = 0
for index, row in df1.iterrows():
    if row.Entry == 1:
        # Once an entry is found, scan forward for the next exit,
        # set every row from entry to exit (inclusive) to 1, and stop.
        # Nothing happens if no exit is found.
        for subindex, subrow in df1.loc[index:].iterrows():
            if subrow.Exit == 1:
                df1.loc[index:subindex, 'Window'] = 1
                break
# Sort back into the descending order you wanted
df1 = df1.sort_index(ascending=False)
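The nested `iterrows` loops above rescan the frame for every entry. As a rough alternative, here is a single-pass sketch of the same pairing rule: remember the position of the pending entry and fill the window when the next exit appears. The helper name `mark_windows` and its parameters are hypothetical, and it assumes the frame is already sorted in ascending index order.

```python
import numpy as np
import pandas as pd


def mark_windows(frame, entry_col="Entry", exit_col="Exit"):
    """Return a 0/1 Series marking rows from each entry to its next exit.

    Hypothetical helper; assumes `frame` is sorted in ascending index order.
    """
    entries = frame[entry_col].to_numpy()
    exits = frame[exit_col].to_numpy()
    window = np.zeros(len(frame), dtype=int)
    start = None  # position of the pending (unmatched) entry, if any
    for pos in range(len(frame)):
        if entries[pos] == 1 and start is None:
            start = pos
        if exits[pos] == 1 and start is not None:
            window[start:pos + 1] = 1  # fill entry..exit, inclusive
            start = None
    # An entry with no later exit leaves its rows at 0
    return pd.Series(window, index=frame.index, name="Window")


df = pd.DataFrame(
    {"Entry": [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1],
     "Exit": [1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0]},
    index=pd.date_range("1/1/2019", periods=12),
)
ascending = df.sort_index()
ascending["Window"] = mark_windows(ascending)
result = ascending.sort_index(ascending=False)  # most recent date first
print(result)
```

On the sample data this should mark 2019-01-04 through 2019-01-06 and 2019-01-09 through 2019-01-11, and leave the unmatched entry on 2019-01-12 at 0, which is what the loop version does as well.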
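Answer 1 (score: 2)
The logic of your window function is not entirely clear to me. It seems, though, that what you need is a function applied row by row that can keep some memory between calls (the last entry state, or whatever else you need). A good way to do that is to define a callable class, as shown below. Note that you need to sort df in ascending order before using it.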
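import numpy as np

class WindowFunc(object):
    def __init__(self, initial_status):
        # 0 means outside a window, 1 means inside one
        self.status = initial_status

    def __call__(self, row, enter_col, exit_col):
        enter_val = row[enter_col]
        exit_val = row[exit_col]
        if self.status == 0 and enter_val == 1 and exit_val != 1:
            # An entry opens the window
            self.status = 1
            return 1
        elif self.status == 1 and enter_val != 1 and exit_val == 1:
            # An exit closes the window; the exit row itself still counts
            current_status = self.status
            self.status = 0
            return current_status
        else:
            # No transition: report the current state
            return self.status

window_fn = WindowFunc(0)
# The trailing 0 and 1 are the column positions of Entry and Exit
df['window'] = np.apply_along_axis(window_fn, 1, df, 0, 1)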
The instance holds the state and carries it from one row to the next. You can update the logic in the class to suit your needs.
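To see the carried state in action, here is a small self-contained run on the sample data from the question. The class is repeated in condensed form only so the snippet runs on its own; selecting the two columns with `to_numpy()` before calling `np.apply_along_axis` is my own choice, not part of the answer.

```python
import numpy as np
import pandas as pd


class WindowFunc:
    """Same state machine as in the answer, condensed so this snippet runs alone."""

    def __init__(self, initial_status):
        self.status = initial_status

    def __call__(self, row, enter_col, exit_col):
        enter_val, exit_val = row[enter_col], row[exit_col]
        if self.status == 0 and enter_val == 1 and exit_val != 1:
            self.status = 1  # window opens on this row
            return 1
        if self.status == 1 and enter_val != 1 and exit_val == 1:
            self.status = 0  # window closes, the exit row still counts
            return 1
        return self.status


df = pd.DataFrame(
    {"Entry": [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1],
     "Exit": [1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0]},
    index=pd.date_range("1/1/2019", periods=12),
).sort_index()  # ascending order, as the answer requires

# Use a fresh instance per run: the state persists on the object between calls
values = df[["Entry", "Exit"]].to_numpy()
df["window"] = np.apply_along_axis(WindowFunc(0), 1, values, 0, 1)
print(df["window"].tolist())
# [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1]
```

Note that the last entry (2019-01-12) has no matching exit, so with this logic the window stays open on that row. If an unmatched entry should not count, the class would need an extra rule or a post-processing step.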