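I need to create an indicator column in df based on the date ranges in dfmap, for every tag in dfmap: a row gets 1 if its date falls within the range listed for its tag, and 0 otherwise.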
import pandas as pd
df = pd.DataFrame({
'date': ['2019-04-19','2019-04-20','2019-04-21', '2019-04-22',
'2019-10-01','2019-10-02','2019-10-03', '2019-10-04'],
'tag': ['ID F', 'ID F', 'ID F', 'ID F',
'ID B', 'ID B', 'ID B', 'ID B'],
'value': ['1', '2', '3', '4',
'1', '3', '5', '7']})
df['date'] = pd.to_datetime(df['date'])  # infer_datetime_format is deprecated in pandas 2.x; format inference is now the default
dfmap = pd.DataFrame({
'start_date': ['2019-04-20','2019-10-03'],
'end_date': ['2019-04-21','2019-10-04'],
'tag': ['ID F', 'ID B']})
print(df)
print(dfmap)
date tag value
0 2019-04-19 ID F 1
1 2019-04-20 ID F 2
2 2019-04-21 ID F 3
3 2019-04-22 ID F 4
4 2019-10-01 ID B 1
5 2019-10-02 ID B 3
6 2019-10-03 ID B 5
7 2019-10-04 ID B 7
start_date end_date tag
0 2019-04-20 2019-04-21 ID F
1 2019-10-03 2019-10-04 ID B
Desired DataFrame:
print(desired_df)
date tag value indicator
0 2019-04-19 ID F 1 0
1 2019-04-20 ID F 2 1
2 2019-04-21 ID F 3 1
3 2019-04-22 ID F 4 0
4 2019-10-01 ID B 1 0
5 2019-10-02 ID B 3 0
6 2019-10-03 ID B 5 1
7 2019-10-04 ID B 7 1
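One vectorised alternative (a sketch, not taken from the answers here) is to merge df with dfmap on tag and then test each merged row with Series.between. Grouping back by the original row index means a tag with several ranges in dfmap would still be handled. The sample data is copied from the question; the helper names merged, hit and hits are my own.

```python
import pandas as pd

# Sample data from the question (dates already converted to Timestamps)
df = pd.DataFrame({
    'date': pd.to_datetime(['2019-04-19', '2019-04-20', '2019-04-21', '2019-04-22',
                            '2019-10-01', '2019-10-02', '2019-10-03', '2019-10-04']),
    'tag': ['ID F'] * 4 + ['ID B'] * 4,
    'value': ['1', '2', '3', '4', '1', '3', '5', '7'],
})
dfmap = pd.DataFrame({
    'start_date': pd.to_datetime(['2019-04-20', '2019-10-03']),
    'end_date': pd.to_datetime(['2019-04-21', '2019-10-04']),
    'tag': ['ID F', 'ID B'],
})

# Attach every range for the row's tag; reset_index keeps the original row label in 'index'
merged = df.reset_index().merge(dfmap, on='tag', how='left')
# A merged row is a hit when its date lies inside that range (both ends inclusive)
merged['hit'] = merged['date'].between(merged['start_date'], merged['end_date'])
# Collapse back to one value per original df row: True if any range matched
hits = merged.groupby('index')['hit'].any()
df['indicator'] = hits.reindex(df.index, fill_value=False).astype(int)
print(df)
```

Rows whose tag has no entry in dfmap get NaT bounds after the left merge, compare as False, and end up with indicator 0.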
Answer 0 (score: 1)
Just write some simple logic:
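import numpy as np
g = lambda x: pd.to_datetime(x)
df['date'] = g(df['date'])
# apply() returns a new frame, so assign it back to actually convert dfmap's columns
dfmap[['start_date', 'end_date']] = dfmap[['start_date', 'end_date']].apply(g)
# one boolean mask per dfmap row: same tag AND date within [start, end] (both ends inclusive)
conditions = [(df['tag'].eq(tag)) & (df['date'].between(start, end))
              for tag, start, end in zip(dfmap['tag'], dfmap['start_date'], dfmap['end_date'])]
# OR all the masks together (works for any number of dfmap rows, not just two)
cond = np.logical_or.reduce(conditions)
df['indicator'] = np.where(cond, 1, 0)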
df:
date tag value indicator
0 2019-04-19 ID F 1 0
1 2019-04-20 ID F 2 1
2 2019-04-21 ID F 3 1
3 2019-04-22 ID F 4 0
4 2019-10-01 ID B 1 0
5 2019-10-02 ID B 3 0
6 2019-10-03 ID B 5 1
7 2019-10-04 ID B 7 1
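This approach relies on Series.between being inclusive on both ends by default, which is why 2019-04-21 and 2019-10-04 get indicator 1. The short check below assumes pandas 1.3 or later, where inclusive takes a string ('both', 'neither', 'left' or 'right').

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2019-04-19', '2019-04-20', '2019-04-21', '2019-04-22']))
start, end = pd.Timestamp('2019-04-20'), pd.Timestamp('2019-04-21')

# Default inclusive='both': both endpoint dates count as inside the range
both = dates.between(start, end)
# inclusive='neither' excludes the endpoints, so nothing in this sample matches
neither = dates.between(start, end, inclusive='neither')
print(both.tolist())     # [False, True, True, False]
print(neither.tolist())  # [False, False, False, False]
```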
Answer 1 (score: 1)
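I'm not entirely clear on the full logic by which the date and tag determine the indicator. I'm sure it's …
For problems like this, I like to write a function.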
# first, convert dfmap's date columns to Timestamp explicitly, as you did with df
dfmap['start_date'] = pd.to_datetime(dfmap['start_date'])
dfmap['end_date'] = pd.to_datetime(dfmap['end_date'])
# write your logic for the range indicators
def get_indicator(row, map_df):
    dt = row.date
    tag = row.tag
    # flag the row if any range in map_df has the same tag and contains dt (inclusive)
    for _, map_row in map_df.iterrows():
        if map_row.tag == tag and map_row.start_date <= dt <= map_row.end_date:
            return 1
    return 0
# apply
df['indicator'] = df.apply(lambda x: get_indicator(x, dfmap), axis=1)
# print(df)
# date tag value indicator
# 0 2019-04-19 ID F 1 0
# 1 2019-04-20 ID F 2 1
# 2 2019-04-21 ID F 3 1
# 3 2019-04-22 ID F 4 0
# 4 2019-10-01 ID B 1 0
# 5 2019-10-02 ID B 3 0
# 6 2019-10-03 ID B 5 1
# 7 2019-10-04 ID B 7 1
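Calling iterrows() inside apply() rescans all of dfmap for every row of df, which gets slow as both frames grow. Below is a sketch of the same per-row logic, assuming dfmap's dates are already Timestamps: the ranges are grouped by tag once, and each row then looks up only its own tag in a dict. ranges_by_tag is a hypothetical helper name, and this get_indicator variant takes the date and tag directly rather than a whole row.

```python
import pandas as pd

# Sample data from the question, with dfmap's dates already converted
df = pd.DataFrame({
    'date': pd.to_datetime(['2019-04-19', '2019-04-20', '2019-04-21', '2019-04-22',
                            '2019-10-01', '2019-10-02', '2019-10-03', '2019-10-04']),
    'tag': ['ID F'] * 4 + ['ID B'] * 4,
    'value': ['1', '2', '3', '4', '1', '3', '5', '7'],
})
dfmap = pd.DataFrame({
    'start_date': pd.to_datetime(['2019-04-20', '2019-10-03']),
    'end_date': pd.to_datetime(['2019-04-21', '2019-10-04']),
    'tag': ['ID F', 'ID B'],
})

# Build {tag: [(start, end), ...]} once instead of rescanning dfmap for every df row
ranges_by_tag = {}
for r in dfmap.itertuples(index=False):
    ranges_by_tag.setdefault(r.tag, []).append((r.start_date, r.end_date))

def get_indicator(dt, tag):
    # 1 if dt lies inside any inclusive range registered for this tag, else 0
    return int(any(start <= dt <= end for start, end in ranges_by_tag.get(tag, [])))

df['indicator'] = [get_indicator(dt, tag) for dt, tag in zip(df['date'], df['tag'])]
print(df)
```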
Answer 2 (score: 0)
Try date_range() + agg(), followed by isin() + astype():
s=dfmap.agg(lambda x:pd.date_range(x['start_date'],x['end_date']).normalize(),axis=1).explode().unique()
df['indicator']=df['date'].isin(s).astype(int)
Note: you can also use apply() instead of agg().
Or, via date_range() + zip():
s=[[*pd.date_range(x,y).normalize()] for x,y in zip(dfmap['start_date'],dfmap['end_date'])]
s=pd.Series(s).explode().unique()
df['indicator']=df['date'].isin(s).astype('i1')  # Series.view is deprecated in recent pandas; astype gives the same int8 result
Output of df:
date tag value indicator
0 2019-04-19 ID F 1 0
1 2019-04-20 ID F 2 1
2 2019-04-21 ID F 3 1
3 2019-04-22 ID F 4 0
4 2019-10-01 ID B 1 0
5 2019-10-02 ID B 3 0
6 2019-10-03 ID B 5 1
7 2019-10-04 ID B 7 1
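Both variants in this answer test membership on the date alone, so a row is flagged whenever its date falls inside any tag's range, not necessarily its own; the sample data happens to hide this. The sketch below keeps the tag by building a set of (tag, day) pairs. Like the answer, it assumes df's dates are whole days, and it adds one hypothetical row ('2019-04-20', 'ID B') that is not in the question, purely to show the difference.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2019-04-19', '2019-04-20', '2019-04-21', '2019-04-22',
                            '2019-10-01', '2019-10-02', '2019-10-03', '2019-10-04',
                            '2019-04-20']),  # last row is hypothetical, not in the question
    'tag': ['ID F'] * 4 + ['ID B'] * 5,
    'value': ['1', '2', '3', '4', '1', '3', '5', '7', '9'],
})
dfmap = pd.DataFrame({
    'start_date': ['2019-04-20', '2019-10-03'],
    'end_date': ['2019-04-21', '2019-10-04'],
    'tag': ['ID F', 'ID B'],
})

# Expand every range into its individual days, keeping the tag: {(tag, day), ...}
pairs = {
    (tag, day)
    for tag, start, end in zip(dfmap['tag'], dfmap['start_date'], dfmap['end_date'])
    for day in pd.date_range(start, end)
}
# Date-only membership (as in the answer) also flags the hypothetical last row
date_only = df['date'].isin([day for _, day in pairs]).astype(int)
# Tag-aware membership: normalise to midnight so dates line up with the daily range
df['indicator'] = [int((tag, day) in pairs)
                   for tag, day in zip(df['tag'], df['date'].dt.normalize())]
print(date_only.tolist())        # [0, 1, 1, 0, 0, 0, 1, 1, 1]
print(df['indicator'].tolist())  # [0, 1, 1, 0, 0, 0, 1, 1, 0]
```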