Question

我需要为 dfmap 中的所有标签创建一个基于 df 日期范围的指标列。

import pandas as pd

df = pd.DataFrame({
    'date': ['2019-04-19','2019-04-20','2019-04-21', '2019-04-22',
             '2019-10-01','2019-10-02','2019-10-03', '2019-10-04'],
    'tag': ['ID F', 'ID F', 'ID F', 'ID F',
             'ID B', 'ID B', 'ID B', 'ID B'],
    'value': ['1', '2', '3', '4',
             '1', '3', '5', '7']})

df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)

dfmap = pd.DataFrame({
    'start_date': ['2019-04-20','2019-10-03'],
    'end_date': ['2019-04-21','2019-10-04'],
    'tag': ['ID F', 'ID B']})

print(df)
print(dfmap)

        date   tag value
0 2019-04-19  ID F     1
1 2019-04-20  ID F     2
2 2019-04-21  ID F     3
3 2019-04-22  ID F     4
4 2019-10-01  ID B     1
5 2019-10-02  ID B     3
6 2019-10-03  ID B     5
7 2019-10-04  ID B     7
   start_date    end_date   tag
0  2019-04-20  2019-04-21  ID F
1  2019-10-03  2019-10-04  ID B

所需的数据帧：

print(desired_df)

        date   tag value indicator
0 2019-04-19  ID F     1         0
1 2019-04-20  ID F     2         1
2 2019-04-21  ID F     3         1
3 2019-04-22  ID F     4         0 
4 2019-10-01  ID B     1         0 
5 2019-10-02  ID B     3         0
6 2019-10-03  ID B     5         1
7 2019-10-04  ID B     7         1

Answer 1

只需编写简单的逻辑：

g = lambda x: pd.to_datetime(x)
g = lambda x: pd.to_datetime(x)
df['date'] = g(df['date'])
dfmap[['start_date', 'end_date']].apply(g)

conditions = [((df['tag'].eq(idx)) & (df['date'].between(start, end))) for idx, start, end in zip(dfmap['tag'], dfmap['start_date'], dfmap['end_date'])]
cond = conditions[0] | conditions[1]
df['indicator'] = np.where(cond,1,0)

df：

        date   tag value  indicator
0 2019-04-19  ID F     1          0
1 2019-04-20  ID F     2          1
2 2019-04-21  ID F     3          1
3 2019-04-22  ID F     4          0
4 2019-10-01  ID B     1          0
5 2019-10-02  ID B     3          0
6 2019-10-03  ID B     5          1
7 2019-10-04  ID B     7          1

Answer 2

我不清楚日期和标签如何识别指标的完整逻辑。我确定是

如果范围存在
检查标签是否相同
如果范围内和标签匹配，则指示符为 1
else 指示符为 0

遇到这类问题，我喜欢写函数。

# first setup the dfmap to explicitly be Timestamp as you did with df
dfmap['start_date'] = pd.to_datetime(dfmap['start_date'])
dfmap['end_date'] = pd.to_datetime(dfmap['end_date'])

# write your logic for the range indicators
def get_indicator(row, df):
    dt = row.date
    tag = row.tag
    
    for idx, map_row in df.iterrows():
        if map_row.start_date <= dt <= map_row.end_date:
            if row.tag == map_row.tag:
                return 1
    return 0

# apply
df['indicator'] = df.apply(lambda x: get_indicator(x, dfmap), axis=1)

# print(df)
#         date   tag value  indicator
# 0 2019-04-19  ID F     1          0
# 1 2019-04-20  ID F     2          1
# 2 2019-04-21  ID F     3          1
# 3 2019-04-22  ID F     4          0
# 4 2019-10-01  ID B     1          0
# 5 2019-10-02  ID B     3          0
# 6 2019-10-03  ID B     5          1
# 7 2019-10-04  ID B     7          1

Answer 3

尝试通过 date_range()+agg() 方法和 isin()+astype() 方法：

s=dfmap.agg(lambda x:pd.date_range(x['start_date'],x['end_date']).normalize(),axis=1).explode().unique()
df['indicator']=df['date'].isin(s).astype(int)

注意：您也可以使用 apply() 代替 agg() 方法

或

通过 date_range()+zip()

s=[[*pd.date_range(x,y).normalize()] for x,y in zip(dfmap['start_date'],dfmap['end_date'])]
s=pd.Series(s).explode().unique()
df['indicator']=df['date'].isin(s).view('i1')

df 的输出：

    date        tag     value   indicator
0   2019-04-19  ID F    1       0
1   2019-04-20  ID F    2       1
2   2019-04-21  ID F    3       1
3   2019-04-22  ID F    4       0
4   2019-10-01  ID B    1       0
5   2019-10-02  ID B    3       0
6   2019-10-03  ID B    5       1
7   2019-10-04  ID B    7       1

在另一个 Pandas 数据框中为日期范围创建指标列

3 个答案: