Question

Similar to this post: Excel VLOOKUP equivalent in pandas

I don't need the first value it encounters, but the nth value.

Here is a sample dataset with the desired output.


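Example for index 14: product 123a with lag date 2018-01-03. Look in the date column for product 123a on 2018-01-03 and return the matching orders, which is 6. If there is no match, return 0.

Currently I lag the date by 3 days, but I would like it to be "n". I could use the original date as the index, but then I would need to reindex the dataset afterwards (which is fine).

Is there a convenient way to do this, rather than looping over all rows, keeping a counter "n", and taking the value once "n" matches have been found? My dataset has more than 500k rows, so that seems computationally too expensive for a very simple task.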

Answer 1

This may not be the best solution, but it works:

Depending on how I set lag_date, I can lag the data by a specific number of days.
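For anyone who wants to reproduce this, here is a minimal sketch of how the input frame and the lag_date column might be built. The rows are copied from the output table in Answer 2; the name lag_days is an assumption made here for illustration and does not appear in the original post.

```python
import pandas as pd

# Sample rows copied from the output table in Answer 2.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-01-01", "2018-01-01", "2018-01-01",
        "2018-01-02", "2018-01-02",
        "2018-01-03", "2018-01-03", "2018-01-03",
        "2018-01-04", "2018-01-04", "2018-01-04",
        "2018-01-05", "2018-01-05", "2018-01-05",
        "2018-01-06", "2018-01-06",
    ]),
    "product": [
        "123a", "123b", "123c",
        "123a", "123b",
        "123a", "123b", "123c",
        "123a", "123b", "123c",
        "123a", "123b", "123c",
        "123a", "123c",
    ],
    "orders": list(range(1, 17)),
})

lag_days = 3  # hypothetical name for the lag the question calls "n"
df["lag_date"] = df["date"] - pd.Timedelta(days=lag_days)
```

Changing lag_days is then the only thing needed to lag by a different number of days.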

import pandas as pd

# first create new unique identifiers, based on the date + the product code
df['date'] = df['date'].dt.strftime('%Y-%m-%d')  # convert to string first so it can be concatenated
df['vlook_date'] = df['date'] + df['product'].astype(str)
df['lag_date'] = df['lag_date'].dt.strftime('%Y-%m-%d')
df['vlook_lagdate'] = df['lag_date'] + df['product'].astype(str)

# create new data frames to map them together (copies, so the assignment below is safe)
df1 = df.loc[:, df.columns != 'vlook_date'].copy()
df2 = df.loc[:, df.columns != 'vlook_lagdate'].copy()

# use map to match the results in df1; keys with no match become 0
lookup = df2.set_index('vlook_date')['orders']
df1['lag_orders'] = pd.to_numeric(df1['vlook_lagdate'].map(lookup).fillna(0), downcast='integer')
df1 = df1.drop(['lag_date', 'vlook_lagdate'], axis=1)

If you have suggestions for cleaning this up, let me know ;)
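One possible cleanup, offered only as a sketch: the string concatenation can probably be avoided by doing a left merge on the two keys directly. The function name lagged_orders and the parameter lag_days are made up for this example, and the code assumes each (date, product) pair occurs at most once, which holds for the sample data.

```python
import pandas as pd

def lagged_orders(df: pd.DataFrame, lag_days: int) -> pd.Series:
    """Orders of the same product lag_days earlier, 0 where there is no match."""
    # Lookup table keyed by (lag_date, product).
    lookup = df[["date", "product", "orders"]].rename(
        columns={"date": "lag_date", "orders": "lag_orders"}
    )
    # Keys to look up: each row's date shifted back by lag_days.
    keys = pd.DataFrame({
        "lag_date": df["date"] - pd.Timedelta(days=lag_days),
        "product": df["product"],
    })
    # validate raises if a (date, product) pair is duplicated in the lookup.
    merged = keys.merge(
        lookup, on=["lag_date", "product"], how="left", validate="many_to_one"
    )
    result = merged["lag_orders"].fillna(0).astype(int)
    result.index = df.index  # a left merge keeps the row order of keys
    return result

sample = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-04", "2018-01-04"]),
    "product": ["123a", "123b", "123a", "123c"],
    "orders": [1, 2, 9, 11],
})
sample["lag_orders"] = lagged_orders(sample, lag_days=3)
```

This keeps the date column as a real datetime, so no conversion back is needed afterwards.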

Answer 2

This will probably be slower than your answer because it loops, but it may be easier to understand:

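from datetime import timedelta

import numpy as np

grp = df.groupby(['date', 'product'])
desired_output = []
n, lag = 0, 3  # n: which match to take within a group (0 = first); lag: days to look back

for _, row in df.iterrows():
    try:
        # rows for the same product, `lag` days earlier; take the orders of the n-th one
        match = grp.get_group((row['date'] - timedelta(days=lag), row['product']))
        desired_output.append(match['orders'].iloc[n])
    except (KeyError, IndexError):  # no such group, or fewer than n + 1 matches
        desired_output.append(np.nan)

df['desired_output'] = desired_output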

Output:

    date        product  orders  desired_output  lag_date
0   2018-01-01  123a     1       NaN             2017-12-29
1   2018-01-01  123b     2       NaN             2017-12-29
2   2018-01-01  123c     3       NaN             2017-12-29
3   2018-01-02  123a     4       NaN             2017-12-30
4   2018-01-02  123b     5       NaN             2017-12-30
5   2018-01-03  123a     6       NaN             2017-12-31
6   2018-01-03  123b     7       NaN             2017-12-31
7   2018-01-03  123c     8       NaN             2017-12-31
8   2018-01-04  123a     9       1.0             2018-01-01
9   2018-01-04  123b     10      2.0             2018-01-01
10  2018-01-04  123c     11      3.0             2018-01-01
11  2018-01-05  123a     12      4.0             2018-01-02
12  2018-01-05  123b     13      5.0             2018-01-02
13  2018-01-05  123c     14      NaN             2018-01-02
14  2018-01-06  123a     15      6.0             2018-01-03
15  2018-01-06  123c     16      8.0             2018-01-03
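If the loop turns out to be too slow on 500k rows, the same idea can be sketched without iterating: number the rows inside each (date, product) group, keep only the n-th one, and left-merge it back on the shifted date. This is a hedged alternative rather than a tested drop-in replacement; the function name nth_lagged_orders and the helper column lookup_date are invented for the example.

```python
import pandas as pd

def nth_lagged_orders(df: pd.DataFrame, n: int = 0, lag: int = 3) -> pd.Series:
    """Orders of the n-th row (0-based) for the same product `lag` days earlier."""
    # Position of each row inside its (date, product) group: 0, 1, 2, ...
    pos = df.groupby(["date", "product"]).cumcount()
    # Keep only the n-th row of each group, so each key appears at most once.
    nth_rows = df.loc[pos == n, ["date", "product", "orders"]].rename(
        columns={"date": "lookup_date", "orders": "desired_output"}
    )
    # Keys to look up: each row's date shifted back by `lag` days.
    keys = pd.DataFrame({
        "lookup_date": df["date"] - pd.Timedelta(days=lag),
        "product": df["product"],
    })
    merged = keys.merge(nth_rows, on=["lookup_date", "product"], how="left")
    # Rows without a match stay NaN, like the except branch of the loop.
    return pd.Series(merged["desired_output"].to_numpy(), index=df.index, dtype="float64")

# Small example with a duplicated (date, product) pair to exercise n.
sample = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-04", "2018-01-04"]),
    "product": ["123a", "123a", "123a", "123b"],
    "orders": [1, 5, 9, 10],
})
first_match = nth_lagged_orders(sample, n=0, lag=3)
second_match = nth_lagged_orders(sample, n=1, lag=3)
```

In this small example, row 2 picks up orders 1 for n=0 and orders 5 for n=1, while the other rows have no match and stay NaN.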

nth VLOOKUP in a pandas DataFrame