pandas: find the date on a specified day of the week that is closest to another date

Date: 2020-02-28 09:42:23

Tags: pandas

I am looking to replace the apply method with a faster approach to the following problem:

Given a df with closest_date and day_of_week columns, I need to find the found_date values: the dates that fall on the specified day_of_week and are closest to closest_date, where the result may equal closest_date itself.

Initial df:

  closest_date  day_of_week
0   2009-06-01            6
1   2014-09-02            0
2   2014-10-11            4
3   2015-01-02            3
4   2015-07-11            4

I need to speed up the following working code:

from pandas.tseries.offsets import Week

def find_nearset_day_to_dayofweek(row):
    return row['closest_date'] - Week(weekday=row['day_of_week'])

df['found_date'] = df.apply(find_nearset_day_to_dayofweek, axis=1)

The code below only handles the rows where found_date should equal closest_date but the date one week earlier is returned instead:

import numpy as np

df['closest_date_dayofweek'] = df['closest_date'].dt.dayofweek

df['found_date'] = np.where(df['closest_date_dayofweek'] == df['day_of_week'],
                            df['closest_date'],
                            df['found_date'])
df = df.drop(['closest_date_dayofweek'], axis=1)

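The problem with the code above is the apply method, which is slow. Any ideas on how to speed it up?

Thanks!

1 answer:

Answer 0: (score: 3)

Since day_of_week can take only 7 values, you can loop over them and use the day_of_week column to select the matching rows on each pass: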

for i in range(7):
    m = df['day_of_week'].eq(i)
    df.loc[m, 'date'] = df.loc[m, 'closest_date'] - Week(weekday=i)

The helper column is then unnecessary; use:

df['date'] = np.where(df['closest_date'].dt.dayofweek==df['day_of_week'],
                      df['closest_date'], df['date'])
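
For reference, the two steps above can be put together into a runnable sketch on the sample data from the question. The DataFrame construction is an assumption about how the original df was built; the loop and the np.where call follow the answer as written.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import Week

# Sample data copied from the question (the construction itself is assumed).
df = pd.DataFrame({
    "closest_date": pd.to_datetime(
        ["2009-06-01", "2014-09-02", "2014-10-11", "2015-01-02", "2015-07-11"]
    ),
    "day_of_week": [6, 0, 4, 3, 4],
})

# One vectorized subtraction per weekday value (0 = Monday ... 6 = Sunday).
for i in range(7):
    m = df["day_of_week"].eq(i)
    df.loc[m, "date"] = df.loc[m, "closest_date"] - Week(weekday=i)

# Rows whose closest_date already falls on the requested weekday were moved
# back a full week by the subtraction; restore closest_date for those rows.
df["date"] = np.where(
    df["closest_date"].dt.dayofweek == df["day_of_week"],
    df["closest_date"],
    df["date"],
)

print(df)
```

None of the five sample rows already falls on its requested weekday, so the np.where step leaves this particular data unchanged; it only matters for rows where the two weekdays coincide.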

Performance on 5000 rows:

import pandas as pd
from pandas.tseries.offsets import Week

def find_nearset_day_to_dayofweek(row):
    return row.closest_date - Week(weekday=row['day_of_week'])

df = pd.concat([df] * 1000, ignore_index=True)

In [137]: %timeit df['date'] = df.apply(find_nearset_day_to_dayofweek, axis=1)
550 ms ± 77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [138]: %%timeit
     ...: for i in range(7):
     ...:     m = df['day_of_week'].eq(i)
     ...:     df.loc[m, 'date1'] = df.loc[m, 'closest_date'] - Week(weekday=i)
     ...:     
38.1 ms ± 883 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
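
As a possible further step, not part of the answer above and not timed here, the loop can be avoided altogether with plain weekday arithmetic. The sketch below assumes that the wanted result is the latest date on or before closest_date that falls on day_of_week, which is what the Week subtraction combined with the np.where fix produces. The function name previous_or_same_weekday is hypothetical.

```python
import pandas as pd


def previous_or_same_weekday(dates: pd.Series, weekdays: pd.Series) -> pd.Series:
    """Return, per row, the latest date on or before `dates` whose weekday
    equals `weekdays` (0 = Monday ... 6 = Sunday). Hypothetical helper."""
    # Number of days to step back; 0 when the date is already on that weekday.
    days_back = (dates.dt.dayofweek - weekdays) % 7
    return dates - pd.to_timedelta(days_back, unit="D")


# Sample data copied from the question (the construction itself is assumed).
df = pd.DataFrame({
    "closest_date": pd.to_datetime(
        ["2009-06-01", "2014-09-02", "2014-10-11", "2015-01-02", "2015-07-11"]
    ),
    "day_of_week": [6, 0, 4, 3, 4],
})

df["date"] = previous_or_same_weekday(df["closest_date"], df["day_of_week"])
print(df)
```

Because the modulo yields 0 for rows that are already on the requested weekday, no separate np.where correction is needed in this variant.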