Question

I'm trying to upsample weekly data to daily data, but I'm having trouble upsampling the last edge. How should I do this?

import pandas as pd

df = pd.DataFrame({'wk start': ['2018-08-12', '2018-08-12', '2018-08-19'],
    'car': ['tesla model 3', 'tesla model x', 'tesla model 3'],
    'sales': [38000, 98000, 40000]})
# Parse the week-start strings into datetimes
df['wk start'] = pd.to_datetime(df['wk start'], format='%Y-%m-%d')
# Upsample each car to daily rows and forward-fill
# (.ffill() replaces .pad(), which was removed in pandas 2.0)
df.set_index('wk start').groupby('car').resample('D').ffill()

This returns:

                             car            sales
car             wk start        
tesla model 3   2018-08-12  tesla model 3   38000
                2018-08-13  tesla model 3   38000
                2018-08-14  tesla model 3   38000
                2018-08-15  tesla model 3   38000
                2018-08-16  tesla model 3   38000
                2018-08-17  tesla model 3   38000
                2018-08-18  tesla model 3   38000
                2018-08-19  tesla model 3   40000

tesla model x   2018-08-12  tesla model x   98000
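Why the last edge goes missing: resample only spans from the first to the last timestamp observed in each group, so a car's daily range stops at its last week start, and a car with a single week gets a single day. A small check, as a sketch (it selects only the sales column, an assumption made here to keep the result a plain Series):

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'wk start': pd.to_datetime(['2018-08-12', '2018-08-12', '2018-08-19']),
    'car': ['tesla model 3', 'tesla model x', 'tesla model 3'],
    'sales': [38000, 98000, 40000],
})

# Upsample each car's weekly rows to daily rows, forward-filling sales
daily = df.set_index('wk start').groupby('car')['sales'].resample('D').ffill()

# First and last daily timestamp per car: the range ends at the last week START
span = daily.reset_index().groupby('car')['wk start'].agg(['min', 'max'])
print(span)
```

For tesla model 3 the range ends on 2018-08-19 rather than 2018-08-25, and tesla model x covers only 2018-08-12, which matches the output above.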

My desired output is:

                             car            sales
car             wk start        
tesla model 3   2018-08-12  tesla model 3   38000
                2018-08-13  tesla model 3   38000
                2018-08-14  tesla model 3   38000
                2018-08-15  tesla model 3   38000
                2018-08-16  tesla model 3   38000
                2018-08-17  tesla model 3   38000
                2018-08-18  tesla model 3   38000
                2018-08-19  tesla model 3   40000
                2018-08-20  tesla model 3   40000
                2018-08-21  tesla model 3   40000
                2018-08-22  tesla model 3   40000
                2018-08-23  tesla model 3   40000
                2018-08-24  tesla model 3   40000
                2018-08-25  tesla model 3   40000
tesla model x   2018-08-12  tesla model x   98000
                2018-08-13  tesla model x   98000
                2018-08-14  tesla model x   98000
                2018-08-15  tesla model x   98000
                2018-08-16  tesla model x   98000
                2018-08-17  tesla model x   98000
                2018-08-18  tesla model x   98000

I looked at this, but it uses periods, whereas I'm working with datetimes. Thanks in advance!

Answer 1

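Before your earlier groupby attempt, assign a column holding the end of each week and stack it together with the start dates, so that both edges of every week end up in the index: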

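(df.assign(end=df['wk start'] + pd.DateOffset(days=6))  # last day of each week
   .set_index(['car', 'sales'])
   .stack()                  # 'wk start' and 'end' become rows of one Series
   .rename('wk start')
   .reset_index([0, 1])      # move 'car' and 'sales' back into columns
   .set_index('wk start')
   .groupby('car')
   .resample('D')
   .ffill()                  # .pad() in older pandas; removed in 2.0
)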

Output:

                                    car  sales
car           wk start
tesla model 3 2018-08-12  tesla model 3  38000
              2018-08-13  tesla model 3  38000
              2018-08-14  tesla model 3  38000
              2018-08-15  tesla model 3  38000
              2018-08-16  tesla model 3  38000
              2018-08-17  tesla model 3  38000
              2018-08-18  tesla model 3  38000
              2018-08-19  tesla model 3  40000
              2018-08-20  tesla model 3  40000
              2018-08-21  tesla model 3  40000
              2018-08-22  tesla model 3  40000
              2018-08-23  tesla model 3  40000
              2018-08-24  tesla model 3  40000
              2018-08-25  tesla model 3  40000
tesla model x 2018-08-12  tesla model x  98000
              2018-08-13  tesla model x  98000
              2018-08-14  tesla model x  98000
              2018-08-15  tesla model x  98000
              2018-08-16  tesla model x  98000
              2018-08-17  tesla model x  98000
              2018-08-18  tesla model x  98000

Answer 2

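Yes, you're right: the last edge is excluded. The solution is to add those rows to the input DataFrame. My solution uses drop_duplicates to create a helper DataFrame, adds 6 days, concatenates it to the original df, and then applies your solution.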
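The code for this answer did not survive in this copy. The following is a sketch of the steps it describes, a reconstruction rather than the author's exact code; the variable names last_weeks, extended and daily are hypothetical. It assumes the helper rows are the latest week of each car, and it selects only the sales column before resampling.

```python
import pandas as pd

df = pd.DataFrame({
    'wk start': pd.to_datetime(['2018-08-12', '2018-08-12', '2018-08-19']),
    'car': ['tesla model 3', 'tesla model x', 'tesla model 3'],
    'sales': [38000, 98000, 40000],
})

# Helper frame: the latest week of each car (sort first so keep='last' is the latest)
last_weeks = df.sort_values('wk start').drop_duplicates('car', keep='last').copy()
# Move each helper row to the final day of its week (week start + 6 days)
last_weeks['wk start'] = last_weeks['wk start'] + pd.Timedelta(days=6)

# Append the helper rows, then upsample as in the question
extended = pd.concat([df, last_weeks], ignore_index=True)
daily = extended.set_index('wk start').groupby('car')['sales'].resample('D').ffill()
print(daily)
```

Because each car now has a row on the last day of its final week, resample's range extends to that day and forward-fill supplies the missing values.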


Answer 3

You can also do this:

~$ ruby -e 'require "date";puts Date.parse("2011-02-23").strftime("%a, %d %b %Y")'
# => Wed, 23 Feb 2011

Pandas resample: upsampling date/edge data
