Question

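I am querying my database to show the records from the past week. I then aggregate the data and transform it into a DataFrame with Python and pandas. In this table I am trying to show what happened on each day of the past week; however, on some days no events occurred. In those cases the date is missing entirely. I am looking for a way to append the dates that do not exist (but are part of the date range specified in the query), so that I can then fill the other missing columns with whatever values I want.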

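In some attempts I set the data up as a pandas DataFrame with the dates as the index, and in others with the dates as a column. Ideally I would like the dates as the top-level index: grouped by name, with purchase and send_back stacked, and the dates as the 'columns'.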

Here is an example of how the DataFrame looks now and what I am looking for:

Dates set in the query: 01.08.2016 - 08.08.2016. The DataFrame looks like this:

       |  dates       | name     | purchase | send_back
   0  01.08.2016    Michael     120          0
   1  02.08.2016    Sarah       100         40
   2  04.08.2016    Sarah       55           0
   3  05.08.2016    Michael     80          20
   4  07.08.2016    Sarah       130          0

After:

     | dates       | name     | purchase | send_back
   0 01.08.2016    Michael      120          0
   1 02.08.2016    Sarah        100          40
   2 03.08.2016    -            0            0
   3 04.08.2016    Sarah        55           0
   4 05.08.2016    Michael      80           20
   5 06.08.2016    -            0            0
   6 07.08.2016    Sarah        130          0
   7 08.08.2016    Sarah        0            35
   8 08.08.2016    Michael      20           0

Printing the following:

 df.index

gives:

 'Index([   u'dates',u'name',u'purchase',u'send_back'],
      dtype='object')

RangeIndex(start=0, stop=1, step=1)'

I would appreciate any guidance.

Answer 1

Assuming you have the following DF:

In [93]: df
Out[93]:
               name  purchase  send_back
dates
2016-08-01  Michael       120          0
2016-08-02    Sarah       100         40
2016-08-04    Sarah        55          0
2016-08-05  Michael        80         20
2016-08-07    Sarah       130          0

You can resample and replace:

In [94]: df.resample('D').first().replace({'name':{np.nan:'-'}}).fillna(0)
Out[94]:
               name  purchase  send_back
dates
2016-08-01  Michael     120.0        0.0
2016-08-02    Sarah     100.0       40.0
2016-08-03        -       0.0        0.0
2016-08-04    Sarah      55.0        0.0
2016-08-05  Michael      80.0       20.0
2016-08-06        -       0.0        0.0
2016-08-07    Sarah     130.0        0.0
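
Note that resample only spans the first to the last date present in the data, so 2016-08-08 (the end of the query range in the question) is not added. A minimal sketch, assuming the query bounds are known and stored in the hypothetical variables `query_start` and `query_end`, reindexes the resampled frame onto the full range:

```python
import pandas as pd

# Sample data from the question, with the dates already parsed as the index.
df = pd.DataFrame(
    {
        "name": ["Michael", "Sarah", "Sarah", "Michael", "Sarah"],
        "purchase": [120, 100, 55, 80, 130],
        "send_back": [0, 40, 0, 20, 0],
    },
    index=pd.to_datetime(
        ["01.08.2016", "02.08.2016", "04.08.2016", "05.08.2016", "07.08.2016"],
        format="%d.%m.%Y",
    ),
)
df.index.name = "dates"

# Hypothetical names: the bounds that were used in the database query.
query_start, query_end = "2016-08-01", "2016-08-08"
full_range = pd.date_range(query_start, query_end, freq="D", name="dates")

# One row per day between the first and last date present,
# then extend to every day of the query range.
daily = df.resample("D").first().reindex(full_range)

# Fill each column with its own placeholder value.
daily = daily.fillna({"name": "-", "purchase": 0, "send_back": 0})
print(daily)
```

Passing a dict to fillna keeps the '-' placeholder for the name column separate from the 0 used for the numeric columns.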

Answer 2

Your index is of object type; you have to convert it to datetime format.

import pandas as pd
from datetime import datetime

# Converting the object (string) dates to datetime
df['dates'] = df['dates'].apply(lambda x: datetime.strptime(x, "%d.%m.%Y"))

# Setting the index column
df.set_index(['dates'], inplace=True)

# Choosing a date range extending from first date to the last date with a set frequency
new_index = pd.date_range(start=df.index[0], end=df.index[-1], freq='D')
new_index.name = df.index.name

# Setting the new index
df = df.reindex(new_index)

# Making the required modifications: '-' for missing names, 0 for missing numbers
df = df.fillna({'name': '-', 'purchase': 0, 'send_back': 0})

print(df)

               name  purchase  send_back
dates                                   
2016-08-01  Michael     120.0        0.0
2016-08-02    Sarah     100.0       40.0
2016-08-03        -       0.0        0.0
2016-08-04    Sarah      55.0        0.0
2016-08-05  Michael      80.0       20.0
2016-08-06        -       0.0        0.0
2016-08-07    Sarah     130.0        0.0
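
reindex raises an error when the index contains duplicate dates, which is the case in the question's desired output (two rows for 08.08.2016). One possible workaround, sketched here under the assumption that dates is still a regular column, is to left-merge a frame holding every day of the query range with the data. The names `all_days` and `filled` are illustrative only.

```python
import pandas as pd

# Sample data modelled on the question; 08.08.2016 appears twice.
df = pd.DataFrame(
    {
        "dates": ["01.08.2016", "02.08.2016", "04.08.2016", "05.08.2016",
                  "07.08.2016", "08.08.2016", "08.08.2016"],
        "name": ["Michael", "Sarah", "Sarah", "Michael",
                 "Sarah", "Sarah", "Michael"],
        "purchase": [120, 100, 55, 80, 130, 0, 20],
        "send_back": [0, 40, 0, 20, 0, 35, 0],
    }
)
df["dates"] = pd.to_datetime(df["dates"], format="%d.%m.%Y")

# Every day of the query range as a one-column frame.
all_days = pd.DataFrame(
    {"dates": pd.date_range("2016-08-01", "2016-08-08", freq="D")}
)

# The left merge keeps every day; days without events get NaN in the data columns.
filled = all_days.merge(df, on="dates", how="left")
filled = filled.fillna({"name": "-", "purchase": 0, "send_back": 0})

# The NaN values turned the numeric columns into floats; cast them back.
filled = filled.astype({"purchase": int, "send_back": int})
print(filled)
```

Because the merge works on a column rather than on the index, several rows per day are kept as they are.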

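Suppose you have data for a single day (as mentioned in the comments section) and you want to fill the other days of the week with null values:

Data setup: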

df = pd.DataFrame({'dates':['01.08.2016'], 'name':['Michael'], 
                   'purchase':[120], 'send_back':[0]})
print (df)

        dates     name  purchase  send_back
0  01.08.2016  Michael       120          0

Operations:

df['dates'] = df['dates'].apply(lambda x: datetime.strptime(x, "%d.%m.%Y"))
df.set_index(['dates'], inplace=True)

# Setting periods as 7 to account for the end of the week
new_index = pd.date_range(start=df.index[0], periods=7, freq='D')
new_index.name = df.index.name

# Setting the new index
df = df.reindex(new_index)
print (df)

               name  purchase  send_back
dates                                   
2016-08-01  Michael     120.0        0.0
2016-08-02      NaN       NaN        NaN
2016-08-03      NaN       NaN        NaN
2016-08-04      NaN       NaN        NaN
2016-08-05      NaN       NaN        NaN
2016-08-06      NaN       NaN        NaN
2016-08-07      NaN       NaN        NaN

If you want to fill the null values with 0, you can do:

df.fillna(0, inplace=True)
print (df)
               name  purchase  send_back
dates                                   
2016-08-01  Michael     120.0        0.0
2016-08-02        0       0.0        0.0
2016-08-03        0       0.0        0.0
2016-08-04        0       0.0        0.0
2016-08-05        0       0.0        0.0
2016-08-06        0       0.0        0.0
2016-08-07        0       0.0        0.0
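
The question also mentions wanting the dates as columns, with the rows grouped by name and the purchase and send_back values stacked. A rough sketch of one way to get that layout, assuming dates is a parsed datetime column and using the illustrative names `long_form`, `wide` and `measure`, is to melt the two value columns, pivot on the dates, and then reindex the columns over the full query range:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {
        "dates": ["01.08.2016", "02.08.2016", "04.08.2016",
                  "05.08.2016", "07.08.2016"],
        "name": ["Michael", "Sarah", "Sarah", "Michael", "Sarah"],
        "purchase": [120, 100, 55, 80, 130],
        "send_back": [0, 40, 0, 20, 0],
    }
)
df["dates"] = pd.to_datetime(df["dates"], format="%d.%m.%Y")

# Long form: one row per (date, name, measure) combination.
long_form = df.melt(
    id_vars=["dates", "name"],
    value_vars=["purchase", "send_back"],
    var_name="measure",
    value_name="amount",
)

# Rows are (name, measure) pairs and columns are the dates that have data.
wide = long_form.pivot_table(
    index=["name", "measure"],
    columns="dates",
    values="amount",
    aggfunc="sum",
    fill_value=0,
)

# Add the days without any events as all-zero columns.
full_range = pd.date_range("2016-08-01", "2016-08-08", freq="D", name="dates")
wide = wide.reindex(columns=full_range, fill_value=0)
print(wide)
```

Using aggfunc="sum" is a choice made for this sketch; it adds up the amounts if a name has several rows on the same day.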

Filling in dates with a date range and fillna
