Reshape a pivoted DataFrame and look up the previous day's count, with 0 for day 0

Time: 2018-02-13 03:45:05

Tags: python pandas dataframe

I have a DataFrame like this:

+---------------+--------------+---------------+-------------------+
|        customer_info         |            counts                 |
+---------------+--------------+---------------+-------------------+
| customer_name | current_date | record_counts | current_day_count |
| Mark          | 2018_02_06   | 23            | 15                |
|               | 2018_02_09   | 65            | 42                |
|               | 2018_02_12   | 7             | 33                |
|               | 2018_02_21   | 36            | 82                |
|               | 2018_02_27   | 43            | 72                |
| Bob           | 2018_02_02   | 56            | 76                |
|               | 2018_02_23   | 77            | 11                |
|               | 2018_03_04   | 35            | 59                |
|               | 2018_03_13   | 34            | 68                |
| Shawn         | 2018_02_11   | 75            | 71                |
|               | 2018_02_15   | 26            | 39                |
|               | 2018_02_18   | 73            | 65                |
|               | 2018_02_24   | 87            | 38                |
+---------------+--------------+---------------+-------------------+

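Now I want to reshape the DataFrame above into the form shown below. I want a new column named previous_day_count that holds the current_day_count of the same customer's previous row. For day 0, i.e. a customer's first day, the value should be 0. The customer name should also be filled in on every row.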

+---------------+--------------+---------------+-------------------+--------------------+
| customer_name | current_date | record_counts | current_day_count | previous_day_count |
+---------------+--------------+---------------+-------------------+--------------------+
| Mark          | 2018_02_06   |            23 |                15 |                  0 |
| Mark          | 2018_02_09   |            65 |                42 |                 15 |
| Mark          | 2018_02_12   |             7 |                33 |                 42 |
| Mark          | 2018_02_21   |            36 |                82 |                 33 |
| Mark          | 2018_02_27   |            43 |                72 |                 82 |
| Bob           | 2018_02_02   |            56 |                76 |                  0 |
| Bob           | 2018_02_23   |            77 |                11 |                 76 |
| Bob           | 2018_03_04   |            35 |                59 |                 11 |
| Bob           | 2018_03_13   |            34 |                68 |                 59 |
| Shawn         | 2018_02_11   |            75 |                71 |                  0 |
| Shawn         | 2018_02_15   |            26 |                39 |                 71 |
| Shawn         | 2018_02_18   |            73 |                65 |                 39 |
| Shawn         | 2018_02_24   |            87 |                38 |                 65 |
+---------------+--------------+---------------+-------------------+--------------------+

1 answer:

Answer 0 (score: 2)

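Your DataFrame looks a bit odd, but I am assuming that all four columns sit under a MultiIndex on the columns and that the row index is the default one.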
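To reproduce that assumption, a frame with two-level columns can be built roughly as follows. This is only a sketch: the question does not show how the data was created, so the construction via `pd.MultiIndex.from_tuples` and the shortened row list are illustrative choices, not the asker's actual code.

```python
import pandas as pd

# Two-level column index: the top level groups the four real columns.
columns = pd.MultiIndex.from_tuples([
    ("customer_info", "customer_name"),
    ("customer_info", "current_date"),
    ("counts", "record_counts"),
    ("counts", "current-day_counts"),
])

# A shortened subset of the rows from the question (assumed layout).
rows = [
    ["Mark", "2018_02_06", 23, 15],
    ["Mark", "2018_02_09", 65, 42],
    ["Bob", "2018_02_02", 56, 76],
    ["Bob", "2018_02_23", 77, 11],
]

df = pd.DataFrame(rows, columns=columns)
print(df.columns.nlevels)  # number of levels in the column index
```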

Input DataFrame:

print(df)

      customer_info                        counts                   
      customer_name    current_date record_counts current-day_counts
0    Mark             2018_02_06               23                 15
1    Mark             2018_02_09               65                 42
2    Mark             2018_02_12                7                 33
3    Mark             2018_02_21               36                 82
4    Mark             2018_02_27               43                 72
5    Bob              2018_02_02               56                 76
6    Bob              2018_02_23               77                 11
7    Bob              2018_03_04               35                 59
8    Bob              2018_03_13               34                 68
9    Shawn            2018_02_11               75                 71
10   Shawn            2018_02_15               26                 39
11   Shawn            2018_02_18               73                 65
12   Shawn            2018_02_24               87                 38

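# Drop the top column level so the four inner names become plain columns
df.columns = df.columns.droplevel(0)
# Within each customer, shift the counts down one row; the first row has no predecessor, so fill it with 0
df['previous_day_count'] = (df.groupby('customer_name')['current-day_counts']
                             .shift().fillna(0))
print(df)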

Output:

      customer_name    current_date  record_counts  current-day_counts  previous_day_count
0    Mark             2018_02_06                23                  15                 0.0
1    Mark             2018_02_09                65                  42                15.0
2    Mark             2018_02_12                 7                  33                42.0
3    Mark             2018_02_21                36                  82                33.0
4    Mark             2018_02_27                43                  72                82.0
5    Bob              2018_02_02                56                  76                 0.0
6    Bob              2018_02_23                77                  11                76.0
7    Bob              2018_03_04                35                  59                11.0
8    Bob              2018_03_13                34                  68                59.0
9    Shawn            2018_02_11                75                  71                 0.0
10   Shawn            2018_02_15                26                  39                71.0
11   Shawn            2018_02_18                73                  65                39.0
12   Shawn            2018_02_24                87                  38                65.0
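The new column comes out as float (0.0, 15.0, ...) because `shift()` introduces NaN for each customer's first row before `fillna(0)` runs. If integer values are wanted, as in the question's expected table, one option is to cast back after filling. The sketch below assumes the column level has already been dropped and that the rows are already in date order within each customer, since `shift()` works on row order, not on the dates themselves. The shortened data is illustrative.

```python
import pandas as pd

# Flat frame, i.e. the state after droplevel(0); subset of the question's rows.
df = pd.DataFrame({
    "customer_name": ["Mark", "Mark", "Mark", "Bob", "Bob"],
    "current_date": ["2018_02_06", "2018_02_09", "2018_02_12",
                     "2018_02_02", "2018_02_23"],
    "record_counts": [23, 65, 7, 56, 77],
    "current-day_counts": [15, 42, 33, 76, 11],
})

# Shift within each customer, fill the first row with 0, then restore int dtype.
df["previous_day_count"] = (
    df.groupby("customer_name")["current-day_counts"]
      .shift()
      .fillna(0)
      .astype("int64")
)
print(df)
```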