I have a DataFrame like this:
+---------------+--------------+---------------+-------------------+
| customer_info | counts |
+---------------+--------------+---------------+-------------------+
| customer_name | current_date | record_counts | current_day_count |
| Mark | 2018_02_06 | 23 | 15 |
| | 2018_02_09 | 65 | 42 |
| | 2018_02_12 | 7 | 33 |
| | 2018_02_21 | 36 | 82 |
| | 2018_02_27 | 43 | 72 |
| Bob | 2018_02_02 | 56 | 76 |
| | 2018_02_23 | 77 | 11 |
| | 2018_03_04 | 35 | 59 |
| | 2018_03_13 | 34 | 68 |
| Shawn | 2018_02_11 | 75 | 71 |
| | 2018_02_15 | 26 | 39 |
| | 2018_02_18 | 73 | 65 |
| | 2018_02_24 | 87 | 38 |
+---------------+--------------+---------------+-------------------+
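Now I want to reshape the DataFrame above as shown below. I want a new column named previous_day_count that holds the current_day_count from the same customer's previous row.
For each customer's first day (day 0), the value should be 0.
All the customer names should also be filled in.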
+---------------+--------------+---------------+-------------------+--------------------+
| customer_name | current_date | record_counts | current_day_count | previous_day_count |
+---------------+--------------+---------------+-------------------+--------------------+
| Mark | 2018_02_06 | 23 | 15 | 0 |
| Mark | 2018_02_09 | 65 | 42 | 15 |
| Mark | 2018_02_12 | 7 | 33 | 42 |
| Mark | 2018_02_21 | 36 | 82 | 33 |
| Mark | 2018_02_27 | 43 | 72 | 82 |
| Bob | 2018_02_02 | 56 | 76 | 0 |
| Bob | 2018_02_23 | 77 | 11 | 76 |
| Bob | 2018_03_04 | 35 | 59 | 11 |
| Bob | 2018_03_13 | 34 | 68 | 59 |
| Shawn | 2018_02_11 | 75 | 71 | 0 |
| Shawn | 2018_02_15 | 26 | 39 | 71 |
| Shawn | 2018_02_18 | 73 | 65 | 39 |
| Shawn | 2018_02_24 | 87 | 38 | 65 |
+---------------+--------------+---------------+-------------------+--------------------+
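Answer 0 (score: 2)
Your DataFrame looks a little odd, but I am assuming that all four columns sit under MultiIndex columns and that the row index is the default one.
Input DataFrame: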
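For reference, a frame with the assumed layout could be built roughly as follows. This is only a sketch of the assumption stated above (two column levels, default row index, customer names already present on every row); the variable names `rows` and `columns` are illustrative and do not come from the question.

```python
import pandas as pd

# Row data copied from the question, with the customer name repeated on every row.
rows = [
    ("Mark", "2018_02_06", 23, 15),
    ("Mark", "2018_02_09", 65, 42),
    ("Mark", "2018_02_12", 7, 33),
    ("Mark", "2018_02_21", 36, 82),
    ("Mark", "2018_02_27", 43, 72),
    ("Bob", "2018_02_02", 56, 76),
    ("Bob", "2018_02_23", 77, 11),
    ("Bob", "2018_03_04", 35, 59),
    ("Bob", "2018_03_13", 34, 68),
    ("Shawn", "2018_02_11", 75, 71),
    ("Shawn", "2018_02_15", 26, 39),
    ("Shawn", "2018_02_18", 73, 65),
    ("Shawn", "2018_02_24", 87, 38),
]

# Build the frame with flat names first, then attach the two-level header.
df = pd.DataFrame(
    rows,
    columns=["customer_name", "current_date", "record_counts", "current-day_counts"],
)
columns = pd.MultiIndex.from_tuples([
    ("customer_info", "customer_name"),
    ("customer_info", "current_date"),
    ("counts", "record_counts"),
    ("counts", "current-day_counts"),
])
df.columns = columns
```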
print(df)
customer_info counts
customer_name current_date record_counts current-day_counts
0 Mark 2018_02_06 23 15
1 Mark 2018_02_09 65 42
2 Mark 2018_02_12 7 33
3 Mark 2018_02_21 36 82
4 Mark 2018_02_27 43 72
5 Bob 2018_02_02 56 76
6 Bob 2018_02_23 77 11
7 Bob 2018_03_04 35 59
8 Bob 2018_03_13 34 68
9 Shawn 2018_02_11 75 71
10 Shawn 2018_02_15 26 39
11 Shawn 2018_02_18 73 65
12 Shawn 2018_02_24 87 38
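# Drop the outer column level so the inner column names can be used directly.
df.columns = df.columns.droplevel(0)
# Shift the counts down one row within each customer; the first row of each
# customer has no previous value (NaN), which is then replaced by 0.
df['previous_day_count'] = (df.groupby('customer_name')['current-day_counts']
                              .shift().fillna(0))
print(df)
Output: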
customer_name current_date record_counts current-day_counts previous_day_count
0 Mark 2018_02_06 23 15 0.0
1 Mark 2018_02_09 65 42 15.0
2 Mark 2018_02_12 7 33 42.0
3 Mark 2018_02_21 36 82 33.0
4 Mark 2018_02_27 43 72 82.0
5 Bob 2018_02_02 56 76 0.0
6 Bob 2018_02_23 77 11 76.0
7 Bob 2018_03_04 35 59 11.0
8 Bob 2018_03_13 34 68 59.0
9 Shawn 2018_02_11 75 71 0.0
10 Shawn 2018_02_15 26 39 71.0
11 Shawn 2018_02_18 73 65 39.0
12 Shawn 2018_02_24 87 38 65.0
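The new column comes out as float because `shift()` introduces NaN before `fillna(0)` runs. If the customer names really are blank on the continuation rows, as the first table in the question suggests, and an integer column is wanted, something like the following might work. It is a sketch on a shortened sample with single-level columns, and it assumes the blanks are missing values (`None`/NaN) and that the rows are already in date order within each customer.

```python
import pandas as pd

# Shortened sample: names appear only on each customer's first row.
df = pd.DataFrame({
    "customer_name": ["Mark", None, None, "Bob", None, "Shawn", None],
    "current_date": ["2018_02_06", "2018_02_09", "2018_02_12",
                     "2018_02_02", "2018_02_23",
                     "2018_02_11", "2018_02_15"],
    "current_day_count": [15, 42, 33, 76, 11, 71, 39],
})

# Fill each blank name with the last name seen above it.
df["customer_name"] = df["customer_name"].ffill()

# Previous row's count within each customer; the first row per customer gets 0.
# astype(int) converts the float result of shift()/fillna() back to integers.
df["previous_day_count"] = (
    df.groupby("customer_name")["current_day_count"]
      .shift()
      .fillna(0)
      .astype(int)
)
print(df)
```

If the blanks are empty strings rather than missing values, they would need to be converted to NaN first (for example with `replace`) before `ffill()` has any effect.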