I have a dataframe that I want to resample by week:
df =
Date Game_Mode Count
0 2008-11-30 b 1
1 2009-07-03 b 1
2 2009-07-12 b 1
3 2009-07-18 b 1
4 2009-10-02 c 1
5 2009-10-21 a 1
6 2009-10-22 b 1
7 2010-01-29 b 1
8 2010-01-31 b 1
9 2010-02-28 a 1
10 2010-03-28 a 1
11 2010-04-16 a 1
12 2010-05-09 a 1
13 2010-07-07 a 1
14 2010-09-16 e 1
15 2010-10-26 e 1
16 2010-12-16 e 1
17 2010-12-22 e 1
18 2011-07-20 e 1
19 2011-08-23 e 1
df['Date'][0]
Timestamp('2008-11-30 00:00:00')
I resample category 'a' by week (I do the same for every category).
week = df[df['Game_Mode'] == 'a'].set_index('Date', drop=True).resample('1W-TUE').sum()['Count'].reset_index()
# wc.dropna(subset=['Count'], inplace=True)
week.reset_index(drop=True, inplace=True)
week['Date_Week_{}'.format('a')] = week['Date']
week['Index_Col_{}'.format('a')] = week.index + 1
week.rename(columns={'Count':'Count_{}'.format('a')}, inplace=True)
The weekly resample:
week
Date Count_a Date_Week_a Index_Col_a
0 2009-10-27 1.0 2009-10-27 1
1 2009-11-03 NaN 2009-11-03 2
2 2009-11-10 NaN 2009-11-10 3
3 2009-11-17 NaN 2009-11-17 4
4 2009-11-24 NaN 2009-11-24 5
5 2009-12-01 NaN 2009-12-01 6
6 2009-12-08 NaN 2009-12-08 7
7 2009-12-15 NaN 2009-12-15 8
8 2009-12-22 NaN 2009-12-22 9
9 2009-12-29 NaN 2009-12-29 10
10 2010-01-05 NaN 2010-01-05 11
11 2010-01-12 NaN 2010-01-12 12
12 2010-01-19 NaN 2010-01-19 13
13 2010-01-26 NaN 2010-01-26 14
14 2010-02-02 NaN 2010-02-02 15
15 2010-02-09 NaN 2010-02-09 16
16 2010-02-16 NaN 2010-02-16 17
17 2010-02-23 NaN 2010-02-23 18
18 2010-03-02 1.0 2010-03-02 19
19 2010-03-09 NaN 2010-03-09 20
20 2010-03-16 NaN 2010-03-16 21
21 2010-03-23 NaN 2010-03-23 22
22 2010-03-30 1.0 2010-03-30 23
23 2010-04-06 NaN 2010-04-06 24
24 2010-04-13 NaN 2010-04-13 25
25 2010-04-20 1.0 2010-04-20 26
26 2010-04-27 NaN 2010-04-27 27
27 2010-05-04 NaN 2010-05-04 28
28 2010-05-11 1.0 2010-05-11 29
29 2010-05-18 NaN 2010-05-18 30
30 2010-05-25 NaN 2010-05-25 31
31 2010-06-01 NaN 2010-06-01 32
32 2010-06-08 NaN 2010-06-08 33
33 2010-06-15 NaN 2010-06-15 34
34 2010-06-22 NaN 2010-06-22 35
35 2010-06-29 NaN 2010-06-29 36
36 2010-07-06 NaN 2010-07-06 37
37 2010-07-13 1.0 2010-07-13 38
Here is my problem: I lose all the dates from 2008-12-30 to 2009-10-27 and from 2010-07-13 to 2011-08-23. Why are these dropped during resampling?
I would like to end up with:
week
Date Count_a Date_Week_a Index_Col_a
2008-12-02 NaN 2008-12-02 1
2008-12-09 NaN 2008-12-09 2
... # All weeks before 2009-10-27
...
...
2009-10-27 1.0 2009-10-27 X
2009-11-03 NaN 2009-11-03 Y
...
...
...
# Standard resample in this period
2010-07-06 NaN 2010-07-06 Z
2010-07-13 1.0 2010-07-13 I
...
...
... # All weeks after 2010-07-13 up to:
2011-08-23 NaN 2011-08-23 J
Answer 0 (score: 1)
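The problem is caused by df[df['Game_Mode'] == 'a'].
If you select only the rows of df where Game_Mode is a (or any other single category), you discard the earliest and latest dates, so the resample only spans the dates that category actually has.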
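To see the cause concretely, here is a minimal sketch that rebuilds the question's data and compares the date span before and after filtering. The variable names (`dates`, `modes`, `only_a`, `weekly_a`) are illustrative and not from the original post.

```python
import pandas as pd

# Rebuild the question's dataframe (20 rows, Count is always 1).
dates = ["2008-11-30", "2009-07-03", "2009-07-12", "2009-07-18", "2009-10-02",
         "2009-10-21", "2009-10-22", "2010-01-29", "2010-01-31", "2010-02-28",
         "2010-03-28", "2010-04-16", "2010-05-09", "2010-07-07", "2010-09-16",
         "2010-10-26", "2010-12-16", "2010-12-22", "2011-07-20", "2011-08-23"]
modes = list("bbbbcabbbaaaaaeeeeee")
df = pd.DataFrame({"Date": pd.to_datetime(dates), "Game_Mode": modes, "Count": 1})

# Filtering first shrinks the date range that resample() will ever see.
only_a = df[df["Game_Mode"] == "a"]
full_span = (df["Date"].min(), df["Date"].max())
a_span = (only_a["Date"].min(), only_a["Date"].max())
print("all rows:", full_span)
print("only 'a':", a_span)

# The weekly bins therefore start and end with category 'a' itself.
weekly_a = only_a.set_index("Date")["Count"].resample("W-TUE").sum()
print(weekly_a.index.min(), weekly_a.index.max(), len(weekly_a))
```

The resampled index runs only from the first to the last week that contains an 'a' row, which matches the 38 rows shown in the question.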
What you can do is create an empty dataframe that has the same dates but is filled with NaN. For example:
import numpy as np
import pandas as pd
temp = pd.DataFrame({'Date' : df['Date'],'Game_Mode' : 'a', 'Count': np.nan})
This gives:
Date Game_Mode Count
0 2008-11-30 a NaN
1 2009-07-03 a NaN
2 2009-07-12 a NaN
3 2009-07-18 a NaN
4 2009-10-02 a NaN
5 2009-10-21 a NaN
6 2009-10-22 a NaN
7 2010-01-29 a NaN
8 2010-01-31 a NaN
9 2010-02-28 a NaN
10 2010-03-28 a NaN
11 2010-04-16 a NaN
12 2010-05-09 a NaN
13 2010-07-07 a NaN
14 2010-09-16 a NaN
15 2010-10-26 a NaN
16 2010-12-16 a NaN
17 2010-12-22 a NaN
18 2011-07-20 a NaN
19 2011-08-23 a NaN
Then update it with the existing data (oddly, this cannot be done inline: update modifies temp in place and returns None):
temp.update(df[df['Game_Mode']=='a'])
This gives:
Date Game_Mode Count
0 2008-11-30 a NaN
1 2009-07-03 a NaN
2 2009-07-12 a NaN
3 2009-07-18 a NaN
4 2009-10-02 a NaN
5 2009-10-21 a 1.0
6 2009-10-22 a NaN
7 2010-01-29 a NaN
8 2010-01-31 a NaN
9 2010-02-28 a 1.0
10 2010-03-28 a 1.0
11 2010-04-16 a 1.0
12 2010-05-09 a 1.0
13 2010-07-07 a 1.0
14 2010-09-16 a NaN
15 2010-10-26 a NaN
16 2010-12-16 a NaN
17 2010-12-22 a NaN
18 2011-07-20 a NaN
19 2011-08-23 a NaN
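The same masked frame can probably be built in one step. The sketch below swaps in `Series.where` for the temp/update pair above; it is an alternative technique, not what the answer used, and the name `masked` is illustrative.

```python
import pandas as pd

# Rebuild the question's dataframe (20 rows, Count is always 1).
dates = ["2008-11-30", "2009-07-03", "2009-07-12", "2009-07-18", "2009-10-02",
         "2009-10-21", "2009-10-22", "2010-01-29", "2010-01-31", "2010-02-28",
         "2010-03-28", "2010-04-16", "2010-05-09", "2010-07-07", "2010-09-16",
         "2010-10-26", "2010-12-16", "2010-12-22", "2011-07-20", "2011-08-23"]
modes = list("bbbbcabbbaaaaaeeeeee")
df = pd.DataFrame({"Date": pd.to_datetime(dates), "Game_Mode": modes, "Count": 1})

# Keep every date, but blank out Count wherever the row is not category 'a'.
# where() keeps values where the condition is True and writes NaN elsewhere.
masked = df.assign(
    Game_Mode="a",
    Count=df["Count"].where(df["Game_Mode"] == "a"),
)
print(masked)
```

Because no rows are dropped, the first and last dates of the full dataset survive, which is what the later resample needs.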
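If you then resample:
temp.set_index('Date')['Count'].resample('1W-TUE').sum()
You get all the dates (for me, .sum() returns 0.0 rather than NaN for the empty weeks; since pandas 0.22 the sum of an all-NaN group is 0 unless min_count is set):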
Date
2008-12-02 0.0
2008-12-09 0.0
2008-12-16 0.0
2008-12-23 0.0
..
2011-08-02 0.0
2011-08-09 0.0
2011-08-16 0.0
2011-08-23 0.0
Freq: W-TUE, Name: Count, Length: 143, dtype: float64
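If NaN is wanted for the empty weeks, as in the question's desired output, passing `min_count=1` to `sum()` should do it. The following is a sketch under a few assumptions: it masks with `Series.where` instead of the temp/update steps, and it rebuilds the question's `Date_Week_a` and `Index_Col_a` columns for category 'a' only.

```python
import pandas as pd

# Rebuild the question's dataframe (20 rows, Count is always 1).
dates = ["2008-11-30", "2009-07-03", "2009-07-12", "2009-07-18", "2009-10-02",
         "2009-10-21", "2009-10-22", "2010-01-29", "2010-01-31", "2010-02-28",
         "2010-03-28", "2010-04-16", "2010-05-09", "2010-07-07", "2010-09-16",
         "2010-10-26", "2010-12-16", "2010-12-22", "2011-07-20", "2011-08-23"]
modes = list("bbbbcabbbaaaaaeeeeee")
df = pd.DataFrame({"Date": pd.to_datetime(dates), "Game_Mode": modes, "Count": 1})

# Keep all dates; Count is NaN for rows that are not category 'a'.
temp = df.assign(Count=df["Count"].where(df["Game_Mode"] == "a"))

# min_count=1 makes sum() return NaN for weeks with no non-NaN values,
# instead of the default 0.
weekly = temp.set_index("Date")["Count"].resample("W-TUE").sum(min_count=1)

# Rebuild the extra columns from the question.
week = weekly.rename("Count_a").reset_index()
week["Date_Week_a"] = week["Date"]
week["Index_Col_a"] = week.index + 1
print(week.head())
print(week.tail())
```

The result spans every Tuesday-ending week from 2008-12-02 through 2011-08-23, with 1.0 only in the weeks that contain an 'a' row.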