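How to use resample in pandas without losing rows

Time: 2018-03-14 03:55:01

Tags: python pandas

The code below works, but I lose the rows for every other date in the month. How can I run it and get back the full DataFrame, with the values added in a monthly_return column?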

df = df.resample('M', on='Date').last()
df['monthly_return'] = df['Close'] / df['Close'].shift(1)

Is there another way to do this using df.groupby(pd.TimeGrouper(freq='M'))? Thanks in advance.

Data:

Date    Open    High    Low Last    Close   Total Trade Quantity    Turnover (Lacs)
31/12/10    1058.05 1066.5  1053.2  1058    1058.7  2279759 24175.91
03/01/11    1065    1066.5  1052.6  1054.4  1055.6  2379838 25219.9
04/01/11    1060    1079.9  1057.3  1076    1077.1  5033831 53836.72
05/01/11    1080    1090    1070    1075.1  1075.8  5930189 64130.29
06/01/11    1078    1091.4  1074.55 1084.75 1085.6  4974697 53863.56
07/01/11    1081.3  1087.75 1058    1067    1065.4  4085949 43697.97
10/01/11    1065.4  1070.45 1021.2  1035.35 1033.45 4223855 44380.55
11/01/11    1040    1048.7  998.6   1022    1013.75 4833870 49610.25
12/01/11    1020.65 1035.65 1003    1029.7  1030.8  5301987 54062.12
13/01/11    1030    1039.15 1011.1  1015.95 1015.35 5562896 56929.7
14/01/11    1014.05 1029.8  998 998.15  1001.5  4368638 44237.72
17/01/11    1003    1008    992.05  998.95  997.9   3210798 32112.23
18/01/11    1004.85 1008    990 996.9   994.85  3399381 33878.35
19/01/11    999.6   1005    975 981.4   980.15  5369775 53336.13
20/01/11    977.6   977.6   954.55  968.65  969.85  9450384 91371.68
21/01/11    975 992.5   973 987.2   986.8   4463927 43974.77
24/01/11    995.1   995.9   964.1   972.55  971.6   5084310 49504.38
25/01/11    977.1   984.95  956.25  958.4   958.5   4646080 45081.81
27/01/11    966 967.4   940 943.85  942.7   6424365 61204.92
28/01/11    945 945.9   902 917.8   913 7738844 71176.94
31/01/11    908.85  928.85  903.1   919.8   919.3   5790572 53180.4
01/02/11    925.1   927.7   888.55  894.6   895.5   8663045 78236.21
02/02/11    910 927.9   907.3   923 921.3   6506784 59829.88
03/02/11    928 947.5   915 947 943.9   3896824 36410.66
04/02/11    947.4   947.4   913.15  917 920 4735179 43902.54
07/02/11    925.7   938.8   914.4   928.9   929.05  3334195 30906.7
08/02/11    935.7   935.7   911.25  917 915.25  4499736 41345.23
09/02/11    908.4   937.2   903.65  919.75  914.3   8738283 80670.79
10/02/11    912.5   920.8   885.1   899 899.65  6479831 58339.8
11/02/11    899.95  915 888.05  910.75  909.95  4574486 41162.7
14/02/11    906.1   918.25  894.1   915.1   914.8   6890857 62528.62
15/02/11    914.4   944.8   911.3   942 941.7   6232645 57994.82
16/02/11    938.3   950.8   938.1   943.7   944.15  2523913 23879.17
17/02/11    947.85  955.45  936.25  953.95  953.95  3221065 30533.42
18/02/11    956.05  963 932 938 935.55  3599714 34150.98
21/02/11    936.1   960.9   934.2   954.65  956.35  5858555 55598.75
22/02/11    992 1009.4  979.25  986.3   984.85  20326289    202368.02
23/02/11    986.4   1002.3  985 998.5   995.65  7128252 70947.52
24/02/11    996 998.6   958.35  971 964.2   7521221 73065.81
25/02/11    980.7   982.25  954.55  969.95  966.25  3451930 33449.49
28/02/11    970.2   994.3   960 961.7   964.25  5140248 50326.13
01/03/11    973.7   992.7   967.2   991.3   988.85  2885282 28353.82
03/03/11    975.2   985 968.1   980 977.4   4461643 43592.82
04/03/11    984.65  994.8   978.1   983 982.2   3524743 34728.27
07/03/11    974.1   983.3   964 977 977.05  2702156 26295.27
08/03/11    978 992.4   978 985.1   984.95  2756185 27163.27
09/03/11    988.7   1004.4  968.1   993 993.85  7422840 73624.58
10/03/11    992.7   993 982 984.8   984.75  2252418 22224.76
11/03/11    978.8   994.9   972.15  993.1   992.05  3428848 33771.82
14/03/11    992.95  1020.75 992.25  1020    1018.35 6585771 66545.84
15/03/11    997.95  1049.6  988.9   1036    1037.25 12902924    132391.86
16/03/11    1045    1055    1038.1  1046.5  1047.1  4709984 49221.38
17/03/11    1039.3  1048.8  1028    1032.05 1030.85 2830269 29364.37
18/03/11    1035    1035    986.6   990.45  993 9258050 92498.4
21/03/11    999.05  1000    981.1   993 990.3   3553732 35223.25
22/03/11    993.05  1004.9  992.15  1000    999.65  2238646 22389.08
23/03/11    999.85  1016.65 995.8   1015.1  1013.25 2610417 26324.04
24/03/11    1015.9  1021.95 1003.85 1012.4  1010.3  2567887 25984.59
25/03/11    1023.9  1029.95 1010    1026.35 1026.6  3518453 35885.2
28/03/11    1026.25 1036.35 1014.3  1025.4  1025.7  4361259 44657.08
29/03/11    1020    1032    1019.6  1025.2  1022.75 4137811 42491.41
30/03/11    1027.5  1037.5  1024.1  1032.15 1032.5  2801842 28911.45
31/03/11    1034.5  1054.9  1034.5  1051.55 1049.1  7201048 75289.45
01/04/11    1049.05 1065.9  1031    1036    1036.4  4291094 44800.34
04/04/11    1040    1054.8  1029    1050.8  1050.65 2997121 31243.65
05/04/11    1052.5  1059    1040    1048    1047.65 2458823 25770.85
06/04/11    1045    1054.25 1040    1047.25 1044.85 2076283 21693.06
07/04/11    1045.05 1047.55 1035.1  1041.75 1041.9  2323242 24162.46
08/04/11    1042    1050.9  1020.65 1024    1023.9  2742024 28264.77
11/04/11    1013    1019.45 1003.65 1005.8  1005.3  2582208 26056.18
13/04/11    1001.85 1024    996.6   1022.45 1021.8  4369725 44281.79
15/04/11    1018.65 1024.85 1005    1019.05 1020.95 3240001 32890
18/04/11    1019.2  1044.7  1000.6  1007.35 1009.15 3381601 34677.12
19/04/11    1004.7  1017.45 998.75  1010.9  1011.65 2748246 27716.13
20/04/11    1016    1030.5  1008.2  1029    1025.9  2357930 24045.79
21/04/11    1034.8  1044.2  1030    1039.25 1040.6  3413753 35381.38
25/04/11    1014    1021    1005.15 1009    1009.35 4146270 41984.21
26/04/11    1009    1009    997.55  1001    1001.15 4035044 40429.5
27/04/11    1005    1009.1  980.1   987.7   986.1   5648346 55944.09
28/04/11    992 993 970 972.35  972.4   8163236 79766.58
29/04/11    975 988 971.3   985.2   983.75  3414200 33459
02/05/11    983.9   986.8   961.3   964.5   964.75  3110913 30176.96
03/05/11    965.1   967.4   941.05  943.2   943.9   5035348 47996.8
04/05/11    945 956.75  940 947 947.65  2667162 25318.17
05/05/11    945 963.55  943.65  946.7   949.95  4476339 42725.62
06/05/11    953 960 949.1   956.2   955.05  2952486 28174.73

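Expected output: every row except the last row of each month should be 0 or empty; the month-end row should hold that month's df['Close'] divided by the previous month's df['Close'].

1 answer:

Answer 0 (score: 1)

I think you need map, but first shift the Date column to month end and apply drop_duplicates, so that only the last row of each month is assigned a value: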

df['Date'] = pd.to_datetime(df['Date'],dayfirst=True)

df1 = df.resample('M', on='Date').last()
a = df1['Close'] / df1['Close'].shift(1)
print (a)
Date
2010-12-31         NaN
2011-01-31    0.869376
2011-02-28    1.045553
2011-03-31    1.093428
2011-04-30    0.936903
2011-05-31    0.970564
Freq: M, Name: Close, dtype: float64
d = df['Date'] + pd.offsets.MonthEnd(0)
df['new'] = d.drop_duplicates(keep='last').map(a)
print (df.tail(10))

         Date    Open     High  Low Last    Close    Total  Trade Quantity  \
76 2011-04-25  1014.0  1021.00   1005.15  1009.00  1009.35         4146270   
77 2011-04-26  1009.0  1009.00    997.55  1001.00  1001.15         4035044   
78 2011-04-27  1005.0  1009.10    980.10   987.70   986.10         5648346   
79 2011-04-28   992.0   993.00    970.00   972.35   972.40         8163236   
80 2011-04-29   975.0   988.00    971.30   985.20   983.75         3414200   
81 2011-05-02   983.9   986.80    961.30   964.50   964.75         3110913   
82 2011-05-03   965.1   967.40    941.05   943.20   943.90         5035348   
83 2011-05-04   945.0   956.75    940.00   947.00   947.65         2667162   
84 2011-05-05   945.0   963.55    943.65   946.70   949.95         4476339   
85 2011-05-06   953.0   960.00    949.10   956.20   955.05         2952486   

    Turnover (Lacs)       new  
76         41984.21       NaN  
77         40429.50       NaN  
78         55944.09       NaN  
79         79766.58       NaN  
80         33459.00  0.936903  
81         30176.96       NaN  
82         47996.80       NaN  
83         25318.17       NaN  
84         42725.62       NaN  
85         28174.73  0.970564
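
The same idea can be written without resample, which may be handy because newer pandas versions spell the month-end alias 'ME' rather than 'M' and no longer ship pd.TimeGrouper. The following is only a minimal sketch under a few assumptions: the sample values are made up (they are not the asker's data), the rows are already sorted by Date, and the column name monthly_return is taken from the question.

```python
import pandas as pd

# Hypothetical sample data, sorted by date (not the asker's data).
df = pd.DataFrame({
    "Date": ["30/12/10", "31/12/10", "03/01/11", "28/01/11",
             "31/01/11", "01/02/11", "04/02/11"],
    "Close": [100.0, 110.0, 120.0, 130.0, 121.0, 125.0, 133.1],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%y")

# Roll every date forward to its month end; this is the grouping key.
month_end = df["Date"] + pd.offsets.MonthEnd(0)

# Last available Close in each month, divided by the previous month's.
month_last = df.groupby(month_end)["Close"].last()
ratio = month_last / month_last.shift(1)

# True only for the last row of each month that is present in the data.
is_last = ~month_end.duplicated(keep="last")

# Map the ratio back onto the rows and blank out everything else.
df["monthly_return"] = month_end.map(ratio).where(is_last)

print(df)
```

All original rows are kept; only the last row of each month carries a value, and the first month stays NaN because it has no previous month. If zeros are preferred over NaN, `df["monthly_return"].fillna(0)` should do it.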