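I am writing a function to make it easier to work with several similar datasets, but for some reason the function does not reindex my DataFrame. Can anyone tell me what is going on? I am trying to work out how to reindex and interpolate the data, and I would like to know why it stops there.
Code: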
import pandas as pd

data2.rename(columns={'DATE': 'DATE', 'DGS20': 'Yd'}, inplace=True)
data.rename(columns={'DATE': 'DATE', 'DGS10': 'Yd'}, inplace=True)

def func(dat):
    dat.DATE = pd.to_datetime(dat.DATE)
    dat.Yd = pd.to_numeric(dat.Yd, errors="coerce")
    dat.index = dat.DATE
    dat.drop('DATE', axis=1, inplace=True)
    scale = pd.date_range(start=data.index[0], end=data.index[3774], freq='D')
    dat = dat.reindex(scale)  # <--- THIS LINE IS NOT EXECUTING
    dat.interpolate(method='time', inplace=True)
Result:
The function runs, but the processing appears to stop at the line I marked above.
Sample data:
DATE,DGS5
2004-01-02,3.36
2004-01-05,3.39
2004-01-06,3.26
2004-01-07,3.25
2004-01-08,3.24
2004-01-09,3.05
2004-01-12,3.04
2004-01-13,2.98
2004-01-14,2.96
2004-01-15,2.97
2004-01-16,3.03
2004-01-19,.
2004-01-20,3.05
2004-01-21,3.02
2004-01-22,2.96
2004-01-23,3.06
2004-01-26,3.13
2004-01-27,3.07
2004-01-28,3.22
2004-01-29,3.22
2004-01-30,3.17
2004-02-02,3.18
2004-02-03,3.12
2004-02-04,3.15
2004-02-05,3.21
2004-02-06,3.12
2004-02-09,3.08
2004-02-10,3.13
2004-02-11,3.03
2004-02-12,3.07
2004-02-13,3.01
2004-02-16,.
2004-02-17,3.02
2004-02-18,3.03
2004-02-19,3.02
2004-02-20,3.08
2004-02-23,3.03
2004-02-24,3.01
2004-02-25,2.98
2004-02-26,3.01
2004-02-27,3.01
2004-03-01,2.98
2004-03-02,3.04
2004-03-03,3.06
2004-03-04,3.02
2004-03-05,2.81
2004-03-08,2.74
2004-03-09,2.68
2004-03-10,2.71
2004-03-11,2.72
2004-03-12,2.73
2004-03-15,2.74
2004-03-16,2.65
2004-03-17,2.66
2004-03-18,2.72
2004-03-19,2.75
2004-03-22,2.69
2004-03-23,2.69
2004-03-24,2.68
2004-03-25,2.70
2004-03-26,2.81
2004-03-29,2.86
2004-03-30,2.86
2004-03-31,2.80
2004-04-01,2.87
2004-04-02,3.15
2004-04-05,3.24
2004-04-06,3.19
2004-04-07,3.19
2004-04-08,3.22
2004-04-09,.
2004-04-12,3.26
2004-04-13,3.37
2004-04-14,3.44
2004-04-15,3.45
2004-04-16,3.39
2004-04-19,3.42
2004-04-20,3.45
2004-04-21,3.52
2004-04-22,3.46
2004-04-23,3.58
2004-04-26,3.57
2004-04-27,3.52
2004-04-28,3.60
2004-04-29,3.66
2004-04-30,3.63
2004-05-03,3.63
2004-05-04,3.66
2004-05-05,3.71
2004-05-06,3.72
2004-05-07,3.96
2004-05-10,3.95
2004-05-11,3.94
2004-05-12,3.96
2004-05-13,4.01
2004-05-14,3.92
2004-05-17,3.83
2004-05-18,3.87
2004-05-19,3.93
2004-05-20,3.86
2004-05-21,3.91
2004-05-24,3.90
2004-05-25,3.89
2004-05-26,3.81
2004-05-27,3.74
2004-05-28,3.81
2004-05-31,.
2004-06-01,3.86
2004-06-02,3.91
2004-06-03,3.89
2004-06-04,3.97
2004-06-07,3.95
2004-06-08,3.96
2004-06-09,4.01
2004-06-10,4.00
2004-06-11,.
2004-06-14,4.10
2004-06-15,3.90
2004-06-16,3.96
2004-06-17,3.93
2004-06-18,3.94
2004-06-21,3.91
2004-06-22,3.92
2004-06-23,3.90
2004-06-24,3.85
2004-06-25,3.85
2004-06-28,3.97
2004-06-29,3.92
2004-06-30,3.81
2004-07-01,3.74
2004-07-02,3.62
2004-07-05,.
2004-07-06,3.65
Answer 0 (score: 0)
From the v0.23.4 docs:
DataFrame.reindex supports two calling conventions:
(index=index_labels, columns=column_labels, ...) and (labels, axis={'index', 'columns'}, ...)
We highly recommend using keyword arguments to clarify your intent.
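To make the two calling conventions concrete, here is a minimal sketch. The three-row frame is made up for illustration (only the column name `Yd` comes from the question); both calls should produce the same reindexed frame.

```python
import pandas as pd

# Hypothetical three-row frame; only the column name 'Yd' follows the question.
df = pd.DataFrame(
    {"Yd": [3.36, 3.39, 3.26]},
    index=pd.to_datetime(["2004-01-02", "2004-01-05", "2004-01-06"]),
)

# Daily calendar covering the first to the last date in the frame.
scale = pd.date_range(start=df.index[0], end=df.index[-1], freq="D")

# Convention 1: name the axis through the keyword.
by_keyword = df.reindex(index=scale)

# Convention 2: pass the labels plus an explicit axis.
by_axis = df.reindex(scale, axis="index")

# Both calls return a NEW frame with NaN rows for the dates that were missing;
# df itself keeps its original three rows.
print(by_keyword.equals(by_axis))
print(by_keyword)
```

Either way, `reindex` returns a new object rather than changing `df`, so the result has to be assigned or returned.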
Edit: the following code works for me. I added a return statement to the function.
import pandas as pd

# '.' marks a missing observation in the source data
raw_series = {'Yd': [3.36, 3.39, 3.26, 3.25, 3.24, 3.05, 3.04, 2.98, 2.96, 2.97, 3.03, '.', 3.05]}
raw_index = ['2004-01-02', '2004-01-05', '2004-01-06', '2004-01-07', '2004-01-08', '2004-01-09', '2004-01-12', '2004-01-13', '2004-01-14', '2004-01-15', '2004-01-16', '2004-01-19', '2004-01-20']
dat = pd.DataFrame(raw_series, index=raw_index)

def func(dat):
    # Plain column assignment so the column becomes float64; '.' is coerced to NaN
    dat['Yd'] = pd.to_numeric(dat['Yd'], errors="coerce")
    dat.index = pd.to_datetime(dat.index)
    # Daily calendar spanning the frame's own first and last dates
    scale = pd.date_range(dat.index[0], dat.index[-1], freq='D')
    reindexed = dat.reindex(index=scale)
    # Return the new frame: reindex does not modify dat in place
    return reindexed.interpolate(method='time')

print(func(dat))
Output:
Yd
2004-01-02 3.360000
2004-01-03 3.370000
2004-01-04 3.380000
2004-01-05 3.390000
2004-01-06 3.260000
2004-01-07 3.250000
2004-01-08 3.240000
2004-01-09 3.050000
2004-01-10 3.046667
2004-01-11 3.043333
2004-01-12 3.040000
2004-01-13 2.980000
2004-01-14 2.960000
2004-01-15 2.970000
2004-01-16 3.030000
2004-01-17 3.035000
2004-01-18 3.040000
2004-01-19 3.045000
2004-01-20 3.050000
Verify the data types:
>>> func(dat).reset_index().dtypes
index datetime64[ns]
Yd float64
dtype: object
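As for why the original function seemed to stop at the reindex line, my best guess is that the line does run, but `dat = dat.reindex(scale)` only rebinds the local name `dat` to a new frame, and that frame is discarded when the function exits without a return. Here is a minimal sketch of the difference. The function names and the two-row frame are made up for illustration.

```python
import pandas as pd

def reindex_no_return(dat, scale):
    # Rebinds the LOCAL name only: the caller's frame is left untouched,
    # and the reindexed copy is thrown away when the function ends.
    dat = dat.reindex(scale)
    dat.interpolate(method="time", inplace=True)

def reindex_with_return(dat, scale):
    # Hands the new, interpolated frame back to the caller.
    return dat.reindex(scale).interpolate(method="time")

# Hypothetical two-row frame with a weekend gap between the dates.
df = pd.DataFrame(
    {"Yd": [3.36, 3.39]},
    index=pd.to_datetime(["2004-01-02", "2004-01-05"]),
)
scale = pd.date_range(df.index[0], df.index[-1], freq="D")

reindex_no_return(df, scale)
print(len(df))       # still 2 rows: nothing visible happened

filled = reindex_with_return(df, scale)
print(len(filled))   # 4 rows: the gap days were added and interpolated
```

So in the original code the fix would be to end the function with `return dat` and call it as `data = func(data)`.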