Pandas won't let me reindex?

Date: 2018-08-20 03:47:07

Tags: python pandas

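I'm writing a function to make it easier to process several similar datasets, but for some reason the function does not reindex my DataFrame. Can someone tell me what is going on? I'm trying to figure out how to reindex and interpolate the data, and I'd like to know why it stops at that point.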

Code:

import pandas as pd
data2.rename(columns={'DATE':'DATE','DGS20':'Yd'},inplace = True)
data.rename(columns={'DATE':'DATE','DGS10':'Yd'},inplace = True)

def func(dat):

    dat.DATE = pd.to_datetime(dat.DATE)
    dat.Yd = pd.to_numeric(dat.Yd,errors = "coerce")

    dat.index = dat.DATE
    dat.drop('DATE',axis = 1,inplace = True)

    scale = pd.date_range(start = data.index[0],end = data.index[3774],freq = 'D') 
    dat = dat.reindex(scale)  # <--- THIS LINE IS NOT EXECUTING

    dat.interpolate(method = 'time',inplace = True)

Result:

The function runs, but the processing stops at the line I marked above.

Sample data:

DATE,DGS5
2004-01-02,3.36
2004-01-05,3.39
2004-01-06,3.26
2004-01-07,3.25
2004-01-08,3.24
2004-01-09,3.05
2004-01-12,3.04
2004-01-13,2.98
2004-01-14,2.96
2004-01-15,2.97
2004-01-16,3.03
2004-01-19,.
2004-01-20,3.05
2004-01-21,3.02
2004-01-22,2.96
2004-01-23,3.06
2004-01-26,3.13
2004-01-27,3.07
2004-01-28,3.22
2004-01-29,3.22
2004-01-30,3.17
2004-02-02,3.18
2004-02-03,3.12
2004-02-04,3.15
2004-02-05,3.21
2004-02-06,3.12
2004-02-09,3.08
2004-02-10,3.13
2004-02-11,3.03
2004-02-12,3.07
2004-02-13,3.01
2004-02-16,.
2004-02-17,3.02
2004-02-18,3.03
2004-02-19,3.02
2004-02-20,3.08
2004-02-23,3.03
2004-02-24,3.01
2004-02-25,2.98
2004-02-26,3.01
2004-02-27,3.01
2004-03-01,2.98
2004-03-02,3.04
2004-03-03,3.06
2004-03-04,3.02
2004-03-05,2.81
2004-03-08,2.74
2004-03-09,2.68
2004-03-10,2.71
2004-03-11,2.72
2004-03-12,2.73
2004-03-15,2.74
2004-03-16,2.65
2004-03-17,2.66
2004-03-18,2.72
2004-03-19,2.75
2004-03-22,2.69
2004-03-23,2.69
2004-03-24,2.68
2004-03-25,2.70
2004-03-26,2.81
2004-03-29,2.86
2004-03-30,2.86
2004-03-31,2.80
2004-04-01,2.87
2004-04-02,3.15
2004-04-05,3.24
2004-04-06,3.19
2004-04-07,3.19
2004-04-08,3.22
2004-04-09,.
2004-04-12,3.26
2004-04-13,3.37
2004-04-14,3.44
2004-04-15,3.45
2004-04-16,3.39
2004-04-19,3.42
2004-04-20,3.45
2004-04-21,3.52
2004-04-22,3.46
2004-04-23,3.58
2004-04-26,3.57
2004-04-27,3.52
2004-04-28,3.60
2004-04-29,3.66
2004-04-30,3.63
2004-05-03,3.63
2004-05-04,3.66
2004-05-05,3.71
2004-05-06,3.72
2004-05-07,3.96
2004-05-10,3.95
2004-05-11,3.94
2004-05-12,3.96
2004-05-13,4.01
2004-05-14,3.92
2004-05-17,3.83
2004-05-18,3.87
2004-05-19,3.93
2004-05-20,3.86
2004-05-21,3.91
2004-05-24,3.90
2004-05-25,3.89
2004-05-26,3.81
2004-05-27,3.74
2004-05-28,3.81
2004-05-31,.
2004-06-01,3.86
2004-06-02,3.91
2004-06-03,3.89
2004-06-04,3.97
2004-06-07,3.95
2004-06-08,3.96
2004-06-09,4.01
2004-06-10,4.00
2004-06-11,.
2004-06-14,4.10
2004-06-15,3.90
2004-06-16,3.96
2004-06-17,3.93
2004-06-18,3.94
2004-06-21,3.91
2004-06-22,3.92
2004-06-23,3.90
2004-06-24,3.85
2004-06-25,3.85
2004-06-28,3.97
2004-06-29,3.92
2004-06-30,3.81
2004-07-01,3.74
2004-07-02,3.62
2004-07-05,.
2004-07-06,3.65
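The sample data marks missing values with a literal "." rather than leaving the field empty. As a minimal sketch, assuming the input is a CSV with the DATE,DGS5 header shown above, pd.read_csv can do the conversions that func does by hand: na_values="." turns the dots into NaN so the column parses as float64, and parse_dates plus index_col produce a DatetimeIndex directly. The inline csv_text string is only a few rows of the sample, used for illustration in place of a real file path.

```python
import io

import pandas as pd

# A few rows of the sample data above; in practice this would be a file path.
csv_text = """DATE,DGS5
2004-01-15,2.97
2004-01-16,3.03
2004-01-19,.
2004-01-20,3.05
"""

# na_values="." turns the placeholder dots into NaN, so the column is float64;
# parse_dates + index_col give a DatetimeIndex without a separate to_datetime step.
df = pd.read_csv(io.StringIO(csv_text), na_values=".",
                 parse_dates=["DATE"], index_col="DATE")
df = df.rename(columns={"DGS5": "Yd"})

print(df.dtypes)
print(df)
```

With the dates already in the index and the values already numeric, the to_datetime, to_numeric and drop steps in func are no longer needed.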

1 answer:

Answer 0 (score: 0)

From the v0.23.4 docs:

DataFrame.reindex supports two calling conventions:

(index=index_labels, columns=column_labels, ...)
(labels, axis={'index', 'columns'}, ...)

We highly recommend using keyword arguments to clarify your intent.
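A small sketch of the two calling conventions on a toy frame (the values are taken from the sample data; new_index is an illustrative name). Both calls produce the same result, and neither changes the original DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"Yd": [3.36, 3.39]},
                  index=pd.to_datetime(["2004-01-02", "2004-01-05"]))
new_index = pd.date_range("2004-01-02", "2004-01-05", freq="D")

# Convention 1: pass the new labels with the index= keyword.
a = df.reindex(index=new_index)
# Convention 2: pass the labels positionally and name the axis.
b = df.reindex(new_index, axis="index")

print(a.equals(b))  # True
print(a)            # 2004-01-03 and 2004-01-04 are NaN

# reindex returns a NEW DataFrame; df itself still has only 2 rows.
print(len(df))      # 2
```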

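Edit: The following code works for me. The key change is that I added a return statement to the function. reindex does not modify the DataFrame in place; it returns a new one. So dat = dat.reindex(scale) only rebinds the local variable dat inside the function, and the reindexed result is lost unless the function returns it.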

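import pandas as pd

# Sample values from 2004-01-02 through 2004-01-20; '.' marks a missing value
raw_series = {'Yd': [3.36, 3.39, 3.26, 3.25, 3.24, 3.05, 3.04, 2.98, 2.96, 2.97, 3.03, '.', 3.05]}
raw_index = ['2004-01-02', '2004-01-05', '2004-01-06', '2004-01-07', '2004-01-08', '2004-01-09', '2004-01-12', '2004-01-13', '2004-01-14', '2004-01-15', '2004-01-16', '2004-01-19', '2004-01-20']

dat = pd.DataFrame(raw_series, index=raw_index)

def func(dat):
    dat = dat.copy()  # work on a copy so the caller's DataFrame is left unchanged
    # '.' is not numeric, so errors="coerce" turns it into NaN (column becomes float64)
    dat['Yd'] = pd.to_numeric(dat['Yd'], errors="coerce")
    dat.index = pd.to_datetime(dat.index)

    # Daily range from the first to the last date of this DataFrame
    scale = pd.date_range(dat.index[0], dat.index[-1], freq='D')
    reindexed = dat.reindex(index=scale)  # returns a new DataFrame
    return reindexed.interpolate(method='time')

print(func(dat))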

Output:

            Yd
2004-01-02  3.360000
2004-01-03  3.370000
2004-01-04  3.380000
2004-01-05  3.390000
2004-01-06  3.260000
2004-01-07  3.250000
2004-01-08  3.240000
2004-01-09  3.050000
2004-01-10  3.046667
2004-01-11  3.043333
2004-01-12  3.040000
2004-01-13  2.980000
2004-01-14  2.960000
2004-01-15  2.970000
2004-01-16  3.030000
2004-01-17  3.035000
2004-01-18  3.040000
2004-01-19  3.045000
2004-01-20  3.050000

Verifying the data types:

>>> func(dat).reset_index().dtypes

index    datetime64[ns]
Yd              float64
dtype: object
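Why the original func seemed to stop at the reindex line: reindex never modifies the DataFrame it is called on, and dat = dat.reindex(scale) only rebinds the local name dat. The caller's DataFrame is untouched, and the reindexed copy is thrown away when the function implicitly returns None. The sketch below contrasts the two versions; make_frame, no_return and with_return are hypothetical names, and the date range is built from dat.index rather than from the global data with a hard-coded position such as data.index[3774]. As a swapped-in alternative to reindex plus date_range, asfreq("D") on a sorted DatetimeIndex conforms the frame to the same daily range.

```python
import pandas as pd

def make_frame():
    # Two business days with a weekend gap in between (values from the sample data).
    return pd.DataFrame({"Yd": [3.05, 3.04]},
                        index=pd.to_datetime(["2004-01-09", "2004-01-12"]))

def no_return(dat):
    scale = pd.date_range(dat.index[0], dat.index[-1], freq="D")
    dat = dat.reindex(scale)                      # rebinds the LOCAL name only
    dat.interpolate(method="time", inplace=True)  # modifies the local copy only
    # no return statement, so the caller gets None

def with_return(dat):
    scale = pd.date_range(dat.index[0], dat.index[-1], freq="D")
    return dat.reindex(scale).interpolate(method="time")

original = make_frame()
print(no_return(original))  # None
print(len(original))        # 2: the caller's frame was never reindexed

result = with_return(original)
print(result)               # 4 daily rows, 2004-01-10 and 2004-01-11 interpolated

# asfreq("D") builds the same daily index from the first to the last timestamp.
alt = original.asfreq("D").interpolate(method="time")
print(alt.equals(result))   # True
```

Returning a new DataFrame (and assigning it at the call site, e.g. data = func(data)) is generally clearer than mixing rebinding with inplace=True.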