pandas DatetimeIndex asfreq not cooperating?

Time: 2018-04-04 11:36:24

Tags: datetime dataframe

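This sample code tries to mimic a pandas DataFrame loaded through SQLAlchemy from a MySQL/MariaDB database. When I fetch float and integer data from that database into a DataFrame, I can successfully use df.asfreq against one column and get np.nan / NaN in the other column, which I can then fill by interpolation or a polynomial fit. However, this does not seem to be possible for a time series of "datetime" (generic term) values such as '2005-09-29 15:27:00'.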

Tested with both ".astype('datetime64[ns]')" and "to_datetime".

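What I have is data with one or more missing "datetime" values (here in the ['recdate'] column), and I want the ['outdoortemperature'] column to be filled with NaN for those timestamps. None of the fill methods I have tried works for me.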

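Yes, I have spent several days trying different approaches and reading documentation, including looking through three different books on ML!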

#-*- coding: utf-8 -*-

import pandas as pd
'''
Python version '3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)]'
Running Spyder IDE version 3.2.6
PANDAS VERSION '0.22.0'
'''
# Note: missing minute data at 15:28:00 
TS = pd.DataFrame({'recdate': ['2005-09-28 15:27:00', '2005-09-28 15:29:00'],
    'outdoortemperature': [12.778, 12.833]})
# Also tested:
# TS['recdate'] = TS['recdate'].astype('datetime64[ns]')
# TS['recdate'] = TS['recdate'].to_datetime()


print("step 1. TS.dtypes: ",TS.dtypes)

TS.set_index(['recdate']) # does not affect the result?
# tested variations - same result:
#TS['recdate'] = TS['recdate'].asfreq('1min')

#TS.index =pd.to_datetime(TS.index)
TS.index =TS.set_index(pd.DatetimeIndex(TS['recdate']))
TS['recdate'] = TS['recdate'].asfreq('1T')  
print(TS.recdate)


print("step 2. TS.dtypes: ",TS.dtypes)

print(TS)

What I thought would be possible:

Gives:
    outdoortemperature recdate
0               12.778     NaT
1               12.833     NaT


Expected something like:
...
     0  12.778   2005-09-28 15:27:00
     1  NaN      2005-09-28 15:28:00  --- New added datetime?!?
     2  12.833   2005-09-28 15:29:00
...

This code alone does most of what I expect, but line 26 (?) and .asfreq('1T') produce an error:

TypeError: Cannot convert input [(12.778, '2005-09-28 15:27:00')] of type <class 'tuple'> to Timestamp 
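
A minimal sketch of two things that appear to go wrong in the code above, assuming a reasonably recent pandas. First, `set_index` returns a new DataFrame and leaves `TS` untouched unless the result is assigned. Second, `asfreq` reindexes against the object's own index, so it needs a `DatetimeIndex` to work from; assigning the DataFrame returned by `set_index` to `TS.index` is probably what leads to the tuple error. The names `still_default_index`, `indexed` and `filled` are illustrative only, and `'min'` is used as the frequency alias because `'T'` is deprecated in newer pandas releases.

```python
import pandas as pd

TS = pd.DataFrame({
    'recdate': ['2005-09-28 15:27:00', '2005-09-28 15:29:00'],
    'outdoortemperature': [12.778, 12.833],
})

# set_index returns a new DataFrame; without assignment TS keeps
# its default integer index.
TS.set_index('recdate')
still_default_index = isinstance(TS.index, pd.RangeIndex)
print(still_default_index)

# Convert the strings to real datetimes, then keep the result of set_index.
TS['recdate'] = pd.to_datetime(TS['recdate'])
indexed = TS.set_index('recdate')

# With a DatetimeIndex in place, asfreq adds the missing minute
# and leaves its temperature as NaN.
filled = indexed.asfreq('min')
print(filled)
```

Calling `asfreq` on the frame, rather than on the `recdate` column, is what adds the row for the missing minute.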

1 Answer:

Answer 0: (score: 0)

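I was initially confused by the pandas documentation and by old example code. Because things change so quickly, a lot of older example code no longer works. This is not exactly the sample data I originally posted, but this example takes my own code a step further, with only a few more lines.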

# -*- coding: utf-8 -*-

import pandas as pd

# Python version
# '3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 
#    bit (AMD64)]'
# Running Spyder IDE version 3.2.6
# PANDAS VERSION '0.22.0'

# Note missing minutes data at around 15:28:00 
TS = pd.DataFrame({'recdate': ['2005-09-28 15:22:00', '2005-09-28 15:23:00',
      '2005-09-28 15:24:00', '2005-09-28 15:25:00', '2005-09-28 15:26:00',
      '2005-09-28 15:29:00', '2005-09-28 15:30:00', '2005-09-28 15:31:00',
      '2005-09-28 15:32:00', '2005-09-28 15:33:00',
      '2005-09-28 15:34:00'],
      'outdoortemperature': [12.611, 12.611, 12.611, 12.611, 12.611, 
                             12.833, 12.833, 12.833, 12.944, 12.944, 12.944]
  })

# format = '%Y-%m-%d %H:%M:%S'
# TS['recdate'] = pd.to_datetime(TS['recdate'], format=format)
# Parse the strings into datetimes and move them into the index
TS['recdate'] = pd.to_datetime(TS['recdate'])
TS = TS.set_index('recdate')

print(TS)
print(TS.dtypes)

# Resample to one-minute bins; minutes with no data become NaN
TS = TS.resample('1Min').mean()

print(TS)
print(TS.dtypes)

# Fill the gaps with a second-order spline (requires SciPy)
TS['outdoortemperature'] = TS['outdoortemperature'].interpolate(
    method='spline', order=2)
print(TS)

The last two steps and their print(TS) calls produce the following, which is exactly what I wanted:

                     outdoortemperature
recdate                                
2005-09-28 15:22:00              12.611
2005-09-28 15:23:00              12.611
2005-09-28 15:24:00              12.611
2005-09-28 15:25:00              12.611
2005-09-28 15:26:00              12.611
2005-09-28 15:27:00                 NaN
2005-09-28 15:28:00                 NaN
2005-09-28 15:29:00              12.833
2005-09-28 15:30:00              12.833
2005-09-28 15:31:00              12.833
2005-09-28 15:32:00              12.944
2005-09-28 15:33:00              12.944
2005-09-28 15:34:00              12.944
outdoortemperature    float64
dtype: object
                     outdoortemperature
recdate                                
2005-09-28 15:22:00           12.611000
2005-09-28 15:23:00           12.611000
2005-09-28 15:24:00           12.611000
2005-09-28 15:25:00           12.611000
2005-09-28 15:26:00           12.611000
2005-09-28 15:27:00           12.706501
2005-09-28 15:28:00           12.739543
2005-09-28 15:29:00           12.833000
2005-09-28 15:30:00           12.833000
2005-09-28 15:31:00           12.833000
2005-09-28 15:32:00           12.944000
2005-09-28 15:33:00           12.944000
2005-09-28 15:34:00           12.944000

You can easily spot the interpolated values in this sample output by their extra decimal precision.
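
For comparison, here is a sketch of the `asfreq` route the question originally asked about. It assumes the timestamps are unique and already fall on whole minutes. It uses a shortened version of the sample data and `interpolate(method='time')`, which does not need SciPy, so the filled values are linear in time and will differ from the spline output above. The names `regular` and `gaps` are illustrative only.

```python
import pandas as pd

TS = pd.DataFrame({
    'recdate': ['2005-09-28 15:25:00', '2005-09-28 15:26:00',
                '2005-09-28 15:29:00', '2005-09-28 15:30:00'],
    'outdoortemperature': [12.611, 12.611, 12.833, 12.833],
})
TS['recdate'] = pd.to_datetime(TS['recdate'])
TS = TS.set_index('recdate')

# Reindex onto a regular one-minute grid; 15:27 and 15:28 appear as NaN.
regular = TS.asfreq('min')

# Remember which rows were gaps before they are filled.
gaps = regular['outdoortemperature'].isna()

# Linear interpolation weighted by the time between index entries.
regular['outdoortemperature'] = (
    regular['outdoortemperature'].interpolate(method='time'))

print(regular)
print(regular[gaps])  # only the rows that were filled in
```

`asfreq` only reindexes, so it is a reasonable fit when each minute has at most one reading. `resample('1Min').mean()`, as used in the answer, aggregates instead, which also copes with several readings inside the same minute.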