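This sample code tries to mimic a pandas DataFrame loaded via SQLAlchemy from a MySQL/MariaDB database. When I pull float and integer data from that database into a DataFrame, I can successfully use df.asfreq on one column, get np.nan / NaN in the other column, and then fill the gaps with interpolation or a polynomial fit. For a time series of "datetime" values (generic term) such as "2005-09-29 15:27:00", however, this seems to be impossible.
Tested with ".astype('datetime64[ns]')" and "to_datetime".
What I have is data with one or more missing "datetime" values (here in the ['recdate'] column), and I want the ['outdoortemperature'] column to be filled with NaN for them. None of the fill methods I tried worked for me.
Yes, I have spent several days trying different approaches and reading documentation, including three different books on ML!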
#-*- coding: utf-8 -*-
import pandas as pd
'''
Python version '3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)]'
Running Spyder IDE version 3.2.6
PANDAS VERSION '0.22.0'
'''
# Note: missing minute data at 15:28:00
TS = pd.DataFrame({'recdate': ['2005-09-28 15:27:00', '2005-09-28 15:29:00'],
                   'outdoortemperature': [12.778, 12.833]})
# Also tested:
# TS['recdate'] = TS['recdate'].astype('datetime64[ns]')
# TS['recdate'] = TS['recdate'].to_datetime()
print("step 1. TS.dtypes: ",TS.dtypes)
TS.set_index(['recdate']) # does not affect the result?
# tested variations - same result:
#TS['recdate'] = TS['recdate'].asfreq('1min')
#TS.index =pd.to_datetime(TS.index)
TS.index =TS.set_index(pd.DatetimeIndex(TS['recdate']))
TS['recdate'] = TS['recdate'].asfreq('1T')
print(TS.recdate)
print("step 2. TS.dtypes: ",TS.dtypes)
print(TS)
What I think might be possible:
Gives:
   outdoortemperature recdate
0              12.778     NaT
1              12.833     NaT
Expected something like:
...
0 12.778 2005-09-28 15:27:00
1 NaN 2005-09-28 15:28:00 --- newly added datetime?!?
2 12.833 2005-09-28 15:29:00
...
This code does most of what I expect, but line 26(?) with .asfreq('1T') raises an error:
TypeError: Cannot convert input [(12.778, '2005-09-28 15:27:00')] of type <class 'tuple'> to Timestamp
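A note on the error: the tuple in the message looks consistent with a whole DataFrame being assigned to TS.index (the `TS.index = TS.set_index(...)` line), and the earlier `TS.set_index(['recdate'])` has no effect because its return value is discarded. The following is a minimal sketch of the same two-row example with those two points changed, assuming a reasonably recent pandas; the names `ts` and `filled` are illustrative only.

```python
import pandas as pd

# Same two rows as in the question; the 15:28 reading is missing.
ts = pd.DataFrame({
    'recdate': ['2005-09-28 15:27:00', '2005-09-28 15:29:00'],
    'outdoortemperature': [12.778, 12.833],
})

# Parse the strings into real timestamps.
ts['recdate'] = pd.to_datetime(ts['recdate'])

# set_index returns a new DataFrame, so the result must be assigned back.
ts = ts.set_index('recdate')

# asfreq reindexes onto a regular 1-minute grid; minutes without data get NaN.
filled = ts.asfreq('1min')
print(filled)
```

Calling asfreq on the DataFrame (rather than on the 'recdate' column itself) is what adds the missing minute as a new row, with NaN in the temperature column.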
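Answer 0 (score: 0)
I was initially confused by the pandas documentation and by older example code. Because things change so quickly, a lot of older example code no longer works. This is not exactly the same sample data I originally provided, but this example takes my own code a step further, in just a few lines.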
# -*- coding: utf-8 -*-
import pandas as pd
# Python version
# '3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64
# bit (AMD64)]'
# Running Spyder IDE version 3.2.6
# PANDAS VERSION '0.22.0'
# Note missing minutes data at around 15:28:00
TS = pd.DataFrame({
    'recdate': ['2005-09-28 15:22:00', '2005-09-28 15:23:00',
                '2005-09-28 15:24:00', '2005-09-28 15:25:00',
                '2005-09-28 15:26:00', '2005-09-28 15:29:00',
                '2005-09-28 15:30:00', '2005-09-28 15:31:00',
                '2005-09-28 15:32:00', '2005-09-28 15:33:00',
                '2005-09-28 15:34:00'],
    'outdoortemperature': [12.611, 12.611, 12.611, 12.611, 12.611,
                           12.833, 12.833, 12.833, 12.944, 12.944, 12.944],
})
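# An explicit format can be passed if the strings need it:
# fmt = '%Y-%m-%d %H:%M:%S'
# TS['recdate'] = pd.to_datetime(TS['recdate'], format=fmt)
TS['recdate'] = pd.to_datetime(TS['recdate'])
# Move recdate into the index so that only the numeric column remains
TS = TS.set_index('recdate')
print(TS)
print(TS.dtypes)
# Resample onto a regular 1-minute grid; minutes without data become NaN
TS = TS.resample('1min').mean()
print(TS)
print(TS.dtypes)
# Fill the gaps with a second-order spline (this method requires SciPy)
TS['outdoortemperature'] = TS['outdoortemperature'].interpolate(
    method='spline', order=2)
print(TS)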
The last two steps and their print(TS) calls produce the following, which is exactly what I wanted:
outdoortemperature
recdate
2005-09-28 15:22:00 12.611
2005-09-28 15:23:00 12.611
2005-09-28 15:24:00 12.611
2005-09-28 15:25:00 12.611
2005-09-28 15:26:00 12.611
2005-09-28 15:27:00 NaN
2005-09-28 15:28:00 NaN
2005-09-28 15:29:00 12.833
2005-09-28 15:30:00 12.833
2005-09-28 15:31:00 12.833
2005-09-28 15:32:00 12.944
2005-09-28 15:33:00 12.944
2005-09-28 15:34:00 12.944
outdoortemperature float64
dtype: object
outdoortemperature
recdate
2005-09-28 15:22:00 12.611000
2005-09-28 15:23:00 12.611000
2005-09-28 15:24:00 12.611000
2005-09-28 15:25:00 12.611000
2005-09-28 15:26:00 12.611000
2005-09-28 15:27:00 12.706501
2005-09-28 15:28:00 12.739543
2005-09-28 15:29:00 12.833000
2005-09-28 15:30:00 12.833000
2005-09-28 15:31:00 12.833000
2005-09-28 15:32:00 12.944000
2005-09-28 15:33:00 12.944000
2005-09-28 15:34:00 12.944000
You can easily spot the interpolated values in this sample output by their extra decimal digits.
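Since the spline method depends on SciPy, here is a hedged alternative sketch that stays within pandas: asfreq exposes the missing minutes as NaN, and interpolate(method='time') fills them linearly in time. It uses a shortened copy of the data above, and the names `ts` and `regular` are illustrative only. This is a different interpolation from the spline, so the filled values differ from the output shown earlier.

```python
import pandas as pd

# Shortened version of the data above; 15:27 and 15:28 are missing.
ts = pd.DataFrame({
    'recdate': pd.to_datetime(['2005-09-28 15:25:00', '2005-09-28 15:26:00',
                               '2005-09-28 15:29:00', '2005-09-28 15:30:00']),
    'outdoortemperature': [12.611, 12.611, 12.833, 12.833],
}).set_index('recdate')

# Reindex onto a regular 1-minute grid; missing minutes appear as NaN rows.
regular = ts.asfreq('1min')

# Linear interpolation weighted by the time between index entries.
regular['outdoortemperature'] = (
    regular['outdoortemperature'].interpolate(method='time'))
print(regular)
```

With this data the two gaps are filled with 12.685 and 12.759, one third and two thirds of the way from 12.611 to 12.833. Unlike resample('1min').mean(), asfreq does not aggregate, so it assumes at most one reading per timestamp.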