I have the following code:
import datetime
import MySQLdb as mdb
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import pprint
import statsmodels.tsa.stattools as ts
from pandas.stats.api import ols
if __name__ == "__main__":
    # Connect to the MySQL instance
    db_host2 = '192.168.200.128'
    db_user2 = 'sec_master'
    db_pass2 = 'PASS'
    db_name2 = 'EUR-USD'
    con = mdb.connect(db_host2, db_user2, db_pass2, db_name2)
    # Select the timestamp and the closing bid for the date range
    sql2 = """SELECT `TIME`, `BID-CLOSE`
              FROM `EUR-USD`.`tbl_EUR-USD_1-Day`
              WHERE TIME >= '2006-12-15 22:00:00' AND TIME <= '2007-01-03 22:00:00'
              ORDER BY TIME ASC;"""
    # Create a pandas dataframe from the SQL query
    eurusd = pd.read_sql_query(sql2, con=con, index_col='TIME')
    # Build a complete daily index; days missing from the query become NaN
    idx = pd.date_range('2006-12-17 22:00:00', '2007-01-03 22:00:00')
    eurusd.reindex(idx, fill_value=None)
This gives the output:
                     BID-CLOSE
2006-12-17 22:00:00 1.30971
2006-12-18 22:00:00 1.31971
2006-12-19 22:00:00 1.31721
2006-12-20 22:00:00 1.31771
2006-12-21 22:00:00 1.31411
2006-12-22 22:00:00 NaN
2006-12-23 22:00:00 NaN
2006-12-24 22:00:00 NaN
2006-12-25 22:00:00 1.30971
2006-12-26 22:00:00 1.31131
2006-12-27 22:00:00 1.31491
2006-12-28 22:00:00 1.32021
2006-12-29 22:00:00 NaN
2006-12-30 22:00:00 NaN
2006-12-31 22:00:00 1.32731
2007-01-01 22:00:00 1.32731
2007-01-02 22:00:00 1.31701
2007-01-03 22:00:00 1.30831
Then I assign the result back:
eurusd = eurusd.reindex(idx, fill_value=None)
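Note that `reindex` returns a new object rather than modifying the frame in place, which is why the assignment above is needed. A minimal sketch with made-up prices (the `toy` frame and its values are illustrative assumptions, not the real EUR-USD data):

```python
import pandas as pd

# Hypothetical daily prices with two days missing (not the real data).
toy = pd.DataFrame(
    {"BID-CLOSE": [1.30, 1.31, 1.34]},
    index=pd.to_datetime(["2006-12-21 22:00:00",
                          "2006-12-22 22:00:00",
                          "2006-12-25 22:00:00"]),
)

# A complete daily index covering the same span.
idx = pd.date_range("2006-12-21 22:00:00", "2006-12-25 22:00:00")

# reindex returns a new DataFrame; `toy` itself is left unchanged.
filled = toy.reindex(idx)

print(len(toy), len(filled))             # 3 5
print(filled["BID-CLOSE"].isna().sum())  # 2 (the two added days are NaN)
```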
Next I define:
methods = ['linear', 'quadratic', 'cubic']
The following line then raises an error:
pd.DataFrame({m: eurusd.interpolate(method=m) for m in methods})
The error is:
ValueError: If using all scalar values, you must pass an index
I am (trying to) follow the interpolation section of this guide: http://pandas.pydata.org/pandas-docs/stable/missing_data.html
How do I correctly "pass an index" in this case?
Update 1
eurusd.interpolate('linear')
BID-CLOSE
2006-12-17 22:00:00 1.309710
2006-12-18 22:00:00 1.319710
2006-12-19 22:00:00 1.317210
2006-12-20 22:00:00 1.317710
2006-12-21 22:00:00 1.314110
2006-12-22 22:00:00 1.313010
2006-12-23 22:00:00 1.311910
2006-12-24 22:00:00 1.310810
2006-12-25 22:00:00 1.309710
2006-12-26 22:00:00 1.311310
2006-12-27 22:00:00 1.314910
2006-12-28 22:00:00 1.320210
2006-12-29 22:00:00 1.322577
2006-12-30 22:00:00 1.324943
2006-12-31 22:00:00 1.327310
2007-01-01 22:00:00 1.327310
2007-01-02 22:00:00 1.317010
2007-01-03 22:00:00 1.308310
Update 2
In[9]: pd.DataFrame({m: eurusd['BID-CLOSE'].interpolate(method=m) for m in methods})
Out[9]:
cubic linear quadratic
2006-12-17 22:00:00 1.309710 1.309710 1.309710
2006-12-18 22:00:00 1.319710 1.319710 1.319710
2006-12-19 22:00:00 1.317210 1.317210 1.317210
2006-12-20 22:00:00 1.317710 1.317710 1.317710
2006-12-21 22:00:00 1.314110 1.314110 1.314110
2006-12-22 22:00:00 1.310762 1.313010 1.307947
2006-12-23 22:00:00 1.309191 1.311910 1.305159
2006-12-24 22:00:00 1.308980 1.310810 1.305747
2006-12-25 22:00:00 1.309710 1.309710 1.309710
2006-12-26 22:00:00 1.311310 1.311310 1.311310
2006-12-27 22:00:00 1.314910 1.314910 1.314910
2006-12-28 22:00:00 1.320210 1.320210 1.320210
2006-12-29 22:00:00 1.323674 1.322577 1.321632
2006-12-30 22:00:00 1.325553 1.324943 1.323998
2006-12-31 22:00:00 1.327310 1.327310 1.327310
2007-01-01 22:00:00 1.327310 1.327310 1.327310
2007-01-02 22:00:00 1.317010 1.317010 1.317010
2007-01-03 22:00:00 1.308310 1.308310 1.308310
Answer 0 (score: 3)
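The problem is that when you use the DataFrame constructor like this:
pd.DataFrame({m: eurusd.interpolate(method=m) for m in methods})
the value for each m is itself a DataFrame, which the constructor treats as a scalar value. That is admittedly confusing. The constructor expects some kind of sequence or a Series for each key. The following should fix the problem: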
pd.DataFrame({m: eurusd['BID-CLOSE'].interpolate(method=m) for m in methods})
This works because selecting a single column returns a Series. So, for example, instead of:
In [34]: pd.DataFrame({'linear':df.interpolate('linear')})
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-34-4b6c095c6da3> in <module>()
----> 1 pd.DataFrame({'linear':df.interpolate('linear')})
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
222 dtype=dtype, copy=copy)
223 elif isinstance(data, dict):
--> 224 mgr = self._init_dict(data, index, columns, dtype=dtype)
225 elif isinstance(data, ma.MaskedArray):
226 import numpy.ma.mrecords as mrecords
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
358 arrays = [data[k] for k in keys]
359
--> 360 return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
361
362 def _init_ndarray(self, values, index, columns, dtype=None, copy=False):
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
5229 # figure out the index, if necessary
5230 if index is None:
-> 5231 index = extract_index(arrays)
5232 else:
5233 index = _ensure_index(index)
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in extract_index(data)
5268
5269 if not indexes and not raw_lengths:
-> 5270 raise ValueError('If using all scalar values, you must pass'
5271 ' an index')
5272
ValueError: If using all scalar values, you must pass an index
use this:
In [35]: pd.DataFrame({'linear':df['BID-CLOSE'].interpolate('linear')})
Out[35]:
linear
timestamp
2016-10-10 22:00:00 1.309710
2016-10-10 22:00:00 1.319710
2016-10-10 22:00:00 1.317210
2016-10-10 22:00:00 1.317710
2016-10-10 22:00:00 1.314110
2016-10-10 22:00:00 1.313010
2016-10-10 22:00:00 1.311910
2016-10-10 22:00:00 1.310810
2016-10-10 22:00:00 1.309710
2016-10-10 22:00:00 1.311310
2016-10-10 22:00:00 1.314910
2016-10-10 22:00:00 1.320210
2016-10-10 22:00:00 1.322577
2016-10-10 22:00:00 1.324943
2016-10-10 22:00:00 1.327310
2016-10-10 22:00:00 1.327310
2016-10-10 22:00:00 1.317010
2016-10-10 22:00:00 1.308310
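The difference between the two calls can be seen on a small made-up frame. This is only a sketch: the values are hypothetical, and it uses the 'linear' and 'time' methods because 'quadratic' and 'cubic' additionally need SciPy installed.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a two-day gap (values are made up).
df = pd.DataFrame(
    {"BID-CLOSE": [1.0, np.nan, np.nan, 4.0]},
    index=pd.date_range("2006-12-21 22:00:00", periods=4),
)

# Interpolating the whole frame gives a DataFrame back ...
print(type(df.interpolate(method="linear")).__name__)               # DataFrame
# ... while interpolating a single column gives a Series.
print(type(df["BID-CLOSE"].interpolate(method="linear")).__name__)  # Series

# A dict of Series is something the DataFrame constructor can align
# on the shared index, producing one column per method.
methods = ["linear", "time"]
result = pd.DataFrame(
    {m: df["BID-CLOSE"].interpolate(method=m) for m in methods}
)
print(result)
```

Each column of `result` is named after the method that produced it, and all columns share the original DatetimeIndex.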
Fair warning: when I tried 'quadratic' and 'cubic' interpolation on your data, I got a LinAlgError: singular matrix error. I'm not sure why that happens.
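One thing that may be worth checking, as an assumption rather than a confirmed diagnosis: in the output above every row shows the same timestamp, 2016-10-10 22:00:00, and spline-style interpolation generally needs distinct x values. A minimal sketch, using a hypothetical frame `df` with a deliberately repeated timestamp, of how to check an index for duplicates before interpolating:

```python
import pandas as pd

# Hypothetical frame whose index repeats one timestamp (illustrative only).
df = pd.DataFrame(
    {"BID-CLOSE": [1.30, 1.31, 1.32]},
    index=pd.to_datetime(["2016-10-10 22:00:00"] * 3),
)

# False means at least one timestamp occurs more than once.
print(df.index.is_unique)           # False

# Count the repeated entries (the first occurrence is not counted).
print(df.index.duplicated().sum())  # 2
```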