ValueError: If using all scalar values, you must pass an index

Date: 2016-10-10 20:50:51

Tags: python pandas quantitative-finance

I have the following code:

import datetime
import MySQLdb as mdb
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import pprint
import statsmodels.tsa.stattools as ts

if __name__ == "__main__":
    # Connect to the MySQL instance
    db_host2 = '192.168.200.128'
    db_user2 = 'sec_master'
    db_pass2 = 'PASS'
    db_name2 = 'EUR-USD'
    con = mdb.connect(db_host2, db_user2, db_pass2, db_name2)

    sql2 = """SELECT `TIME`, `BID-CLOSE`
              FROM `EUR-USD`.`tbl_EUR-USD_1-Day`
              WHERE TIME >= '2006-12-15 22:00:00' AND TIME <= '2007-01-03 22:00:00'
              ORDER BY TIME ASC;"""
    # Create a pandas dataframe from the SQL query
    eurusd = pd.read_sql_query(sql2, con=con, index_col='TIME')

idx = pd.date_range('2006-12-17 22:00:00', '2007-01-03 22:00:00')

eurusd.reindex(idx, fill_value=None)
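A small self-contained sketch of the reindexing step, for readers without the MySQL table. The toy frame below is an assumption: it reuses a few closes from the output shown next and simply leaves out the weekend rows, the way the database query does. It shows that reindex returns a new frame with NaN rows for the missing dates.

```python
import pandas as pd

# Daily calendar at 22:00, the same kind of index the question builds with pd.date_range.
idx = pd.date_range("2006-12-20 22:00:00", "2006-12-26 22:00:00", freq="D")

# Toy stand-in for the SQL result (assumption): closes copied from the question's
# output, with the rows for 2006-12-22 .. 2006-12-24 absent, as in the database.
eurusd = pd.DataFrame(
    {"BID-CLOSE": [1.31771, 1.31411, 1.30971, 1.31131]},
    index=idx[[0, 1, 5, 6]],
)

# reindex returns a NEW DataFrame (eurusd is not modified in place), so the result
# must be assigned; dates that are missing from the data become NaN rows.
eurusd = eurusd.reindex(idx)
print(eurusd)
```

Without the assignment, the console shows the reindexed frame but eurusd itself keeps its original rows.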

This gives the output:
                     BID-CLOSE
2006-12-17 22:00:00    1.30971
2006-12-18 22:00:00    1.31971
2006-12-19 22:00:00    1.31721
2006-12-20 22:00:00    1.31771
2006-12-21 22:00:00    1.31411
2006-12-22 22:00:00        NaN
2006-12-23 22:00:00        NaN
2006-12-24 22:00:00        NaN
2006-12-25 22:00:00    1.30971
2006-12-26 22:00:00    1.31131
2006-12-27 22:00:00    1.31491
2006-12-28 22:00:00    1.32021
2006-12-29 22:00:00        NaN
2006-12-30 22:00:00        NaN
2006-12-31 22:00:00    1.32731
2007-01-01 22:00:00    1.32731
2007-01-02 22:00:00    1.31701
2007-01-03 22:00:00    1.30831

Then I assign:

eurusd = eurusd.reindex(idx, fill_value=None)

Next I use:

methods = ['linear', 'quadratic', 'cubic']

I get an error when I run the next line:

pd.DataFrame({m: eurusd.interpolate(method=m) for m in methods})

The error is:

ValueError: If using all scalar values, you must pass an index
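A minimal reproduction of where this message comes from, using a small hypothetical frame df in place of eurusd. When given a dict, the DataFrame constructor needs at least one value that carries a length or an index, so it can decide how many rows to build. A dict of plain scalars fails unless index= is passed, and a DataFrame value (two-dimensional) is not accepted as a column, so it ends up in the same "all scalars" branch. The exact wording of the message may differ between pandas versions.

```python
import pandas as pd

# Hypothetical small frame standing in for eurusd.
df = pd.DataFrame({"BID-CLOSE": [1.31411, None, 1.30971]})

def try_build(data, **kwargs):
    """Return the constructed DataFrame, or the ValueError message as a string."""
    try:
        return pd.DataFrame(data, **kwargs)
    except ValueError as exc:
        return str(exc)

# 1) Only scalar values and no index: pandas cannot tell how many rows to create.
r1 = try_build({"a": 1, "b": 2})

# 2) The same scalars with an explicit index: one row per index label.
r2 = try_build({"a": 1, "b": 2}, index=[0])

# 3) A DataFrame as a dict value is not treated as a column (it is 2-D),
#    so the constructor raises the same "all scalar values" error.
r3 = try_build({"linear": df.interpolate(method="linear")})

print(r1)
print(r2)
print(r3)
```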

I'm following (trying to follow) the interpolation section of this guide: http://pandas.pydata.org/pandas-docs/stable/missing_data.html

How do I correctly "pass an index" in this case?

Update 1

eurusd.interpolate('linear')

gives the output:
                     BID-CLOSE
2006-12-17 22:00:00   1.309710
2006-12-18 22:00:00   1.319710
2006-12-19 22:00:00   1.317210
2006-12-20 22:00:00   1.317710
2006-12-21 22:00:00   1.314110
2006-12-22 22:00:00   1.313010
2006-12-23 22:00:00   1.311910
2006-12-24 22:00:00   1.310810
2006-12-25 22:00:00   1.309710
2006-12-26 22:00:00   1.311310
2006-12-27 22:00:00   1.314910
2006-12-28 22:00:00   1.320210
2006-12-29 22:00:00   1.322577
2006-12-30 22:00:00   1.324943
2006-12-31 22:00:00   1.327310
2007-01-01 22:00:00   1.327310
2007-01-02 22:00:00   1.317010
2007-01-03 22:00:00   1.308310
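A short sketch contrasting what the two calls return, using a toy slice of the data around the first weekend gap (values copied from the output above). Calling interpolate on the DataFrame returns a DataFrame, while calling it on the BID-CLOSE column returns a Series. method='linear' treats the rows as equally spaced, which is why the three filled values step evenly from 1.31411 down to 1.30971.

```python
import numpy as np
import pandas as pd

# Toy slice (assumption): Thursday's close, three missing days, then Monday's close.
idx = pd.date_range("2006-12-21 22:00:00", periods=5, freq="D")
eurusd = pd.DataFrame(
    {"BID-CLOSE": [1.31411, np.nan, np.nan, np.nan, 1.30971]}, index=idx
)

filled = eurusd.interpolate(method="linear")             # DataFrame in, DataFrame out
col = eurusd["BID-CLOSE"].interpolate(method="linear")   # Series in, Series out

print(type(filled).__name__, type(col).__name__)
print(col.round(6))
```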

Update 2

In[9]: pd.DataFrame({m: eurusd['BID-CLOSE'].interpolate(method=m) for m in methods})
Out[9]: 
                        cubic    linear  quadratic
2006-12-17 22:00:00  1.309710  1.309710   1.309710
2006-12-18 22:00:00  1.319710  1.319710   1.319710
2006-12-19 22:00:00  1.317210  1.317210   1.317210
2006-12-20 22:00:00  1.317710  1.317710   1.317710
2006-12-21 22:00:00  1.314110  1.314110   1.314110
2006-12-22 22:00:00  1.310762  1.313010   1.307947
2006-12-23 22:00:00  1.309191  1.311910   1.305159
2006-12-24 22:00:00  1.308980  1.310810   1.305747
2006-12-25 22:00:00  1.309710  1.309710   1.309710
2006-12-26 22:00:00  1.311310  1.311310   1.311310
2006-12-27 22:00:00  1.314910  1.314910   1.314910
2006-12-28 22:00:00  1.320210  1.320210   1.320210
2006-12-29 22:00:00  1.323674  1.322577   1.321632
2006-12-30 22:00:00  1.325553  1.324943   1.323998
2006-12-31 22:00:00  1.327310  1.327310   1.327310
2007-01-01 22:00:00  1.327310  1.327310   1.327310
2007-01-02 22:00:00  1.317010  1.317010   1.317010
2007-01-03 22:00:00  1.308310  1.308310   1.308310

1 Answer

Answer (score: 3)

The problem is that when you use the DataFrame constructor like this:

pd.DataFrame({m: eurusd.interpolate(method=m) for m in methods})

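the value for each m is a DataFrame. The constructor does not recognize a two-dimensional DataFrame as a column, so it treats it as a scalar value, which is admittedly confusing. The constructor expects each value to be a one-dimensional sequence or a Series. The following should solve the problem: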

pd.DataFrame({m: eurusd['BID-CLOSE'].interpolate(method=m) for m in methods})

since selecting a single column returns a Series. So, for example, instead of:

In [34]: pd.DataFrame({'linear':df.interpolate('linear')})
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-4b6c095c6da3> in <module>()
----> 1 pd.DataFrame({'linear':df.interpolate('linear')})

/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    222                                  dtype=dtype, copy=copy)
    223         elif isinstance(data, dict):
--> 224             mgr = self._init_dict(data, index, columns, dtype=dtype)
    225         elif isinstance(data, ma.MaskedArray):
    226             import numpy.ma.mrecords as mrecords

/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
    358             arrays = [data[k] for k in keys]
    359 
--> 360         return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    361 
    362     def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   5229     # figure out the index, if necessary
   5230     if index is None:
-> 5231         index = extract_index(arrays)
   5232     else:
   5233         index = _ensure_index(index)

/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in extract_index(data)
   5268 
   5269         if not indexes and not raw_lengths:
-> 5270             raise ValueError('If using all scalar values, you must pass'
   5271                              ' an index')
   5272 

ValueError: If using all scalar values, you must pass an index

use this instead:

In [35]: pd.DataFrame({'linear':df['BID-CLOSE'].interpolate('linear')})
Out[35]: 
                       linear
timestamp                    
2016-10-10 22:00:00  1.309710
2016-10-10 22:00:00  1.319710
2016-10-10 22:00:00  1.317210
2016-10-10 22:00:00  1.317710
2016-10-10 22:00:00  1.314110
2016-10-10 22:00:00  1.313010
2016-10-10 22:00:00  1.311910
2016-10-10 22:00:00  1.310810
2016-10-10 22:00:00  1.309710
2016-10-10 22:00:00  1.311310
2016-10-10 22:00:00  1.314910
2016-10-10 22:00:00  1.320210
2016-10-10 22:00:00  1.322577
2016-10-10 22:00:00  1.324943
2016-10-10 22:00:00  1.327310
2016-10-10 22:00:00  1.327310
2016-10-10 22:00:00  1.317010
2016-10-10 22:00:00  1.308310
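The fix above builds one column per method from a Series. A sketch of an alternative, assuming you want to keep each interpolated result as a whole DataFrame (for example, if the table later gains more price columns): pd.concat accepts a dict of DataFrames and uses the dict keys as an outer column level. The toy data is copied from the question's output. The sketch uses the 'linear' and 'time' methods so it runs without SciPy; 'quadratic' and 'cubic' are handed off to SciPy's interpolation routines.

```python
import numpy as np
import pandas as pd

# Toy data (assumption): a slice of the question's reindexed frame.
idx = pd.date_range("2006-12-21 22:00:00", periods=5, freq="D")
eurusd = pd.DataFrame(
    {"BID-CLOSE": [1.31411, np.nan, np.nan, np.nan, 1.30971]}, index=idx
)

# Methods that do not need SciPy; 'time' uses the DatetimeIndex spacing.
methods = ["linear", "time"]

# Option A (the answer's fix): one Series per method -> one column per method.
by_series = pd.DataFrame({m: eurusd["BID-CLOSE"].interpolate(method=m) for m in methods})

# Option B: keep whole DataFrames; pd.concat turns the dict keys into the outer
# level of MultiIndex columns, e.g. ("linear", "BID-CLOSE").
by_concat = pd.concat({m: eurusd.interpolate(method=m) for m in methods}, axis=1)

print(by_series)
print(by_concat)
```

With evenly spaced daily rows, 'time' and 'linear' give the same values here; they differ only when the gaps between timestamps are uneven.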

Fair warning: when I run 'quadratic' or 'cubic' interpolation on your data, I get a LinAlgError: singular matrix error. Not sure why that happens.