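How to feed SQLite query data to Pandas scatter_matrix

Time: 2014-08-01 01:35:29

Tags: python pandas

I am successfully extracting data from a Fitbit SQLite database with Python's sqlite3 module, as shown below. I want to create a Pandas scatter_matrix from the data.

The code that successfully retrieves the data is: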

import pandas.io.sql as psql
import sqlite3 as lite
# In pandas 0.20 and later, use: from pandas.plotting import scatter_matrix
from pandas.tools.plotting import scatter_matrix

con = lite.connect('C:/temp/fitbit-db')
cur = con.cursor()  # cursor used by cur.execute() and cur.fetchall() below

sql = ('SELECT log_date,'
       'duration,'
       'minutes_to_fall_asleep,'
       'minutes_asleep,'
       'minutes_awake,'
       'minutes_after_wakeup,'
       'awakenings_count,'
       'time_in_bed,'
       'awake_count,'
       'efficiency,'
       'restless_count '
       'FROM sleep_log_entry')

cur.execute(sql)

I can print the query results with:

fitbit_data_fetchall = cur.fetchall()
for row in fitbit_data_fetchall:
    print(row)

which prints rows like this:

(1397426400000L, 6420000, 8, 99, 0, 0, 0, 107, 0, 100.0, 0)
(1397944800000L, 23940000, 11, 370, 18, 0, 7, 399, 1, 95.0, 8)
(1399759200000L, 28200000, 13, 448, 9, 0, 2, 470, 0, 98.0, 2)
 etc ....

However, rather than just printing the rows, I read the query results into an array (a Pandas DataFrame) with:

fitbit_data_psql = psql.read_sql(sql, con)
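
For readers who do not have the Fitbit database at hand, the read step can be tried on a small stand-in. This is a minimal sketch, assuming an in-memory SQLite database and a reduced set of columns; the three rows are taken from the sample output in the question, while the in-memory table itself is an assumption made for illustration. It suggests that the result is a DataFrame whose column names come from the SELECT list.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory database stands in for C:/temp/fitbit-db.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE sleep_log_entry ("
    "log_date INTEGER, minutes_asleep INTEGER, "
    "minutes_awake INTEGER, time_in_bed INTEGER, efficiency REAL)"
)
# Values copied from the three sample rows printed in the question.
con.executemany(
    "INSERT INTO sleep_log_entry VALUES (?, ?, ?, ?, ?)",
    [
        (1397426400000, 99, 0, 107, 100.0),
        (1397944800000, 370, 18, 399, 95.0),
        (1399759200000, 448, 9, 470, 98.0),
    ],
)

sql = (
    "SELECT log_date, minutes_asleep, minutes_awake, time_in_bed, efficiency "
    "FROM sleep_log_entry"
)
fitbit_df = pd.read_sql_query(sql, con)

print(type(fitbit_df).__name__)
print(list(fitbit_df.columns))
con.close()
```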

I pass this array to Pandas scatter_matrix to try to create a scatter plot matrix, but it does not work. I have tried a few variations, such as:

scatter_matrix(fitbit_data_psql, alpha=0.2, figsize=(6, 6), diagonal='kde')

There appears to be no error, but all I get is the 121 lines below and no plot. It takes a while to run, so perhaps it is timing out?

array([[<matplotlib.axes.AxesSubplot object at 0x0000000036798208>,
        <matplotlib.axes.AxesSubplot object at 0x000000003681B6A0>,
        <matplotlib.axes.AxesSubplot object at 0x000000003690BC50>,
        <matplotlib.axes.AxesSubplot object at 0x0000000036A2DA20>,
        <matplotlib.axes.AxesSubplot object at 0x0000000036947EF0>,
        <matplotlib.axes.AxesSubplot object at 0x0000000036B88D68>,
        <matplotlib.axes.AxesSubplot object at 0x0000000036C89710>,
         etc ...
         etc ...
        <matplotlib.axes.AxesSubplot object at 0x00000000520FB978>,
        <matplotlib.axes.AxesSubplot object at 0x00000000521E49E8>]], dtype=object)
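
The output above is the return value of scatter_matrix: an array of Axes objects, one per subplot. With the 11 columns selected by the query that is an 11 x 11 grid, which would be consistent with the 121 entries. Getting this array back does not by itself display anything; depending on how the session is set up, the figure may still need to be shown or saved explicitly. The following is a minimal sketch of that idea, not a diagnosis of the setup above. It assumes a recent pandas (where scatter_matrix lives in pandas.plotting), the non-interactive Agg backend, and a small DataFrame built from the sample rows; diagonal='hist' is used instead of 'kde' only so that the sketch does not need SciPy.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: no GUI available, so render off-screen
import pandas as pd
from pandas.plotting import scatter_matrix

# Values copied from the three sample rows printed in the question.
sample = pd.DataFrame(
    {
        "minutes_asleep": [99, 370, 448],
        "minutes_awake": [0, 18, 9],
        "time_in_bed": [107, 399, 470],
        "efficiency": [100.0, 95.0, 98.0],
    }
)

axes = scatter_matrix(sample, alpha=0.2, figsize=(6, 6), diagonal="hist")
print(axes.shape)  # one Axes per pair of columns

# The array of Axes is only a handle; write the figure out explicitly.
fig = axes[0, 0].get_figure()
buffer = io.BytesIO()  # hypothetical target; a file name works as well
fig.savefig(buffer, format="png")
```

In an interactive session, calling matplotlib.pyplot.show() after scatter_matrix is the usual way to open the plot window.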

I tried using just a few columns from the array, like this:

scatter_matrix(fitbit_data_psql['activity', 'awake', 'asleep'], alpha=0.2, figsize=(6, 6), diagonal='kde')

But this produces the following error. It looks like it does not recognize the columns?

KeyError                                  Traceback (most recent call last)
<ipython-input-24-b0afbb6671fc> in <module>()
     29 #scatter_matrix(fitbit_data, alpha=0.2, figsize=(6, 6), diagonal='kde')
     30 #scatter_matrix(fitbit_data[['activity', 'awake', 'asleep']], figsize=(14, 10))
---> 31 scatter_matrix(fitbit_data['activity', 'awake', 'asleep'], alpha=0.2, figsize=(6, 6), diagonal='kde')

C:\Users\bb\Anaconda\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
   1682             return self._getitem_multilevel(key)
   1683         else:
-> 1684             return self._getitem_column(key)
   1685 
   1686     def _getitem_column(self, key):

C:\Users\bb\Anaconda\lib\site-packages\pandas\core\frame.pyc in _getitem_column(self, key)
   1689         # get column
   1690         if self.columns.is_unique:
-> 1691             return self._get_item_cache(key)
   1692 
   1693         # duplicate columns & possible reduce dimensionaility

C:\Users\bb\Anaconda\lib\site-packages\pandas\core\generic.pyc in _get_item_cache(self, item)
   1050         res = cache.get(item)
   1051         if res is None:
-> 1052             values = self._data.get(item)
   1053             res = self._box_item_values(item, values)
   1054             cache[item] = res

C:\Users\bb\Anaconda\lib\site-packages\pandas\core\internals.pyc in get(self, item)
   2535 
   2536             if not isnull(item):
-> 2537                 loc = self.items.get_loc(item)
   2538             else:
   2539                 indexer = np.arange(len(self.items))[isnull(self.items)]

C:\Users\bb\Anaconda\lib\site-packages\pandas\core\index.pyc in get_loc(self, key)
   1154         loc : int if unique index, possibly slice or mask if not
   1155         """
-> 1156         return self._engine.get_loc(_values_from_object(key))
   1157 
   1158     def get_value(self, series, key):

C:\Users\bb\Anaconda\lib\site-packages\pandas\index.pyd in pandas.index.IndexEngine.get_loc (pandas\index.c:3650)()

C:\Users\bb\Anaconda\lib\site-packages\pandas\index.pyd in pandas.index.IndexEngine.get_loc (pandas\index.c:3528)()

C:\Users\bb\Anaconda\lib\site-packages\pandas\hashtable.pyd in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:11908)()

C:\Users\bb\Anaconda\lib\site-packages\pandas\hashtable.pyd in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:11861)()

KeyError: ('activity', 'awake', 'asleep')
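
The KeyError is consistent with how DataFrame indexing treats its argument: fitbit_data['activity', 'awake', 'asleep'] passes a single tuple as the key, so pandas looks for one column with that tuple as its name. Selecting several columns takes a list inside the brackets (double brackets), and the names have to match the ones in the SELECT list, for example minutes_awake rather than awake. A minimal sketch, assuming a small DataFrame built from the sample rows:

```python
import pandas as pd

# Values copied from the three sample rows printed in the question.
fitbit_data = pd.DataFrame(
    {
        "minutes_asleep": [99, 370, 448],
        "minutes_awake": [0, 18, 9],
        "time_in_bed": [107, 399, 470],
    }
)

# A comma-separated key is a tuple, i.e. one (nonexistent) column name.
try:
    fitbit_data["minutes_asleep", "minutes_awake"]
    tuple_key_failed = False
except KeyError:
    tuple_key_failed = True

# A list of names selects several columns and returns a DataFrame.
subset = fitbit_data[["minutes_asleep", "minutes_awake"]]
print(tuple_key_failed, list(subset.columns))
```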

What is the correct way to use scatter_matrix with the array I have?

Updated question:

  1. I just realized that the query results have no header row, so that may be why scatter_matrix is not working. Does scatter_matrix work with relative column numbers?
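
On the question of column numbers: scatter_matrix takes a DataFrame, so one way to work by position is to select columns with DataFrame.iloc before plotting. The sketch below assumes a recent pandas, the non-interactive Agg backend, and a DataFrame built from the sample rows; the positions chosen are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no GUI available, so render off-screen
import pandas as pd
from pandas.plotting import scatter_matrix

# Values copied from the three sample rows printed in the question.
fitbit_data = pd.DataFrame(
    {
        "minutes_asleep": [99, 370, 448],
        "minutes_awake": [0, 18, 9],
        "time_in_bed": [107, 399, 470],
    }
)

# Pick the first and third columns by position rather than by name.
by_position = fitbit_data.iloc[:, [0, 2]]
print(list(by_position.columns))

axes = scatter_matrix(by_position, alpha=0.2, figsize=(6, 6), diagonal="hist")
```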

1 answer:

Answer 0 (score: 0)

It looks like pandas.io.sql read_sql has some extra parameters for getting the column headers. I changed the read_sql statement from

fitbit_data_psql = psql.read_sql(sql, con)

to

fitbit_data_psql = psql.read_sql(sql, con, index_col=None, coerce_float=True)

Now the scatter_matrix plot is displayed, with the column names as the data labels.
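
For anyone comparing the two calls: in the current pandas documentation, index_col=None and coerce_float=True are listed as the defaults of read_sql, so both forms would be expected to return the same DataFrame, column names included. A minimal sketch to check this, assuming an in-memory SQLite table (an illustration only) filled with values from the sample rows in the question:

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory database stands in for the Fitbit file.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE sleep_log_entry ("
    "minutes_asleep INTEGER, minutes_awake INTEGER, efficiency REAL)"
)
# Values copied from the three sample rows printed in the question.
con.executemany(
    "INSERT INTO sleep_log_entry VALUES (?, ?, ?)",
    [(99, 0, 100.0), (370, 18, 95.0), (448, 9, 98.0)],
)

sql = "SELECT minutes_asleep, minutes_awake, efficiency FROM sleep_log_entry"

plain = pd.read_sql(sql, con)
explicit = pd.read_sql(sql, con, index_col=None, coerce_float=True)

print(list(plain.columns))
print(plain.equals(explicit))
con.close()
```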