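I have successfully extracted data from a Fitbit sqlite db using Python sqlite3, as shown below. I would like to create a Pandas scatter_matrix from this data.
The code that successfully fetches the data is: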
import pandas.io.sql as psql
import sqlite3 as lite
from pandas.tools.plotting import scatter_matrix
con = lite.connect('C:/temp/fitbit-db')
cur = con.cursor()  # cursor used by cur.execute() and cur.fetchall() below
sql = ('SELECT log_date,'
'duration,'
'minutes_to_fall_asleep,'
'minutes_asleep,'
'minutes_awake,'
'minutes_after_wakeup,'
'awakenings_count,'
'time_in_bed,'
'awake_count,'
'efficiency,'
'restless_count '
'FROM sleep_log_entry')
cur.execute(sql)
I can print the query results using:
fitbit_data_fetchall = cur.fetchall()
for row in fitbit_data_fetchall:
    print(row)
which produces rows like these:
(1397426400000L, 6420000, 8, 99, 0, 0, 0, 107, 0, 100.0, 0)
(1397944800000L, 23940000, 11, 370, 18, 0, 7, 399, 1, 95.0, 8)
(1399759200000L, 28200000, 13, 448, 9, 0, 2, 470, 0, 98.0, 2)
etc ....
However, rather than just printing the rows, I read the query result into an array (a pandas DataFrame) using:
fitbit_data_psql = psql.read_sql(sql, con)
I passed this array to Pandas scatter_matrix to try to create a scatter plot matrix, but it does not work. I tried a few variations, for example:
scatter_matrix(fitbit_data_psql, alpha=0.2, figsize=(6, 6), diagonal='kde')
There seems to be no error, but I only get the 121 lines of output below and no plot. It takes a while to run, so perhaps it is timing out?
array([[<matplotlib.axes.AxesSubplot object at 0x0000000036798208>,
<matplotlib.axes.AxesSubplot object at 0x000000003681B6A0>,
<matplotlib.axes.AxesSubplot object at 0x000000003690BC50>,
<matplotlib.axes.AxesSubplot object at 0x0000000036A2DA20>,
<matplotlib.axes.AxesSubplot object at 0x0000000036947EF0>,
<matplotlib.axes.AxesSubplot object at 0x0000000036B88D68>,
<matplotlib.axes.AxesSubplot object at 0x0000000036C89710>,
etc ...
etc ...
<matplotlib.axes.AxesSubplot object at 0x00000000520FB978>,
<matplotlib.axes.AxesSubplot object at 0x00000000521E49E8>]], dtype=object)
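A note on the output above: the printed array is the return value of scatter_matrix, one Axes object per panel, so 11 columns give 11 x 11 = 121 entries. Whether a figure window also appears depends on the matplotlib backend and on how the session was started. The sketch below is only an illustration under stated assumptions, not the asker's code: it assumes a newer pandas where the function lives in pandas.plotting, uses the non-interactive Agg backend, fills three columns with values from the rows printed earlier, and uses diagonal='hist' to avoid the SciPy dependency of 'kde'.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in an interactive session
import pandas as pd
from pandas.plotting import scatter_matrix  # newer location of scatter_matrix

# Three columns taken from the three rows printed in the question.
df = pd.DataFrame({
    "minutes_asleep": [99, 370, 448],
    "minutes_awake": [0, 18, 9],
    "efficiency": [100.0, 95.0, 98.0],
})

# scatter_matrix returns a 2-D array of Axes, one per pair of columns.
axes = scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal="hist")

# Render the figure explicitly. Here it is written to an in-memory PNG;
# in an interactive session, matplotlib.pyplot.show() would display it instead.
fig = axes[0, 0].figure
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

In an IPython notebook of that era, enabling inline plotting (for example with %matplotlib inline) before the call would be the usual way to see the figure, though that is an assumption about the asker's setup.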
I also tried using just a few columns of the array, like this:
scatter_matrix(fitbit_data_psql['activity', 'awake', 'asleep'], alpha=0.2, figsize=(6, 6), diagonal='kde')
But this gives the following error; it looks like it does not recognize the columns?
KeyError Traceback (most recent call last)
<ipython-input-24-b0afbb6671fc> in <module>()
29 #scatter_matrix(fitbit_data, alpha=0.2, figsize=(6, 6), diagonal='kde')
30 #scatter_matrix(fitbit_data[['activity', 'awake', 'asleep']], figsize=(14, 10))
---> 31 scatter_matrix(fitbit_data['activity', 'awake', 'asleep'], alpha=0.2, figsize=(6, 6), diagonal='kde')
C:\Users\bb\Anaconda\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
1682 return self._getitem_multilevel(key)
1683 else:
-> 1684 return self._getitem_column(key)
1685
1686 def _getitem_column(self, key):
C:\Users\bb\Anaconda\lib\site-packages\pandas\core\frame.pyc in _getitem_column(self, key)
1689 # get column
1690 if self.columns.is_unique:
-> 1691 return self._get_item_cache(key)
1692
1693 # duplicate columns & possible reduce dimensionaility
C:\Users\bb\Anaconda\lib\site-packages\pandas\core\generic.pyc in _get_item_cache(self, item)
1050 res = cache.get(item)
1051 if res is None:
-> 1052 values = self._data.get(item)
1053 res = self._box_item_values(item, values)
1054 cache[item] = res
C:\Users\bb\Anaconda\lib\site-packages\pandas\core\internals.pyc in get(self, item)
2535
2536 if not isnull(item):
-> 2537 loc = self.items.get_loc(item)
2538 else:
2539 indexer = np.arange(len(self.items))[isnull(self.items)]
C:\Users\bb\Anaconda\lib\site-packages\pandas\core\index.pyc in get_loc(self, key)
1154 loc : int if unique index, possibly slice or mask if not
1155 """
-> 1156 return self._engine.get_loc(_values_from_object(key))
1157
1158 def get_value(self, series, key):
C:\Users\bb\Anaconda\lib\site-packages\pandas\index.pyd in pandas.index.IndexEngine.get_loc (pandas\index.c:3650)()
C:\Users\bb\Anaconda\lib\site-packages\pandas\index.pyd in pandas.index.IndexEngine.get_loc (pandas\index.c:3528)()
C:\Users\bb\Anaconda\lib\site-packages\pandas\hashtable.pyd in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:11908)()
C:\Users\bb\Anaconda\lib\site-packages\pandas\hashtable.pyd in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:11861)()
KeyError: ('activity', 'awake', 'asleep')
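The KeyError above names the whole tuple ('activity', 'awake', 'asleep') as the missing key, which is what pandas reports when several names are passed inside a single pair of brackets: they are treated as one column label. Selecting several columns takes a list, i.e. double brackets. Note also that 'activity', 'awake' and 'asleep' do not appear in the SELECT list of the query, so a list of those names would presumably fail as well. The sketch below uses a small made-up DataFrame with column names from the query to show both behaviours.

```python
import pandas as pd

# Hypothetical stand-in for the query result, with values from the printed rows.
df = pd.DataFrame({
    "minutes_asleep": [99, 370, 448],
    "minutes_awake": [0, 18, 9],
    "efficiency": [100.0, 95.0, 98.0],
})

# A list of names (double brackets) selects several columns.
subset = df[["minutes_asleep", "minutes_awake"]]

# Several names in single brackets form one tuple key, which is looked up
# as a single column label and is not found.
tuple_error = None
try:
    df["minutes_asleep", "minutes_awake"]
except KeyError as exc:
    tuple_error = exc
```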
What is the correct way to use scatter_matrix with the array I have?
Question updated:
Answer 0 (score: 0)
It looks like read_sql in pandas.io.sql has some extra parameters for getting the column headers. I changed the read_sql statement from
fitbit_data_psql = psql.read_sql(sql, con)
to
fitbit_data_psql = psql.read_sql(sql, con, index_col=None, coerce_float=True)
Now the scatter_matrix plot is displayed, with the column names as the data labels.
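As a small self-contained check of how the column names reach the DataFrame, the sketch below builds an in-memory table with a few of the columns from the question and reads it back with pandas.read_sql_query. The table contents are the three rows printed in the question; the in-memory database, the reduced column list, and the interpretation of log_date as milliseconds since the Unix epoch are assumptions made for illustration only.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory stand-in for the Fitbit database, reduced to four columns.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE sleep_log_entry ("
    "log_date INTEGER, minutes_asleep INTEGER, "
    "minutes_awake INTEGER, efficiency REAL)"
)
con.executemany(
    "INSERT INTO sleep_log_entry VALUES (?, ?, ?, ?)",
    [
        (1397426400000, 99, 0, 100.0),
        (1397944800000, 370, 18, 95.0),
        (1399759200000, 448, 9, 98.0),
    ],
)
con.commit()

# The DataFrame columns are named after the columns in the SELECT list.
df = pd.read_sql_query(
    "SELECT log_date, minutes_asleep, minutes_awake, efficiency "
    "FROM sleep_log_entry",
    con,
)
con.close()

# Assumption: log_date holds milliseconds since the Unix epoch.
df["log_date"] = pd.to_datetime(df["log_date"], unit="ms")
```

Converting log_date this way is optional; scatter_matrix works on the numeric columns, so a date column may be better kept as the index or left out of the plot.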