Question

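I have a list of stock market data pulled from Yahoo in a pandas DataFrame (see the format below). The date serves as the index of the DataFrame. I want to write the data (including the index) out to a SQLite database.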

             AAPL     GE
Date
2009-01-02  89.95  14.76
2009-01-05  93.75  14.38
2009-01-06  92.20  14.58
2009-01-07  90.21  13.93
2009-01-08  91.88  13.95

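Based on my reading of the write_frame code for Pandas, it does not currently support writing the index. I tried using to_records instead, but ran into an issue with Numpy 1.6.2 and datetimes. Now I'm trying to write tuples using .itertuples, but SQLite throws an error saying the data type isn't supported (see the code and result below). I'm relatively new to Python, Pandas and Numpy, so it is entirely possible I'm missing something obvious. I think the problem is in trying to write a datetime to SQLite, but I may be overcomplicating this.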

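I think I may be able to fix the issue by upgrading to Numpy 1.7 or to the development version of Pandas, which has a fix posted on GitHub. I'd prefer to develop with release versions of the software: I'm new to this and I don't want stability issues confusing matters further.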

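Is there a way to accomplish this using Python 2.7.2, Pandas 0.10.0 and Numpy 1.6.2? Perhaps by cleaning the datetimes somehow? I'm a bit out of my depth, so any help would be much appreciated.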

Code:

import numpy as np
import pandas as pd
from pandas import DataFrame, Series
import sqlite3 as db

# download data from yahoo
all_data = {}

for ticker in ['AAPL', 'GE']:
    all_data[ticker] = pd.io.data.get_data_yahoo(ticker, '1/1/2009','12/31/2012')

# create a data frame
price = DataFrame({tic: data['Adj Close'] for tic, data in all_data.iteritems()})

# get output ready for database export
output = price.itertuples()
data = tuple(output)

# connect to a test DB with one three-column table titled "Demo"
con = db.connect('c:/Python27/test.db')
wildcards = ','.join(['?'] * 3)
insert_sql = 'INSERT INTO Demo VALUES (%s)' % wildcards
con.executemany(insert_sql, data)

Result:

---------------------------------------------------------------------------
InterfaceError                            Traceback (most recent call last)
<ipython-input-15-680cc9889c56> in <module>()
----> 1 con.executemany(insert_sql, data)

InterfaceError: Error binding parameter 0 - probably unsupported type.

Answer 1

In recent pandas the index is saved to the database (you used to have to call reset_index first).
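For setups where the index is not written automatically, the manual route looks roughly like the sketch below: move the index into a regular column, turn the Timestamps into ISO strings (a type sqlite3 can bind), then insert the tuples. The sample rows come from the question; the Demo table schema and the use of .dt.strftime (which assumes a reasonably recent pandas, not 0.10.0) are my own assumptions.

```python
import sqlite3
import pandas as pd

# Small stand-in for the question's price DataFrame (values taken from the question).
price = pd.DataFrame(
    {"AAPL": [89.95, 93.75, 92.20], "GE": [14.76, 14.38, 14.58]},
    index=pd.to_datetime(["2009-01-02", "2009-01-05", "2009-01-06"]),
)
price.index.name = "Date"

# Move the index into a column and convert Timestamps to plain strings,
# because sqlite3 cannot bind pandas Timestamp objects directly.
flat = price.reset_index()
flat["Date"] = flat["Date"].dt.strftime("%Y-%m-%d")

con = sqlite3.connect(":memory:")
# Assumed schema for the three-column Demo table mentioned in the question.
con.execute("CREATE TABLE Demo (Date TEXT, AAPL REAL, GE REAL)")

# name=None yields plain tuples; index=False skips the new integer index.
rows = list(flat.itertuples(index=False, name=None))
con.executemany("INSERT INTO Demo VALUES (?, ?, ?)", rows)
con.commit()

stored = con.execute("SELECT * FROM Demo ORDER BY Date").fetchall()
```

Storing the dates as ISO strings keeps them sortable in SQL, at the cost of converting them back with pd.to_datetime after reading.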

Following the docs (setting up a SQLite connection in memory):

import sqlite3
# Create your connection.
cnx = sqlite3.connect(':memory:')

Note: You can also pass a SQLAlchemy engine here (see the end of this answer).

We can save price2 to cnx:

price2.to_sql(name='price2', con=cnx)

We can retrieve it via read_sql:

p2 = pd.read_sql('select * from price2', cnx)

However, when stored (and retrieved), the dates are unicode rather than Timestamp. To convert back to what we started with, we can use pd.to_datetime:

p2.Date = pd.to_datetime(p2.Date)
p = p2.set_index('Date')

We get back the same DataFrame we started with:

In [11]: p
Out[11]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1006 entries, 2009-01-02 00:00:00 to 2012-12-31 00:00:00
Data columns:
AAPL    1006  non-null values
GE      1006  non-null values
dtypes: float64(2)
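As a possible shortcut, read_sql also takes index_col and parse_dates arguments, so the conversion and set_index steps can happen while reading. A minimal sketch, assuming a recent pandas and a small stand-in for price2 built from the question's sample rows:

```python
import sqlite3
import pandas as pd

# Stand-in for price2 (values taken from the question's sample).
price2 = pd.DataFrame(
    {"AAPL": [89.95, 93.75], "GE": [14.76, 14.38]},
    index=pd.to_datetime(["2009-01-02", "2009-01-05"]),
)
price2.index.name = "Date"

cnx = sqlite3.connect(":memory:")
price2.to_sql(name="price2", con=cnx)  # the named index becomes a "Date" column

# Parse the Date column back into datetimes and use it as the index in one call.
p = pd.read_sql(
    "select * from price2",
    cnx,
    index_col="Date",
    parse_dates=["Date"],
)
```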

You can also use a SQLAlchemy engine:

from sqlalchemy import create_engine
e = create_engine('sqlite://')  # pass your db url

price2.to_sql(name='price2', con=e)

This allows you to use read_sql_table (which can only be used with SQLAlchemy):

pd.read_sql_table(table_name='price2', con=e)
#         Date   AAPL     GE
# 0 2009-01-02  89.95  14.76
# 1 2009-01-05  93.75  14.38
# 2 2009-01-06  92.20  14.58
# 3 2009-01-07  90.21  13.93
# 4 2009-01-08  91.88  13.95

Answer 2

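Unfortunately, pandas.io.write_frame, which the question relies on, no longer exists in more recent versions of Pandas. For example, I'm using pandas 0.19.2. You can do something like this instead: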

from sqlalchemy import create_engine

disk_engine = create_engine('sqlite:///my_lite_store.db')
price.to_sql('stock_price', disk_engine, if_exists='append')

Then use the following to preview your table:

df = pd.read_sql_query('SELECT * FROM stock_price LIMIT 3',disk_engine)
df.head()
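One thing to watch with if_exists='append': every run adds the rows again, so rerunning the script duplicates data, whereas 'replace' drops and recreates the table. A small sketch of the difference, using a plain in-memory sqlite3 connection as a stand-in for disk_engine (my assumption, to keep the example free of SQLAlchemy):

```python
import sqlite3
import pandas as pd

con = sqlite3.connect(":memory:")  # stand-in for disk_engine
price = pd.DataFrame(
    {"AAPL": [89.95, 93.75]},
    index=pd.Index(["2009-01-02", "2009-01-05"], name="Date"),
)

count_sql = "SELECT COUNT(*) AS n FROM stock_price"

# Appending twice stores every row twice.
price.to_sql("stock_price", con, if_exists="append")
price.to_sql("stock_price", con, if_exists="append")
n_append = int(pd.read_sql_query(count_sql, con)["n"].iloc[0])

# Replacing drops the table first, so only the latest rows remain.
price.to_sql("stock_price", con, if_exists="replace")
n_replace = int(pd.read_sql_query(count_sql, con)["n"].iloc[0])
```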

Answer 3

Here is the code that worked for me. I was able to write the DataFrame to a SQLite DB.

import pandas as pd
import sqlite3 as sq
data = <This is going to be your pandas dataframe>
sql_data = 'D:\\SA.sqlite'  # path to the SQLite database file (created if it does not exist)
conn = sq.connect(sql_data)
cur = conn.cursor()
cur.execute('''DROP TABLE IF EXISTS SA''')
# Write the DataFrame to the SA table; index=False leaves the index out, use index=True to store it
data.to_sql('SA', conn, if_exists='replace', index=False)
pd.read_sql('select * from SA', conn)  # read the table back to check the write
conn.commit()
conn.close()
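Since the question asks for the index to be stored too, here is a variant of the same idea with index=True, as a rough sketch. The in-memory database, the sample values (from the question) and the index_label name are assumptions for illustration only.

```python
import sqlite3
import pandas as pd

# Stand-in DataFrame with a date index (values taken from the question).
data = pd.DataFrame(
    {"AAPL": [89.95, 93.75], "GE": [14.76, 14.38]},
    index=pd.to_datetime(["2009-01-02", "2009-01-05"]),
)

conn = sqlite3.connect(":memory:")  # in-memory DB instead of a file on disk

# index=True writes the index as a column; index_label sets that column's name.
data.to_sql("SA", conn, if_exists="replace", index=True, index_label="Date")

# PRAGMA table_info returns one row per column; the name is the second field.
columns = [row[1] for row in conn.execute("PRAGMA table_info(SA)")]
row_count = conn.execute("SELECT COUNT(*) FROM SA").fetchone()[0]
conn.close()
```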

How to write a Pandas DataFrame to sqlite with Index
