Question

A pandas DataFrame can be converted to an hdf5 file like this:

df.to_hdf('test_store.hdf','test',mode='w')

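I have a sqlite db file that has to be converted to an hdf5 file, and then I will read the hdf5 file through pandas using pd.read_hdf.

But first, how do I convert a python sqlite db to an hdf5 file?

EDIT:

I know about the .read_sql method in pandas. But I want to convert the db to hdf5 first.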

Answer 1

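This is surprisingly simple: use pandas!

pandas supports reading data directly from a SQL database into a DataFrame. Once you have the DataFrame, you can do whatever you like with it.

A short example, adapted from the docs: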

import sqlite3
import pandas as pd

# Create your connection.
cnx = sqlite3.connect('mydbfile.sqlite')

# Read the result of the SQL query into a DataFrame.
data = pd.read_sql("SELECT * FROM data;", cnx)
cnx.close()

# Now you can write it to an HDF5 file (requires the PyTables package).
data.to_hdf('test_store.hdf', key='test', mode='w')

Answer 2

Take a look at this:

http://www.tutorialspoint.com/sqlite/sqlite_limit_clause.htm

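The idea is to iterate over a select * from table query, limiting the results with a steadily increasing offset, and to write each batch of results to the hdf5 data store as shown above. First use select count(*) from table to count the entries, then split the iteration into manageable chunks. For example, with 4 million records, read 200,000 records at a time, starting from offsets 0, 200000, 400000, and so on.

I need to do this for a very large sqlite file. Will report back on whether it works.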
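The chunked approach described in this answer could be sketched roughly as follows. This is only an illustration, not the answerer's actual code: the helper names iter_chunks and sqlite_to_hdf are made up for this example, the table is assumed to be an ordinary rowid table (so ORDER BY rowid gives a stable order), and writing through pd.HDFStore assumes the PyTables package is installed.

```python
import sqlite3

import pandas as pd


def iter_chunks(cnx, table, chunk_size):
    """Yield DataFrames of at most chunk_size rows using LIMIT/OFFSET.

    Hypothetical helper. The table name is interpolated into the SQL
    because identifiers cannot be bound as parameters, so it must come
    from a trusted source.
    """
    # Count the rows first so we know how many chunks to read.
    total = cnx.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    for offset in range(0, total, chunk_size):
        chunk = pd.read_sql(
            # Assumption: a regular rowid table, so this ordering is stable.
            f'SELECT * FROM "{table}" ORDER BY rowid LIMIT ? OFFSET ?',
            cnx,
            params=(chunk_size, offset),
        )
        # Keep the index unique across chunks instead of restarting at 0.
        chunk.index = pd.RangeIndex(offset, offset + len(chunk))
        yield chunk


def sqlite_to_hdf(db_path, table, hdf_path, key, chunk_size=200_000):
    """Copy one sqlite table into an HDF5 store, one chunk at a time."""
    cnx = sqlite3.connect(db_path)
    try:
        with pd.HDFStore(hdf_path, mode='w') as store:
            for chunk in iter_chunks(cnx, table, chunk_size):
                # append() writes in the appendable "table" format.
                store.append(key, chunk)
    finally:
        cnx.close()
```

A call such as sqlite_to_hdf('mydbfile.sqlite', 'data', 'test_store.hdf', 'test') would then copy the data table in batches of 200,000 rows. Two caveats: text columns may need min_itemsize passed to store.append if later chunks contain longer strings than the first one, and large OFFSET values get slower because sqlite still has to skip over the earlier rows.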

Convert a python sqlite db to hdf5

2 answers: