Question

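Is there an implementation for Python pandas that caches data on disk, so that I can avoid re-creating it every time?

In particular, is there a caching method for get_yahoo_data for financial data?

Nice-to-have features would be:

very few lines of code to write
the ability to merge the persisted series with new data when it is downloaded from the same source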

Answer 1

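There are many ways to achieve this, but the simplest is probably to use the built-in methods for writing and reading Python pickles. You can use pandas.DataFrame.to_pickle to store a DataFrame on disk and pandas.read_pickle to read the stored DataFrame back from disk.

Example for a pandas.DataFrame: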

import pandas

# Store your DataFrame (df is an existing DataFrame)
df.to_pickle('cached_dataframe.pkl') # will be stored in current directory

# Read your DataFrame
df = pandas.read_pickle('cached_dataframe.pkl') # read from current directory

The same methods also work for a pandas.Series:

# Store your Series
series.to_pickle('cached_series.pkl') # will be stored in current directory

# Read your Series
series = pandas.read_pickle('cached_series.pkl') # read from current directory
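
To also cover the second wish in the question (merging the persisted series with newly downloaded data), the pickle approach can be wrapped in a small helper. This is only a minimal sketch: the name update_cache, the file name, and the sample prices are made up for illustration, and it assumes that newer values should win when the same index label appears twice.

```python
import os
import tempfile

import pandas as pd


def update_cache(new_data, path):
    """Merge new_data into the pickle at `path` and return the combined object."""
    if os.path.exists(path):
        cached = pd.read_pickle(path)
        combined = pd.concat([cached, new_data])
        # Assumption: when an index label appears twice, keep the newer value.
        combined = combined[~combined.index.duplicated(keep="last")].sort_index()
    else:
        combined = new_data.sort_index()
    combined.to_pickle(path)
    return combined


# Usage with made-up prices; the cache file lives in the temp directory.
cache_path = os.path.join(tempfile.gettempdir(), "cached_prices_example.pkl")
if os.path.exists(cache_path):
    os.remove(cache_path)  # start from an empty cache for this demo

first = pd.Series([10.0, 11.0], index=pd.to_datetime(["2020-01-01", "2020-01-02"]))
second = pd.Series([11.5, 12.0], index=pd.to_datetime(["2020-01-02", "2020-01-03"]))

update_cache(first, cache_path)
prices = update_cache(second, cache_path)
print(prices)
```

The overlapping date is stored only once, with the value from the later download.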

Answer 2

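Depending on your requirements, there are a dozen or so ways to write data out and read it back: CSV, Excel, JSON, Python pickle format, HDF5, and even SQL with a database.

In terms of lines of code, the to_*/read_* pair for many of these formats is just one line in each direction. Python and pandas already keep the code as concise as possible, so you need not worry about that.

I don't think a single solution fits every requirement; it really depends on the case: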

for human-readable saved data: CSV, Excel
for binary serialization of Python objects: Pickle
for data interchange: JSON
for long-term storage and incremental updates: SQL
etc.
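
As a rough illustration of the "one line in each direction" point, here is a round trip through two of the formats listed above. The DataFrame contents and file names are invented for the example, and the files go into a temporary directory so nothing is left behind.

```python
import os
import tempfile

import pandas as pd

# Made-up sample data
df = pd.DataFrame({"close": [10.0, 11.5]}, index=pd.Index(["a", "b"], name="key"))

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "prices.csv")
    json_path = os.path.join(tmp, "prices.json")

    # CSV: human-readable text
    df.to_csv(csv_path)
    from_csv = pd.read_csv(csv_path, index_col="key")

    # JSON: data interchange
    df.to_json(json_path, orient="split")
    from_json = pd.read_json(json_path, orient="split")

print(from_csv)
print(from_json)
```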

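If you want to update stock prices daily and use them later, I prefer pandas with SQL queries. Of course, this adds a few lines of code to set up the database connection: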

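import pandas as pd
from sqlalchemy import create_engine

new_data = getting_daily_price()  # placeholder for your own download function
# An in-memory SQLite database is used here; use a file URL such as
# 'sqlite:///prices.db' (or another database) to keep data between runs
engine = create_engine('sqlite:///:memory:')
with engine.connect() as conn:
    new_data.to_sql('table_name', conn, if_exists='append') # To Write
    df = pd.read_sql_table('table_name', conn) # To Read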
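
A minimal sketch of the daily-update pattern, using the standard library's sqlite3 module instead of SQLAlchemy so that it runs without extra dependencies. The helper names append_prices and load_prices, the table and column names, and the sample rows are all assumptions made for illustration.

```python
import os
import sqlite3
import tempfile

import pandas as pd


def append_prices(new_data, db_path, table="prices"):
    """Append the rows of new_data to `table` in the SQLite file at db_path."""
    conn = sqlite3.connect(db_path)
    try:
        new_data.to_sql(table, conn, if_exists="append", index=False)
        conn.commit()
    finally:
        conn.close()


def load_prices(db_path, table="prices"):
    """Read the whole table back, ordered by date."""
    conn = sqlite3.connect(db_path)
    try:
        return pd.read_sql_query(f"SELECT * FROM {table} ORDER BY date", conn)
    finally:
        conn.close()


# Usage with made-up rows; the database file lives in the temp directory.
db_path = os.path.join(tempfile.gettempdir(), "prices_cache_example.db")
if os.path.exists(db_path):
    os.remove(db_path)  # start from an empty database for this demo

day1 = pd.DataFrame({"date": ["2020-01-01"], "close": [10.0]})
day2 = pd.DataFrame({"date": ["2020-01-02"], "close": [11.5]})

append_prices(day1, db_path)
append_prices(day2, db_path)
history = load_prices(db_path)
print(history)
```

Because the data sits in a file rather than in memory, each day's rows accumulate across separate runs of the script.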

Answer 3

You can use the pandas cacher package.

from pandas_cacher import pandas_cache 

@pandas_cache
def foo():
    ...
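
For readers who would rather not add a dependency, the general idea of a caching decorator can be sketched with pickles. This is not how pandas_cacher is implemented; disk_cache is a hypothetical name, and the sketch deliberately ignores the function's arguments, so it only suits functions that always return the same data.

```python
import functools
import os
import tempfile

import pandas as pd


def disk_cache(path):
    """Cache the DataFrame or Series returned by a function in a pickle at `path`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if os.path.exists(path):
                # Cache hit: skip the function call entirely.
                return pd.read_pickle(path)
            result = func(*args, **kwargs)
            result.to_pickle(path)
            return result
        return wrapper
    return decorator


# Usage: the body of load_data runs only on the first call.
cache_path = os.path.join(tempfile.gettempdir(), "disk_cache_example.pkl")
if os.path.exists(cache_path):
    os.remove(cache_path)  # start from an empty cache for this demo

calls = []


@disk_cache(cache_path)
def load_data():
    calls.append(1)  # record each real execution
    return pd.DataFrame({"x": [1, 2, 3]})


first = load_data()
second = load_data()
print(len(calls))
```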

Python pandas persistent cache
