I have a list of DataFrames:
df1 =
Stock Year Profit CountPercent
AAPL 2012 1 38.77
AAPL 2013 1 33.33
df2 =
Stock Year Profit CountPercent
GOOG 2012 1 43.47
GOOG 2013 1 32.35
df3 =
Stock Year Profit CountPercent
ABC 2012 1 40.00
ABC 2013 1 32.35
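The output of my function is [df1, df2, df3, ...], like that.
All the columns are the same across the DataFrames, but the rows differ.
How can I store these on disk and retrieve the list again in the fastest and most efficient way?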
Answer 0 (score: 1)
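If the values in the Stock column are the same within each DataFrame, you can drop that column with iloc and use a dict comprehension (the key is the first value of the Stock column in each df):
# key: the ticker in the first row; value: the frame without the Stock column
dfs = {df.loc[0, 'Stock']: df.iloc[:, 1:] for df in [df1, df2, df3]}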
print (dfs['AAPL'])
Year Profit CountPercent
0 2012 1 38.77
1 2013 1 33.33
print (dfs['ABC'])
Year Profit CountPercent
0 2012 1 40.00
1 2013 1 32.35
print (dfs['GOOG'])
Year Profit CountPercent
0 2012 1 43.47
1 2013 1 32.35
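Since the question asks for the list back, here is a minimal sketch of the reverse step, assuming the dict layout shown above. It builds the dict with .loc (the .ix indexer is not available in current pandas) and then re-inserts the ticker as the first column. The helper name restore is hypothetical, not part of the answer.

```python
import pandas as pd

df1 = pd.DataFrame({'Stock': ['AAPL', 'AAPL'], 'Year': [2012, 2013],
                    'Profit': [1, 1], 'CountPercent': [38.77, 33.33]})
df2 = pd.DataFrame({'Stock': ['GOOG', 'GOOG'], 'Year': [2012, 2013],
                    'Profit': [1, 1], 'CountPercent': [43.47, 32.35]})
df3 = pd.DataFrame({'Stock': ['ABC', 'ABC'], 'Year': [2012, 2013],
                    'Profit': [1, 1], 'CountPercent': [40.00, 32.35]})

# Ticker -> frame without the Stock column (positional slice drops column 0).
dfs = {df.loc[0, 'Stock']: df.iloc[:, 1:] for df in [df1, df2, df3]}

def restore(frames_by_ticker):
    """Hypothetical helper: rebuild the list, putting Stock back as column 0."""
    out = []
    for ticker, frame in frames_by_ticker.items():
        restored = frame.copy()
        restored.insert(0, 'Stock', ticker)  # scalar is broadcast to every row
        out.append(restored)
    return out

frames = restore(dfs)
```

The list comes back in the dict's insertion order, so it matches the order of the original list as long as each ticker appears only once.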
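For storing on disk, I think the best option is HDF5 via PyTables.
If the values in the Stock column are the same within each df, you can concat all of the DataFrames and then store the result.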
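A minimal sketch of the concat-then-store idea, under some assumptions: the function names (combine, split_by_stock, save_frames, load_frames) and the key 'stocks' are made up for illustration, and the two HDF5 helpers need the optional PyTables package, so they are only defined here and not called.

```python
import pandas as pd

def combine(frames):
    """Stack frames that share the same columns into one DataFrame."""
    return pd.concat(frames, ignore_index=True)

def split_by_stock(combined):
    """Rebuild the list: one DataFrame per ticker, in order of first appearance."""
    return [group.reset_index(drop=True)
            for _, group in combined.groupby('Stock', sort=False)]

def save_frames(frames, path, key='stocks'):
    """Write the combined frame to HDF5 (assumes PyTables is installed)."""
    combine(frames).to_hdf(path, key=key, mode='w', format='table')

def load_frames(path, key='stocks'):
    """Read the combined frame back and split it into a list again."""
    return split_by_stock(pd.read_hdf(path, key=key))

df1 = pd.DataFrame({'Stock': ['AAPL', 'AAPL'], 'Year': [2012, 2013],
                    'Profit': [1, 1], 'CountPercent': [38.77, 33.33]})
df2 = pd.DataFrame({'Stock': ['GOOG', 'GOOG'], 'Year': [2012, 2013],
                    'Profit': [1, 1], 'CountPercent': [43.47, 32.35]})
df3 = pd.DataFrame({'Stock': ['ABC', 'ABC'], 'Year': [2012, 2013],
                    'Profit': [1, 1], 'CountPercent': [40.00, 32.35]})

# In-memory round trip; no file is written at this point.
combined = combine([df1, df2, df3])
restored = split_by_stock(combined)
```

With PyTables available, calling save_frames([df1, df2, df3], 'stocks.h5') and later load_frames('stocks.h5') would do the same round trip through a file; the file name is only an example.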
Answer 1 (score: 1)
I think that if all of your DFs have the same shape, it would be more natural to store the data as a pandas.Panel instead of a list of DFs. This is how pandas_datareader works.
import io
import pandas as pd
df1 = pd.read_csv(io.StringIO("""
Stock,Year,Profit,CountPercent
AAPL,2012,1,38.77
AAPL,2013,1,33.33
"""
))
df2 = pd.read_csv(io.StringIO("""
Stock,Year,Profit,CountPercent
GOOG,2012,1,43.47
GOOG,2013,1,32.35
"""
))
df3 = pd.read_csv(io.StringIO("""
Stock,Year,Profit,CountPercent
ABC,2012,1,40.0
ABC,2013,1,32.35
"""
))
# I had to drop the `Stock` column and use it as the Panel items axis, because of this error
# when saving the Panel to HDFStore:
# TypeError: Cannot serialize the column [%s] because its data contents are [mixed-integer] object dtype
# Note: pd.Panel was removed in pandas 0.25, so this needs an older pandas.
p = pd.Panel({df.iat[0, 0]: df.drop('Stock', axis=1) for df in [df1, df2, df3]})
# write the Panel to an HDF5 file
store = pd.HDFStore('c:/temp/stocks.h5')
store.append('stocks', p, data_columns=True, mode='w')
store.close()
# read the Panel back from HDFStore
store = pd.HDFStore('c:/temp/stocks.h5')
p = store.select('stocks')
Store:
In [18]: store
Out[18]:
<class 'pandas.io.pytables.HDFStore'>
File path: c:/temp/stocks.h5
/stocks wide_table (typ->appendable,nrows->6,ncols->3,indexers->[major_axis,minor_axis],dc->[AAPL,ABC,GOOG])
Panel dimensions:
In [19]: p['AAPL']
Out[19]:
Year Profit CountPercent
0 2012.0 1.0 38.77
1 2013.0 1.0 33.33
In [20]: p[:, :, 'Profit']
Out[20]:
AAPL ABC GOOG
0 1.0 1.0 1.0
1 1.0 1.0 1.0
In [21]: p[:, 0]
Out[21]:
AAPL ABC GOOG
Year 2012.00 2012.0 2012.00
Profit 1.00 1.0 1.00
CountPercent 38.77 40.0 43.47
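pandas.Panel was deprecated in pandas 0.20 and removed in 0.25, so the Panel code above only runs on older versions. A rough equivalent on current pandas is a DataFrame with a MultiIndex, sketched below. The level name 'row' and the variable names are assumptions for illustration; the three lookups mirror the Panel slices shown above.

```python
import pandas as pd

df1 = pd.DataFrame({'Stock': ['AAPL', 'AAPL'], 'Year': [2012, 2013],
                    'Profit': [1, 1], 'CountPercent': [38.77, 33.33]})
df2 = pd.DataFrame({'Stock': ['GOOG', 'GOOG'], 'Year': [2012, 2013],
                    'Profit': [1, 1], 'CountPercent': [43.47, 32.35]})
df3 = pd.DataFrame({'Stock': ['ABC', 'ABC'], 'Year': [2012, 2013],
                    'Profit': [1, 1], 'CountPercent': [40.00, 32.35]})

# The dict keys (tickers) become the outer index level; the old row
# labels become the inner level, named 'row' here for convenience.
stacked = pd.concat(
    {df['Stock'].iat[0]: df.drop(columns='Stock') for df in [df1, df2, df3]},
    names=['Stock', 'row'],
)

aapl = stacked.loc['AAPL']                    # counterpart of p['AAPL']
profit = stacked['Profit'].unstack('Stock')   # counterpart of p[:, :, 'Profit']
first_rows = stacked.xs(0, level='row').T     # counterpart of p[:, 0]
```

Because stacked is an ordinary DataFrame with numeric columns only, it can be written with the usual DataFrame writers, including to_hdf when PyTables is installed.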