I am loading a file with pandas and processing it in chunks:
import pandas as pd
import numpy as np
f = open("analysis.txt", "a+")
chunksize = 10 ** 6
for chunk in pd.read_csv('filename.txt', sep='\t', lineterminator='\r', chunksize=chunksize):
    my_tab = pd.crosstab(index=chunk["Year"], columns=chunk["Indicator"])
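my_tab gives a DataFrame that cross-tabulates the Year and Indicator columns of the current chunk. Is there a way to aggregate all of these DataFrames so that, once all the data has been processed, I can see the final analysis for the entire data file?
Answer 0 (score: 0)
A simple example: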
import pandas as pd
df = pd.DataFrame({'type': ['fruit', 'vegi', 'fruit', 'meat', 'vegi', 'meat', 'fruit'],
                   'ori': ['us', 'cn', 'cn', 'nz', 'nz', 'us', 'cn'],
                   'num': [5, 5, 9, 3, 2, 10, 8],
                   'price': [5, 5, 10, 3, 3, 13, 20]})
df1 = df.iloc[0:2]
df2 = df.iloc[2:4]
df3 = df.iloc[4:7]  # three slices standing in for chunks
a = pd.crosstab(df1['type'], df1['ori'])
b = pd.crosstab(df2['type'], df2['ori'])
c = pd.crosstab(df3['type'], df3['ori'])  # one crosstab per chunk
Using reduce makes life easier:
from functools import reduce
reduce(lambda df1, df2: df1.add(df2, fill_value=0), [a, b, c])
Result:
ori     cn   nz   us
type
fruit  2.0  0.0  1.0
meat   0.0  1.0  1.0
vegi   1.0  1.0  0.0
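The counts come back as floats because add() aligns the tables on the union of their labels and fills the gaps through NaN. One caveat: a cell that is missing from both operands stays NaN even with fill_value=0, so a leftover NaN is possible if a row/column pair never appears in any chunk's table. A minimal sketch that cleans this up, assuming the same example data (the helper name combine_crosstabs is made up for illustration):

```python
from functools import reduce

import pandas as pd


def combine_crosstabs(tables):
    """Sum per-chunk crosstabs into one table of integer counts."""
    # add() aligns on the union of row and column labels; fill_value=0
    # treats a label missing from one side as a zero count.
    total = reduce(lambda left, right: left.add(right, fill_value=0), tables)
    # A cell absent from both operands is still NaN after add(), so fill
    # any leftovers before casting the float counts back to integers.
    return total.fillna(0).astype(int)


df = pd.DataFrame({
    "type": ["fruit", "vegi", "fruit", "meat", "vegi", "meat", "fruit"],
    "ori": ["us", "cn", "cn", "nz", "nz", "us", "cn"],
})
chunks = [df.iloc[0:2], df.iloc[2:4], df.iloc[4:7]]
tables = [pd.crosstab(part["type"], part["ori"]) for part in chunks]

combined = combine_crosstabs(tables)
full = pd.crosstab(df["type"], df["ori"])
print(combined)
```

With the final fillna(0).astype(int), combined should hold the same integer counts as the crosstab computed on the whole frame.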
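This should give almost the same result as the following (the counts match, but they are floats rather than integers):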
pd.crosstab(df['type'], df['ori'])
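Applied to the loop in the question, the same idea can keep a running total instead of collecting every table in a list. This is only a sketch: the in-memory sample below is hypothetical data standing in for 'filename.txt', the tiny chunksize is chosen just so the sample spans several chunks, and the lineterminator argument from the question is left out because the sample uses ordinary newlines.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for 'filename.txt'.
sample = io.StringIO(
    "Year\tIndicator\n"
    "2000\tA\n"
    "2000\tB\n"
    "2001\tA\n"
    "2001\tA\n"
    "2002\tB\n"
)

total = None
# chunksize is tiny here only so that the sample spans several chunks.
for chunk in pd.read_csv(sample, sep="\t", chunksize=2):
    my_tab = pd.crosstab(index=chunk["Year"], columns=chunk["Indicator"])
    # Fold each chunk's table into the running total as soon as it is built.
    total = my_tab if total is None else total.add(my_tab, fill_value=0)

# Year/Indicator pairs that never occur together may still be NaN; make them 0.
total = total.fillna(0).astype(int)
print(total)
```

Only one accumulated table is held in memory at a time, which fits the reason for reading the file in chunks in the first place.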