I'm currently working with a huge database (more than 700 million rows), and to analyze it properly I wrote the following code:
import pandas as pd

SQL_Query = pd.read_sql_query(query, connection)
current_frame = pd.DataFrame(SQL_Query, columns=['foo', 'bar', 'zoo', ...])
unique_slice = pd.unique(current_frame['zoo'])
for current_slice in unique_slice:
    # Level 1: rows with this 'zoo' value
    dataframe_slice = current_frame.loc[current_frame['zoo'] == current_slice]
    unique_bar_slice = pd.unique(dataframe_slice['bar'])
    for current_bar_slice in unique_bar_slice:
        # Level 2: filter the level-1 slice on 'bar'
        current_bar_dataframe = dataframe_slice.loc[dataframe_slice['bar'] == current_bar_slice]
        unique_foo_slice = pd.unique(current_bar_dataframe['foo'])
        for foo_slice in unique_foo_slice:
            # Level 3: Do stuff
            pass
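Question: is it similar to the following version, in which the second level filters the full current_frame on both 'zoo' and 'bar'?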
import pandas as pd

SQL_Query = pd.read_sql_query(query, connection)
current_frame = pd.DataFrame(SQL_Query, columns=['foo', 'bar', 'zoo', ...])
unique_slice = pd.unique(current_frame['zoo'])
for current_slice in unique_slice:
    # Level 1: rows with this 'zoo' value
    dataframe_slice = current_frame.loc[current_frame['zoo'] == current_slice]
    unique_bar_slice = pd.unique(dataframe_slice['bar'])
    for current_bar_slice in unique_bar_slice:
        # Level 2: filter the FULL frame on both 'zoo' and 'bar'
        current_bar_dataframe = current_frame.loc[(current_frame['zoo'] == current_slice) & (current_frame['bar'] == current_bar_slice)]
        unique_foo_slice = pd.unique(current_bar_dataframe['foo'])
        for foo_slice in unique_foo_slice:
            # Level 3: Do stuff
            pass
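A quick way to compare the two versions is to run both on a small frame and check that they select the same rows. The sketch below does that on hypothetical toy data (the real table's extra columns are elided, and the level-3 filter on 'foo' is an assumed stand-in for "Do stuff"). Both versions should yield the same sub-frames; the second just does more work, because its level-2 mask scans the whole frame instead of the already-reduced level-1 slice.

```python
import pandas as pd

# Hypothetical toy data standing in for the SQL result.
df = pd.DataFrame({
    "foo": [1, 1, 2, 2, 3, 3],
    "bar": ["a", "a", "b", "a", "b", "b"],
    "zoo": ["x", "y", "x", "x", "y", "y"],
})

def version_one(frame):
    """First version: each level filters the previous level's slice."""
    out = {}
    for z in pd.unique(frame["zoo"]):
        zoo_slice = frame.loc[frame["zoo"] == z]
        for b in pd.unique(zoo_slice["bar"]):
            bar_slice = zoo_slice.loc[zoo_slice["bar"] == b]
            for f in pd.unique(bar_slice["foo"]):
                # Assumed level-3 step: keep the rows for this 'foo' value.
                out[(z, b, f)] = bar_slice.loc[bar_slice["foo"] == f]
    return out

def version_two(frame):
    """Second version: level 2 re-filters the full frame on two columns."""
    out = {}
    for z in pd.unique(frame["zoo"]):
        zoo_slice = frame.loc[frame["zoo"] == z]
        for b in pd.unique(zoo_slice["bar"]):
            bar_slice = frame.loc[(frame["zoo"] == z) & (frame["bar"] == b)]
            for f in pd.unique(bar_slice["foo"]):
                out[(z, b, f)] = bar_slice.loc[bar_slice["foo"] == f]
    return out

one, two = version_one(df), version_two(df)
# Same (zoo, bar, foo) keys and identical sub-frames (values and index).
same = one.keys() == two.keys() and all(one[k].equals(two[k]) for k in one)
print(same)
```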
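When I need level-3 sub-frames of the main frame, split on the values of these columns, what is the most efficient way to do this in pandas? Thanks!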
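One common pandas idiom for this is DataFrame.groupby on all three columns: it partitions the rows once, instead of building a new boolean mask over a slice for every unique value at every level. The sketch below assumes a hypothetical numeric column 'value' and uses a sum as a placeholder for "Do stuff"; it is an illustration on toy data, not a benchmark on the real 700-million-row table.

```python
import pandas as pd

# Hypothetical toy frame; in the question the data comes from pd.read_sql_query.
df = pd.DataFrame({
    "foo": [1, 1, 2, 2, 3, 3],
    "bar": ["a", "a", "b", "a", "b", "b"],
    "zoo": ["x", "y", "x", "x", "y", "y"],
    "value": [10, 20, 30, 40, 50, 60],
})

# Each iteration yields the (zoo, bar, foo) key tuple and the sub-frame
# holding exactly the rows with those three values.
sums = {}
for (zoo, bar, foo), group in df.groupby(["zoo", "bar", "foo"], sort=False):
    # Placeholder for "Do stuff": sum the hypothetical 'value' column.
    sums[(zoo, bar, foo)] = group["value"].sum()

# If "Do stuff" is an aggregation, the Python loop can be dropped entirely.
sums_vectorized = df.groupby(["zoo", "bar", "foo"])["value"].sum()
print(sums_vectorized)
```

If the processing really is per group, the loop keeps the nested structure but touches each row roughly once. If it can be expressed as an aggregation, the vectorized form avoids Python-level iteration altogether. Note that groupby drops rows whose key is NaN by default (dropna=True), which matches the mask-based loops, since an equality test against NaN never matches.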