Split a large dataframe into multiple smaller dataframes

Time: 2019-05-22 18:39:26

Tags: python python-3.x pandas python-2.7 numpy

I have a dataframe df with dimensions (28260, 25). I want to split it into 20 smaller dataframes, each with dimensions (1413, 25), named df_1, df_2, ..., df_20.

For example, given the input dataframe df:

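import numpy as np
import pandas as pd
# Split df into 20 equal row blocks and shuffle the rows within each block
frames = {}
for e, i in enumerate(np.split(df, 20)):
    frames['df' + str(e + 1)] = pd.DataFrame(np.random.permutation(i), columns=df.columns)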
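The loop above also shuffles the rows inside each block. `np.random.permutation` turns each block into a plain NumPy array, so the rebuilt frames lose the original row index (and a frame with mixed column types may come back with object columns). If that shuffle is intended, a pandas-only sketch could slice with `iloc` and shuffle with `DataFrame.sample(frac=1)`. It assumes the row count divides evenly by 20, uses `df_1` to `df_20` as keys to match the names asked for, and fixes `random_state` per block only so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Example frame with the shape from the question (values are random)
df = pd.DataFrame(np.random.randint(0, 10, size=(28260, 25)),
                  columns=[f"col_{i}" for i in range(25)])

n_parts = 20
size = len(df) // n_parts  # 1413 rows per part; assumes len(df) % n_parts == 0

frames = {}
for k in range(n_parts):
    part = df.iloc[k * size:(k + 1) * size]  # consecutive block of rows
    # Shuffle the block's rows; original index labels and dtypes are kept
    frames[f"df_{k + 1}"] = part.sample(frac=1, random_state=k)
```

Calling `frames["df_3"].sort_index()` recovers the third block in its original row order, because the index labels travel with the shuffled rows.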


1 answer:

Answer 0 (score: 0)

If you want to keep all the sub-dataframes in a dict, here is one way:

# import modules
import pandas as pd
import numpy as np


# Create dataframe of 25 columns and 28260 rows
df = pd.DataFrame({"col_"+str(i): np.random.randint(0, 10, 28260)
                   for i in range(25)})
print(df.head(5))
#    col_0  col_1  col_2  col_3  col_4  col_5  col_6  col_7  col_8  ...  col_16  col_17  col_18  col_19  col_20  col_21  col_22  col_23  col_24
# 0      5      0      1      5      9      7      2      9      5  ...       5       1       3       8       2       3       9       7       4
# 1      7      1      5      0      2      1      5      9      6  ...       6       1       1       7       8       7       0       2       1
# 2      0      3      6      1      3      8      7      4      7  ...       9       9       7       7       8       9       1       6       9
# 3      7      7      3      3      3      1      3      4      9  ...       2       2       7       9       8       0       2       0       8
# 4      0      1      3      9      7      4      4      3      8  ...       9       5       8       4       5       4       3       9       6


print("Dimension df: ", df.shape)
# Dimension df:  (28260, 25)

# Create dict of sub dataframe
dict_df = {"df_"+str(i): df.iloc[i*28260//20:(i+1)*28260//20] for i in range(20)}
print("Keys: ", dict_df.keys())
# Keys:  dict_keys(['df_0', 'df_1', 'df_2', 'df_3', 'df_4', 'df_5', 'df_6', 'df_7', 'df_8',
#                   'df_9', 'df_10', 'df_11', 'df_12', 'df_13', 'df_14', 'df_15', 'df_16',
#                   'df_17', 'df_18', 'df_19'])

print("Size of each sub_dataframe: ", dict_df["df_1"].shape)
# Size of each sub_dataframe:  (1413, 25)
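The dict above is keyed `df_0` to `df_19`. If you want exactly the names from the question (`df_1` to `df_20`) and something that works for any row count, the same slicing can be wrapped in a small helper. `split_named` is a hypothetical name chosen here, not a pandas function; the `i * n // n_parts` boundaries are the ones the answer already uses, so when the row count is not divisible by the number of parts, the block sizes differ by at most one row.

```python
import numpy as np
import pandas as pd


def split_named(df, n_parts, prefix="df_"):
    """Split df into n_parts consecutive row blocks keyed prefix1 .. prefixN.

    Block i covers rows [i*n//n_parts, (i+1)*n//n_parts), so sizes differ
    by at most one row when len(df) is not divisible by n_parts.
    """
    n = len(df)
    return {
        f"{prefix}{i + 1}": df.iloc[i * n // n_parts:(i + 1) * n // n_parts]
        for i in range(n_parts)
    }


df = pd.DataFrame({f"col_{i}": np.random.randint(0, 10, 28260) for i in range(25)})
parts = split_named(df, 20)
print(list(parts)[:3], parts["df_20"].shape)  # ['df_1', 'df_2', 'df_3'] (1413, 25)
```

For example, splitting a 10-row frame into 3 parts gives blocks of 3, 3 and 4 rows, and `pd.concat(list(parts.values()))` puts the original frame back together.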

Or, in a list:

# List of sub dataframes
list_df = []
for i in range(20):
    list_df.append(df.iloc[i*28260//20:(i+1)*28260//20])

print("Number of sub_dataframes: ", len(list_df))
# Number of sub_dataframes: 20
print("Size of each sub_dataframe: ", list_df[0].shape)
# Size of each sub_dataframe: (1413, 25)
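The list version fixes the number of parts. If you would rather fix the number of rows per part (1413 here), a different technique is to group on a block number computed from each row's position. This is a sketch that assumes the current row order of `df` is the order you want to split in.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({f"col_{i}": np.random.randint(0, 10, 28260) for i in range(25)})

chunk_size = 1413  # rows per sub-dataframe, taken from the question

# Label rows 0..1412 as block 0, 1413..2825 as block 1, and so on, then group
# on that label; groups come back sorted by block number, in the original order.
list_df = [chunk for _, chunk in df.groupby(np.arange(len(df)) // chunk_size)]

print(len(list_df), list_df[0].shape)  # 20 (1413, 25)
```

Unlike the slicing approach, when the row count is not a multiple of `chunk_size`, the last block simply holds the leftover rows.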