How to split/group a list of DataFrames by the length of each DataFrame

Time: 2020-10-22 15:11:48

Tags: python pandas dataframe

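For example, I have a list of 100 DataFrames; some have 8 columns, others 10, and others 12. I want to group them by their number of columns. I tried using a dictionary, but could not get the loop to append correctly.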

Code I tried previously:

col_count = [8, 10, 12]

d = dict.fromkeys(col_count, [])

for df in df_lst:
    for i in col_count:
        if i == len(df.columns):
            d[i] = df

But this just seems to replace the value in the dict each time. I also tried .append, but that seems to append to every key.

2 answers:

Answer 0 (score: 0)

Instead of assigning df to d[column_count], you should append it.

You initialized d with d = dict.fromkeys(col_count, []), so d is a dict whose values are empty lists.

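When you do d[i] = df, you replace the empty list with a DataFrame, so d ends up as a dict of DataFrames. If you do d[i].append(df) instead, you get a dict of lists of DataFrames (which is what you want, AFAIU), provided each key has its own list, e.g. d = {i: [] for i in col_count}. dict.fromkeys stores one and the same list object under every key, which is why your .append attempt seemed to append to all keys.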
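To make the shared-list behaviour concrete, here is a minimal sketch that uses only the standard library. The placeholder string stands in for a DataFrame and is purely illustrative; the keys are the column counts from the question.

```python
# Keys taken from the question; the appended string is a stand-in for a DataFrame.
col_count = [8, 10, 12]

# dict.fromkeys stores the *same* list object under every key.
shared = dict.fromkeys(col_count, [])
shared[8].append("df_with_8_columns")

# A dict comprehension evaluates [] once per key, so each key gets its own list.
separate = {n: [] for n in col_count}
separate[8].append("df_with_8_columns")

print(shared)    # the item is visible under 8, 10 and 12
print(separate)  # the item is visible under 8 only
```

The comprehension is the smallest change to the original code: the loop can stay as it is, with d[i].append(df) in place of d[i] = df.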

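I'm also not sure you need the col_count variable. If d is a collections.defaultdict(list), you can just do d[len(df.columns)].append(df); with a plain dict that line raises KeyError for any column count that is not already a key.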
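A minimal sketch of that one-liner approach, assuming collections.defaultdict is acceptable here. The sample df_lst below is made up for illustration (six all-zero frames with 8, 10 or 12 columns); only the grouping loop is the point.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical stand-in for df_lst: frames with 8, 10 or 12 columns.
df_lst = [
    pd.DataFrame(0, index=range(2), columns=range(n))
    for n in (8, 10, 12, 8, 10, 8)
]

# defaultdict(list) creates a fresh, separate list the first time a key is used,
# so the column counts do not need to be known in advance.
groups = defaultdict(list)
for df in df_lst:
    groups[len(df.columns)].append(df)

for n_cols, frames in sorted(groups.items()):
    print(n_cols, len(frames))
```

Each value in groups is an ordinary list, so groups[8] can be passed straight to something like pd.concat if the frames share the same column labels.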

Answer 1 (score: 0)

I think this is enough for what you need. Think about how to solve the problem dynamically to make better use of Python.

In [2]: import pandas as pd

In [3]: for i in range(1, 6):
   ...:     exec(f"df{i} = pd.DataFrame(0, index=range({i}), columns=list('ABCD'))") #making my own testing list of dataframes with variable length (df1 to df5)
   ...:

In [4]: df1 #one row df
Out[4]:
   A  B  C  D
0  0  0  0  0

In [5]: df2 #two row df
Out[5]:
   A  B  C  D
0  0  0  0  0
1  0  0  0  0

In [6]: df3 #three row df
Out[6]:
   A  B  C  D
0  0  0  0  0
1  0  0  0  0
2  0  0  0  0

In [7]: L = [df1, df2, df3, df4, df5] #i assume all your dataframes are put into something like a container, which is the problem

In [13]: my_3_length_shape_dfs = [] #you need to create some sort of containers for your lengths (you can do an additional exec in the following In

In [14]: for i in L:
    ...:     if i.shape[0] == 3: #shape[0] is the row count, use shape[1] for the column count; add more of these if needed, you mentioned your lengths are known [8, 10, 12]
    ...:         my_3_length_shape_dfs.append(i) #adding the df to a specified container, thus grouping any dfs that are of row length/shape equal to 3
    ...:         print(i)
    ...:
   A  B  C  D
0  0  0  0  0
1  0  0  0  0
2  0  0  0  0
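The same idea can be sketched without exec and without one hand-written container per length. This is an illustrative variant, not the answer's own code: the names frames and by_rows are made up, and an extra 3-row frame is added so that one group holds more than one DataFrame.

```python
import pandas as pd

# Frames with 1 to 5 rows, matching the list L above, built in a list directly.
frames = [pd.DataFrame(0, index=range(n), columns=list("ABCD")) for n in range(1, 6)]
frames.append(pd.DataFrame(1, index=range(3), columns=list("ABCD")))  # second 3-row frame

by_rows = {}
for df in frames:
    # shape[0] is the row count; swap in shape[1] to group by column count instead.
    by_rows.setdefault(df.shape[0], []).append(df)

print({n_rows: len(group) for n_rows, group in by_rows.items()})
```

dict.setdefault inserts a new empty list only when the key is missing, so every length gets its own list and no if-branch per length is required.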