How to make the DataFrames in a list equal in length

Asked: 2016-12-08 11:43:08

Tags: python pandas dataframe

Suppose I have several DataFrames in a list, like this:

import pandas as pd

X = pd.DataFrame({"t":[1,2,3,4,5,6,7,8],"A":[34,12,78,84,26,84,26,34], "B":[54,87,35,25,82,35,25,82], "C":[56,78,0,14,13,0,14,13], "D":[0,23,72,56,14,72,56,14], "E":[78,12,31,0,34,31,0,34]})
Y = pd.DataFrame({"t":[1,2,3],"A":[45,24,65], "B":[45,87,65], "C":[98,52,32], "D":[0,23,1], "E":[24,12, 65]})
Z = pd.DataFrame({"t":[1,2,3,4,5],"A":[14,96,25,2,25], "B":[47,7,5,58,34], "C":[85,45,65,53,53], "D":[3,35,12,56,236], "E":[68,10,45,46,85]})

allFiles = [X, Y, Z]
list_ = []
for file_ in allFiles:
    # sort each frame by the t column before collecting it
    df = file_.sort_values('t')
    list_.append(df)

The list then holds the three sorted DataFrames, with 8, 3, and 5 rows respectively.

How can I shorten every DataFrame to the length of the shortest one?

EDIT: Keep in mind that I want the result to remain a list of DataFrames.

1 Answer:

Answer 0 (score: 3)

If the DataFrames contain no NaN values, you can use concat together with dropna:

df = pd.concat(allFiles, keys=list('ABC'), axis=1).dropna()
print (df)
    A                        B                                  C              \
    A   B   C   D   E  t     A     B     C     D     E    t     A     B     C   
0  34  54  56   0  78  1  45.0  45.0  98.0   0.0  24.0  1.0  14.0  47.0  85.0   
1  12  87  78  23  12  2  24.0  87.0  52.0  23.0  12.0  2.0  96.0   7.0  45.0   
2  78  35   0  72  31  3  65.0  65.0  32.0   1.0  65.0  3.0  25.0   5.0  65.0   


      D     E    t  
0   3.0  68.0  1.0  
1  35.0  10.0  2.0  
2  12.0  45.0  3.0  
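The "no NaN values" condition matters because dropna cannot tell the NaN padding that concat adds for shorter frames apart from NaN that was already in the data. A minimal sketch of that pitfall, using two small made-up frames (the names a, b, and v are illustrative and not from the question):

```python
import numpy as np
import pandas as pd

# a has a genuine missing value in row 1; b is simply shorter
a = pd.DataFrame({"v": [1.0, np.nan, 3.0]})
b = pd.DataFrame({"v": [10.0, 20.0]})

# concat along columns pads b with NaN in row 2
wide = pd.concat([a, b], keys=["a", "b"], axis=1)

# dropna removes row 1 (genuine NaN) as well as row 2 (padding)
trimmed = wide.dropna()
print(len(trimmed))
```

Only one row survives here even though the shorter frame has two, so this approach is only safe when the inputs are known to be free of missing values.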

Next, use groupby in a list comprehension on the concatenated df to create the new list:

list_ = [g for i, g in df.groupby(level=0, axis=1, group_keys=False)]
print (list_)
[    A                   
    A   B   C   D   E  t
0  34  54  56   0  78  1
1  12  87  78  23  12  2
2  78  35   0  72  31  3,       B                             
      A     B     C     D     E    t
0  45.0  45.0  98.0   0.0  24.0  1.0
1  24.0  87.0  52.0  23.0  12.0  2.0
2  65.0  65.0  32.0   1.0  65.0  3.0,       C                             
      A     B     C     D     E    t
0  14.0  47.0  85.0   3.0  68.0  1.0
1  96.0   7.0  45.0  35.0  10.0  2.0
2  25.0   5.0  65.0  12.0  45.0  3.0]

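However, each resulting DataFrame still has MultiIndex columns. To avoid that, save the first level with get_level_values so it can be passed to groupby, then remove it from the columns with droplevel: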

df = pd.concat(allFiles, keys=list('ABC'), axis=1).dropna()
lvl = df.columns.get_level_values(0)
df.columns = df.columns.droplevel(0)
print (df)
    A   B   C   D   E  t     A     B     C     D     E    t     A     B     C  \
0  34  54  56   0  78  1  45.0  45.0  98.0   0.0  24.0  1.0  14.0  47.0  85.0   
1  12  87  78  23  12  2  24.0  87.0  52.0  23.0  12.0  2.0  96.0   7.0  45.0   
2  78  35   0  72  31  3  65.0  65.0  32.0   1.0  65.0  3.0  25.0   5.0  65.0   

      D     E    t  
0   3.0  68.0  1.0  
1  35.0  10.0  2.0  
2  12.0  45.0  3.0  
list_ = [g for i, g in df.groupby(lvl, axis=1)]

print (list_)

[    A   B   C   D   E  t
0  34  54  56   0  78  1
1  12  87  78  23  12  2
2  78  35   0  72  31  3,       A     B     C     D     E    t
0  45.0  45.0  98.0   0.0  24.0  1.0
1  24.0  87.0  52.0  23.0  12.0  2.0
2  65.0  65.0  32.0   1.0  65.0  3.0,       A     B     C     D     E    t
0  14.0  47.0  85.0   3.0  68.0  1.0
1  96.0   7.0  45.0  35.0  10.0  2.0
2  25.0   5.0  65.0  12.0  45.0  3.0]

print (list_[0])
    A   B   C   D   E  t
0  34  54  56   0  78  1
1  12  87  78  23  12  2
2  78  35   0  72  31  3
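Recent pandas releases deprecate groupby(..., axis=1), so the grouping step above may emit a warning or fail depending on the installed version. As a hedged alternative sketch, the top-level keys can be selected directly from the MultiIndex columns, which also drops that level. The shortened frames and the keys "x", "y", "z" below are illustrative assumptions; the astype call restores the integer dtypes that the NaN padding turned into floats.

```python
import pandas as pd

# shortened stand-ins for the X, Y, Z frames in the question
X = pd.DataFrame({"t": [1, 2, 3, 4], "A": [34, 12, 78, 84]})
Y = pd.DataFrame({"t": [1, 2], "A": [45, 24]})
Z = pd.DataFrame({"t": [1, 2, 3], "A": [14, 96, 25]})
frames = [X, Y, Z]
keys = ["x", "y", "z"]  # hypothetical labels, one per frame

# align on the row index and keep only rows present in every frame
wide = pd.concat(frames, keys=keys, axis=1).dropna()

# selecting a top-level key returns that frame's columns without the extra level
trimmed = [wide[k].astype(src.dtypes) for k, src in zip(keys, frames)]

for part in trimmed:
    print(part)
```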

Another, simpler solution:

import numpy as np

allFiles = [X, Y, Z]

min_len = np.min([len(df.index) for df in allFiles])
print (min_len)
3

print ([df.reindex(np.arange(min_len)) for df in allFiles])
[    A   B   C   D   E  t
0  34  54  56   0  78  1
1  12  87  78  23  12  2
2  78  35   0  72  31  3,     A   B   C   D   E  t
0  45  45  98   0  24  1
1  24  87  52  23  12  2
2  65  65  32   1  65  3,     A   B   C   D   E  t
0  14  47  85   3  68  1
1  96   7  45  35  10  2
2  25   5  65  12  45  3]
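The reindex(np.arange(min_len)) call relies on every frame having the default 0, 1, 2, ... index. If that cannot be assumed, a positional slice gives the same truncation regardless of the index labels. A minimal sketch with shortened stand-in frames (not the full data from the question):

```python
import pandas as pd

# shortened stand-ins for the X, Y, Z frames in the question
X = pd.DataFrame({"t": [1, 2, 3, 4], "A": [34, 12, 78, 84]})
Y = pd.DataFrame({"t": [1, 2], "A": [45, 24]})
Z = pd.DataFrame({"t": [1, 2, 3], "A": [14, 96, 25]})
frames = [X, Y, Z]

# length of the shortest frame
min_len = min(len(df) for df in frames)

# keep the first min_len rows of each frame by position
trimmed = [df.iloc[:min_len] for df in frames]

for part in trimmed:
    print(part)
```

Because iloc slices by position, the dtypes and index labels of the kept rows are left unchanged.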

EDIT1: A solution for the case where t is the index and its values are unique:

Get the shortest index, then use reindex in a list comprehension:

X = X.set_index('t')
Y = Y.set_index('t')
Z = Z.set_index('t')
allFiles = [X, Y, Z]

min_idx = min([df.index for df in allFiles], key=len)
print (min_idx)
Int64Index([1, 2, 3], dtype='int64', name='t')

print ([df.reindex(min_idx) for df in allFiles])
[    A   B   C   D   E
t                    
1  34  54  56   0  78
2  12  87  78  23  12
3  78  35   0  72  31,     A   B   C   D   E
t                    
1  45  45  98   0  24
2  24  87  52  23  12
3  65  65  32   1  65,     A   B   C   D   E
t                    
1  14  47  85   3  68
2  96   7  45  35  10
3  25   5  65  12  45]
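Reindexing with the shortest index assumes that all of its labels also occur in the other frames; any label that is missing from a frame would produce a row of NaN. If that assumption may not hold, one possible variation is to keep only the labels common to all frames. The sketch below uses shortened stand-in frames, and the intersection step is an addition for illustration, not part of the original answer:

```python
from functools import reduce

import pandas as pd

# shortened stand-ins for the question's frames, indexed by t
X = pd.DataFrame({"t": [1, 2, 3, 4], "A": [34, 12, 78, 84]}).set_index("t")
Y = pd.DataFrame({"t": [1, 2], "A": [45, 24]}).set_index("t")
Z = pd.DataFrame({"t": [1, 2, 3], "A": [14, 96, 25]}).set_index("t")
frames = [X, Y, Z]

# labels that appear in every frame's index
common = reduce(lambda left, right: left.intersection(right),
                (df.index for df in frames))

# select those labels from each frame
trimmed = [df.loc[common] for df in frames]

for part in trimmed:
    print(part)
```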