If I have some DataFrames in a list, like this:
import pandas as pd
X = pd.DataFrame({"t":[1,2,3,4,5,6,7,8],"A":[34,12,78,84,26,84,26,34], "B":[54,87,35,25,82,35,25,82], "C":[56,78,0,14,13,0,14,13], "D":[0,23,72,56,14,72,56,14], "E":[78,12,31,0,34,31,0,34]})
Y = pd.DataFrame({"t":[1,2,3],"A":[45,24,65], "B":[45,87,65], "C":[98,52,32], "D":[0,23,1], "E":[24,12, 65]})
Z = pd.DataFrame({"t":[1,2,3,4,5],"A":[14,96,25,2,25], "B":[47,7,5,58,34], "C":[85,45,65,53,53], "D":[3,35,12,56,236], "E":[68,10,45,46,85]})
allFiles = [X, Y, Z]
list_ = []
for file_ in allFiles:
    # DataFrame.sort was removed from pandas; sort_values is its replacement
    df = file_.sort_values('t')
    list_.append(df)
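The list then looks as follows:
How can I shorten every DataFrame to the length of the shortest one?
EDIT: Keep in mind that I want to keep the result as a list of DataFrames.
Answer 0 (score: 3)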
If the DataFrames contain no NaN values, you can use concat together with dropna:
df = pd.concat(allFiles, keys=list('ABC'), axis=1).dropna()
print (df)
A B C \
A B C D E t A B C D E t A B C
0 34 54 56 0 78 1 45.0 45.0 98.0 0.0 24.0 1.0 14.0 47.0 85.0
1 12 87 78 23 12 2 24.0 87.0 52.0 23.0 12.0 2.0 96.0 7.0 45.0
2 78 35 0 72 31 3 65.0 65.0 32.0 1.0 65.0 3.0 25.0 5.0 65.0
D E t
0 3.0 68.0 1.0
1 35.0 10.0 2.0
2 12.0 45.0 3.0
Then create a new list using a list comprehension with groupby:
list_ = [g for i, g in df.groupby(level=0, axis=1, group_keys=False)]
print (list_)
[ A
A B C D E t
0 34 54 56 0 78 1
1 12 87 78 23 12 2
2 78 35 0 72 31 3, B
A B C D E t
0 45.0 45.0 98.0 0.0 24.0 1.0
1 24.0 87.0 52.0 23.0 12.0 2.0
2 65.0 65.0 32.0 1.0 65.0 3.0, C
A B C D E t
0 14.0 47.0 85.0 3.0 68.0 1.0
1 96.0 7.0 45.0 35.0 10.0 2.0
2 25.0 5.0 65.0 12.0 45.0 3.0]
However, the output has a MultiIndex in its columns, so use get_level_values to save the first level for groupby, then remove that level with droplevel:
df = pd.concat(allFiles, keys=list('ABC'), axis=1).dropna()
lvl = df.columns.get_level_values(0)
df.columns = df.columns.droplevel(0)
print (df)
A B C D E t A B C D E t A B C \
0 34 54 56 0 78 1 45.0 45.0 98.0 0.0 24.0 1.0 14.0 47.0 85.0
1 12 87 78 23 12 2 24.0 87.0 52.0 23.0 12.0 2.0 96.0 7.0 45.0
2 78 35 0 72 31 3 65.0 65.0 32.0 1.0 65.0 3.0 25.0 5.0 65.0
D E t
0 3.0 68.0 1.0
1 35.0 10.0 2.0
2 12.0 45.0 3.0
list_ = [g for i, g in df.groupby(lvl, axis=1)]
print (list_)
[ A B C D E t
0 34 54 56 0 78 1
1 12 87 78 23 12 2
2 78 35 0 72 31 3, A B C D E t
0 45.0 45.0 98.0 0.0 24.0 1.0
1 24.0 87.0 52.0 23.0 12.0 2.0
2 65.0 65.0 32.0 1.0 65.0 3.0, A B C D E t
0 14.0 47.0 85.0 3.0 68.0 1.0
1 96.0 7.0 45.0 35.0 10.0 2.0
2 25.0 5.0 65.0 12.0 45.0 3.0]
print (list_[0])
A B C D E t
0 34 54 56 0 78 1
1 12 87 78 23 12 2
2 78 35 0 72 31 3
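In newer pandas releases, groupby with axis=1 is deprecated, and the concat/dropna round trip upcasts the shorter frames to float (visible in the output above). Below is a minimal sketch of one way around both. It selects each top-level key directly and casts back to the original dtypes. The smaller stand-in frames and the names frames, wide and trimmed are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Smaller stand-ins for X, Y, Z (assumption: same layout, fewer columns)
X = pd.DataFrame({"t": [1, 2, 3, 4], "A": [34, 12, 78, 84]})
Y = pd.DataFrame({"t": [1, 2, 3], "A": [45, 24, 65]})
Z = pd.DataFrame({"t": [1, 2, 3, 4, 5], "A": [14, 96, 25, 2, 25]})
frames = [X, Y, Z]
keys = list("ABC")

# Align on the row index; rows missing from any frame contain NaN and are dropped
wide = pd.concat(frames, keys=keys, axis=1).dropna()

# Selecting a top-level key drops that level; astype restores the original dtypes
trimmed = [wide[k].astype(f.dtypes.to_dict()) for k, f in zip(keys, frames)]
print(trimmed[1])
```

As with the answer above, this only works if the source frames contain no NaN values of their own.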
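Another, simpler solution, assuming every DataFrame has the default integer index (0, 1, 2, ...):
import numpy as np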
allFiles = [X, Y, Z]
min_len = np.min([len(df.index) for df in allFiles])
print (min_len)
3
print ([df.reindex(np.arange(min_len)) for df in allFiles])
[ A B C D E t
0 34 54 56 0 78 1
1 12 87 78 23 12 2
2 78 35 0 72 31 3, A B C D E t
0 45 45 98 0 24 1
1 24 87 52 23 12 2
2 65 65 32 1 65 3, A B C D E t
0 14 47 85 3 68 1
1 96 7 45 35 10 2
2 25 5 65 12 45 3]
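Note that reindex(np.arange(min_len)) matches rows by label, so it relies on each frame carrying the default 0..n-1 index. A position-based sketch using head avoids that dependency. The non-default index on Y below is made up for illustration, and shortened is a hypothetical name.

```python
import pandas as pd

X = pd.DataFrame({"t": [1, 2, 3, 4], "A": [34, 12, 78, 84]})
# Hypothetical frame whose index does not start at 0
Y = pd.DataFrame({"t": [1, 2, 3], "A": [45, 24, 65]}, index=[10, 11, 12])
frames = [X, Y]

min_len = min(len(df) for df in frames)
# head(n) keeps the first n rows by position, whatever the index labels are
shortened = [df.head(min_len) for df in frames]
print([len(df) for df in shortened])
```

If the frames should share a fresh 0..n-1 index afterwards, reset_index(drop=True) can be chained onto each result.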
EDIT1: A solution for when t is the index and its values are unique.
Get the shortest index, then use reindex in a list comprehension:
X = X.set_index('t')
Y = Y.set_index('t')
Z = Z.set_index('t')
allFiles = [X, Y, Z]
min_idx = min([df.index for df in allFiles], key=len)
print (min_idx)
Int64Index([1, 2, 3], dtype='int64', name='t')
print ([df.reindex(min_idx) for df in allFiles])
[ A B C D E
t
1 34 54 56 0 78
2 12 87 78 23 12
3 78 35 0 72 31, A B C D E
t
1 45 45 98 0 24
2 24 87 52 23 12
3 65 65 32 1 65, A B C D E
t
1 14 47 85 3 68
2 96 7 45 35 10
3 25 5 65 12 45]
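Reindexing by the shortest index assumes that its t values also occur in every other frame; any that do not come back as NaN rows. If that is not guaranteed, one option is to keep only the t values shared by all frames. This is a sketch with made-up data, where common and aligned are illustrative names:

```python
from functools import reduce

import pandas as pd

X = pd.DataFrame({"t": [1, 2, 3, 4], "A": [34, 12, 78, 84]}).set_index("t")
# Hypothetical frame in which t=2 is missing
Y = pd.DataFrame({"t": [1, 3, 4], "A": [45, 24, 65]}).set_index("t")
Z = pd.DataFrame({"t": [1, 2, 3, 4, 5], "A": [14, 96, 25, 2, 25]}).set_index("t")
frames = [X, Y, Z]

# Intersect all indexes to get the t values present in every frame
common = reduce(lambda a, b: a.intersection(b), (df.index for df in frames))
# Label-based selection is safe here because every label exists in every frame
aligned = [df.loc[common] for df in frames]
print(aligned[0])
```

Unlike reindex with the shortest index, this can return fewer rows than the shortest frame has, but none of them will contain NaN introduced by alignment.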