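Finding the minimum value across multiple worksheets using pandas

Time: 2017-06-27 05:23:14

Tags: python excel pandas min worksheet

How can I find, for each index, the minimum value across multiple worksheets?

Suppose,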

    worksheet 1

    index    A   B   C
        0    2   3   4.28
        1    3   4   5.23

    worksheet 2

    index    A   B   C
        0    9   6   5.9
        1    1   3   4.1

    worksheet 3

    index    A   B   C
        0    9   6   6.0
        1    1   3   4.3

    ... (Worksheet 4, Worksheet 5) ...
By comparing the C column across the worksheets, I want a result dataframe that looks like this:

    index    min(C)
        0    4.28
        1    4.1

2 answers:

Answer 0: (score: 3)

from functools import reduce
import numpy as np

# ws1, ws2, ws3 are the worksheet DataFrames
reduce(np.fmin, [ws1.C, ws2.C, ws3.C])

index
0    4.28
1    4.10
Name: C, dtype: float64
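
For anyone who wants to run this without an Excel file, here is a minimal sketch that rebuilds the three worksheets from the question as DataFrames (the names ws1, ws2, ws3 follow the answer; constructing them by hand is my assumption about how they were loaded). It also shows why np.fmin is a reasonable choice over np.minimum: np.fmin ignores a NaN when the other operand is a number, while np.minimum propagates it.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Sample data copied from the question; ws1..ws3 stand in for the worksheets.
ws1 = pd.DataFrame({"A": [2, 3], "B": [3, 4], "C": [4.28, 5.23]})
ws2 = pd.DataFrame({"A": [9, 1], "B": [6, 3], "C": [5.9, 4.1]})
ws3 = pd.DataFrame({"A": [9, 1], "B": [6, 3], "C": [6.0, 4.3]})
for ws in (ws1, ws2, ws3):
    ws.index.name = "index"

# np.fmin is the element-wise minimum of two Series;
# reduce folds it pairwise over the whole list.
min_c = reduce(np.fmin, [ws1["C"], ws2["C"], ws3["C"]])
print(min_c)

# NaN handling: fmin skips a NaN if the other value is a number,
# minimum returns NaN as soon as either value is NaN.
a = np.array([np.nan, 1.0])
b = np.array([2.0, 3.0])
fmin_result = np.fmin(a, b)
minimum_result = np.minimum(a, b)
print(fmin_result, minimum_result)
```

If a worksheet has a blank cell in column C, np.fmin will therefore still return the smallest value found in the other worksheets for that row.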

This generalizes nicely with a list comprehension

reduce(np.fmin, [w.C for w in [ws1, ws2, ws3, ws4, ws5]])

If you must insist on your column name

from functools import reduce

reduce(np.fmin, [ws1.C, ws2.C, ws3.C]).to_frame('min(C)')

       min(C)
index        
0        4.28
1        4.10

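You can also use pd.concat on a dictionary and then call pd.Series.min with the level=1 argument

# note: the level argument of min was removed in pandas 2.0;
# there, use .groupby(level=1).min() instead
pd.concat(dict(enumerate([w.C for w in [ws1, ws2, ws3]]))).min(level=1)
# equivalently
# pd.concat(dict(enumerate([w.C for w in [ws1, ws2, ws3]])), axis=1).min(1)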

index
0    4.28
1    4.10
Name: C, dtype: float64

Note:

dict(enumerate([w.C for w in [ws1, ws2, ws3]]))

is another way of saying

{0: ws1.C, 1: ws2.C, 2: ws3.C}
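
To make the pd.concat variant concrete, here is a small sketch under the assumption that only the C columns matter, so they are written out as standalone Series (c1, c2, c3 are illustrative names, not from the answer). Concatenating a dictionary produces a MultiIndex whose level 0 is the dictionary key and whose level 1 is the original row index, so grouping on level 1 collects the same row from every worksheet. It uses groupby(level=1).min(), which works on both older and current pandas.

```python
import pandas as pd

# C columns of the three worksheets from the question.
c1 = pd.Series([4.28, 5.23], name="C")
c2 = pd.Series([5.9, 4.1], name="C")
c3 = pd.Series([6.0, 4.3], name="C")

# Long form: MultiIndex of (dict key, original row index).
stacked = pd.concat({0: c1, 1: c2, 2: c3})
per_row_min = stacked.groupby(level=1).min()
print(per_row_min)

# Wide form: one column per worksheet, then a row-wise minimum.
wide = pd.concat({0: c1, 1: c2, 2: c3}, axis=1)
per_row_min_wide = wide.min(axis=1)
print(per_row_min_wide)
```

Both forms give the same values; the wide form is easier to inspect when you also want to see which worksheet each value came from.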

Answer 1: (score: 3)

You need read_excel with sheetname=None, which returns an OrderedDict of all sheets keyed by sheet name, and then a list comprehension over the C columns reduced with numpy.fmin:

# note: newer pandas versions spell this parameter sheet_name
dfs = pd.read_excel('file.xlsx', sheetname=None)
print (dfs)
OrderedDict([('Sheet1',    A  B     C
0  2  3  4.28
1  3  4  5.23), ('Sheet2',    A  B    C
0  9  6  5.9
1  1  3  4.1), ('Sheet3',    A  B    C
0  9  6  6.0
1  1  3  4.3)])

from functools import reduce
df = reduce(np.fmin, [v['C'] for k,v in dfs.items()])
print (df)
0    4.28
1    4.10
Name: C, dtype: float64

Solution with concat:

df = pd.concat([v['C'] for k,v in dfs.items()],axis=1).min(axis=1)
print (df)
0    4.28
1    4.10
dtype: float64

If you need to define the index in read_excel:

dfs = pd.read_excel('file.xlsx', sheetname=None, index_col='index')
print (dfs)
OrderedDict([('Sheet1',        A  B     C
index
0      2  3  4.28
1      3  4  5.23), ('Sheet2',        A  B    C
index
0      9  6  5.9
1      1  3  4.1), ('Sheet3',        A  B    C
index
0      9  6  6.0
1      1  3  4.3)])

df = pd.concat([v['C'] for k,v in dfs.items()], axis=1).min(axis=1)
print (df)
index
0    4.28
1    4.10
dtype: float64
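
The same idea can be wrapped in a small helper. The sketch below is only an illustration: min_across_sheets is a hypothetical name, and the dfs dictionary is built by hand as a stand-in for what read_excel returns when asked for all sheets, so that it runs without file.xlsx.

```python
import pandas as pd

# Hand-built stand-in for the dict of DataFrames that
# pd.read_excel('file.xlsx', sheet_name=None) would return.
dfs = {
    "Sheet1": pd.DataFrame({"A": [2, 3], "B": [3, 4], "C": [4.28, 5.23]}),
    "Sheet2": pd.DataFrame({"A": [9, 1], "B": [6, 3], "C": [5.9, 4.1]}),
    "Sheet3": pd.DataFrame({"A": [9, 1], "B": [6, 3], "C": [6.0, 4.3]}),
}


def min_across_sheets(sheets, column):
    """Row-wise minimum of one column over every sheet in the dict."""
    # Put the chosen column of each sheet side by side, aligned on the index,
    # then take the minimum along each row (NaN cells are skipped by default).
    side_by_side = pd.concat([df[column] for df in sheets.values()], axis=1)
    return side_by_side.min(axis=1)


result = min_across_sheets(dfs, "C")
print(result)

# Optional: present it as a one-column frame named like the question asks.
result_frame = result.to_frame("min(C)")
print(result_frame)
```

Because the column name is a parameter, the same helper would also answer the question for column A or B.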
