Question

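I'm using the min() function on a pandas DataFrame to get the minimum values.

However, in the DataFrame, all "bad data" values have been replaced with -9999999.

How can I make min() ignore that value? It is a placeholder, not an actual data value.

Here is some code: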

import pandas as pd

# The "for i, row" loop identifies which rows are data rows and which are not;
# the bottom portion filters out the non-data rows.
xl = pd.read_excel(location, header=None, sheet_name=0)
keep = []
for i, row in xl.iterrows():
    cells = 0
    numbers = 0
    for j, column in row.items():  # Series.iteritems() was removed in pandas 2.0
        cells += 1
        if type(column).__name__ in ('float', 'int') and not pd.isnull(column):
            numbers += 1
        #print(i,column)
    #print(i, cells, numbers, numbers/cells*100)
    if numbers/cells*100 > 50:
        keep.append(i)


# Filter out the records that are most likely NOT data rows
df = xl.iloc[keep]
# Apply the -9999999 default value to conform to data type standards
df = df.apply(lambda x: pd.to_numeric(x, errors='coerce')).fillna(-9999999)

# ToDo: Ignore -9999999 when performing the functions below
dfmax = df.max()
dfmin = df.min()

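Thanks!

Caveat: if I don't fill in a default value, min() and max() do not report values for all records, because the columns would then contain mixed data types.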

Answer 1

One solution is to take only the values greater than that number:

df.values[df.values > -9999999].min()
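Note that the expression above collapses the whole frame into a single minimum, whereas df.min() in the question returns one value per column. A minimal sketch of a per-column variant, assuming a small made-up frame (the column names `a` and `b` and the `SENTINEL` constant are illustrative, not from the question):

```python
import pandas as pd

SENTINEL = -9999999  # placeholder value used for bad data in the question

# Hypothetical sample frame; column names are illustrative only
df = pd.DataFrame({"a": [3.0, SENTINEL, 7.0], "b": [SENTINEL, 10.0, 2.0]})

# Global minimum over every cell, as in the one-liner above
global_min = df.values[df.values > SENTINEL].min()

# Per-column min/max: where() turns cells that fail the condition into NaN,
# and min()/max() skip NaN by default
valid = df.where(df != SENTINEL)
col_min = valid.min()
col_max = valid.max()
```

Masking with where() leaves the original frame untouched, so the sentinel is still available for anything downstream that expects it.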

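In general, NumPy's not-a-number value, np.nan, is a better representation of bad data than an actual numeric value, and in pandas v > 0.15 it is written to SQL as NULL.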
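As a sketch of that suggestion, assuming a hypothetical frame of mixed strings and numbers: if the fillna(-9999999) step is dropped, pd.to_numeric(..., errors='coerce') already leaves bad cells as NaN in numeric columns, so min() and max() should ignore them without extra filtering.

```python
import pandas as pd

# Hypothetical raw data with non-numeric cells; names are illustrative only
raw = pd.DataFrame({"a": [3, "n/a", 7], "b": ["bad", 10, 2]})

# Coerce each column to numeric; unparseable cells become NaN (no sentinel)
df = raw.apply(lambda col: pd.to_numeric(col, errors="coerce"))

# min()/max() use skipna=True by default, so NaN cells are ignored
dfmin = df.min()
dfmax = df.max()
```

Because the coerced columns are numeric, this also avoids the mixed-dtype concern raised in the question's caveat.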

Using the min() function on a pandas DataFrame with outlier values
