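I am using the min() function on a pandas DataFrame to get the minimum value.
However, all "bad data" values in the DataFrame have been replaced with -9999999.
How can I make min() ignore that value? It is a placeholder, not a real data value.
Here is some code: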
import pandas as pd

# The "for i, row" loop identifies which rows are data rows; the code below it filters out the non-data rows.
xl = pd.read_excel(location, header=None, sheet_name=0)
keep = []
for i, row in xl.iterrows():
    cells = 0
    numbers = 0
    for j, column in row.items():  # Series.iteritems() was removed in pandas 2.0
        cells += 1
        if type(column).__name__ in ('float', 'int') and not pd.isnull(column):
            numbers += 1
        #print(i,column)
    #print(i, cells, numbers, numbers/cells*100)
    if numbers/cells*100 > 50:
        keep.append(i)
# Filter out the records that are most likely NOT data rows
df = xl.iloc[keep]
# Apply the -9999999 default value to conform to data type standards
df = df.apply(lambda x: pd.to_numeric(x, errors='coerce')).fillna(-9999999)
# ToDo: Ignore -9999999 when performing the below functions
dfmax = df.max()
dfmin = df.min()
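Thanks!
Caveat: if I do not fill in the default value, min() and max() do not report values for every column, because the affected columns would have mixed data types.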
Answer 0 (score: 2)
One solution is to select only the values greater than that number:
df.values[df.values > -9999999].min()
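Note that the one-liner above collapses the whole frame into a single scalar, whereas the question's df.min() and df.max() return one value per column. A minimal sketch of both variants follows; the sample frame and the SENTINEL name are made up for illustration and are not from the original post.

```python
import pandas as pd

SENTINEL = -9999999  # the "bad data" placeholder from the question

# Hypothetical sample frame, for illustration only.
df = pd.DataFrame({"a": [3.0, SENTINEL, 1.5], "b": [SENTINEL, 7.0, 4.0]})

# Global minimum across every cell, as in the one-liner above.
overall_min = df.values[df.values > SENTINEL].min()

# Per-column results: mask() replaces sentinel cells with NaN,
# and min()/max() skip NaN by default.
masked = df.mask(df == SENTINEL)
col_min = masked.min()
col_max = masked.max()
```

The masked variant leaves df itself unchanged, so the sentinel can still be written out later if a downstream consumer expects it.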
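In general, NumPy's not-a-number value, np.nan, is a better representation of bad data than an actual numeric value, and in pandas versions above 0.15 it is written to SQL as NULL.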
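As a rough sketch of that suggestion, the fillna(-9999999) step can simply be left out: pd.to_numeric with errors='coerce' already turns unparseable cells into NaN, which gives each column a numeric dtype, and min() and max() skip NaN by default. The raw frame below is hypothetical sample data, not the asker's spreadsheet.

```python
import pandas as pd

# Hypothetical raw data with non-numeric cells, for illustration only.
raw = pd.DataFrame({"a": [3, "n/a", 1.5], "b": ["bad", 7, 4]})

# Coerce every column to numeric; cells that cannot be parsed become NaN.
# No fillna() afterwards, so bad data stays as NaN.
df = raw.apply(lambda col: pd.to_numeric(col, errors="coerce"))

# skipna=True is the default, so NaN cells are ignored here.
dfmin = df.min()
dfmax = df.max()
```

Because the coerced columns are already float64, the mixed-dtype concern from the question's caveat should not arise in this setup.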