Numpy's 'where' function behaving ambiguously

Posted: 2019-02-18 00:19:33

Tags: python pandas numpy correlation

I am trying to create a pandas dataframe that describes the NULL value percentage for each feature in my training dataset and also gives a correlation coefficient for each numeric feature with respect to the dependent variable. Here is my code:

import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Count nulls per feature and compute their share (percentage of rows)
null_cols = pd.DataFrame(train.isnull().sum().sort_values(ascending = False))
null_cols.columns = ['NullCount']
null_cols.index.name = 'Features'
null_cols['Share'] = np.round(100 * null_cols['NullCount'] / len(train), decimals=2)

# Compute the correlation of each numeric feature with the dependent variable
for row in null_cols.index:
    print(row, np.where(is_numeric_dtype(train[row]), str(train['Dependent Var'].corr(train[row])), ''))
    #print(row, np.where(is_numeric_dtype(train[row]), str(train[row].isnull().sum()), ''))

When I run this, I get TypeError: unsupported operand type(s) for /: 'str' and 'int'. I think the error comes from the corr function, but why is corr being run inside the 'where' function for a non-numeric data type? Shouldn't it fall into the else branch?

The commented-out line of code, i.e.

print(row, np.where(is_numeric_dtype(train[row]),str(train[row].isnull().sum()),'')) 

runs fine without an error and the 'where' function works as expected.
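A minimal sketch reproducing the difference between the two lines, assuming a hypothetical toy DataFrame (with made-up column cat_feature) in place of the real train data: counting nulls works on a column of any dtype, whereas correlating the target against a string column raises. The exact exception type is an assumption that varies by pandas version, so both TypeError and ValueError are caught.

```python
import pandas as pd

# Hypothetical toy data: a numeric target plus one string (object) column
train = pd.DataFrame({
    "Dependent Var": [1.0, 2.0, 3.0, 4.0],
    "cat_feature": ["a", "b", "c", "d"],
})

# Counting nulls works for any dtype, so the commented-out line never fails
print(train["cat_feature"].isnull().sum())  # 0

# Correlating with a string column raises; the exception type depends on the
# pandas version (e.g. TypeError in older releases, ValueError in newer ones)
try:
    train["Dependent Var"].corr(train["cat_feature"])
    corr_failed = False
except (TypeError, ValueError) as exc:
    corr_failed = True
    print(type(exc).__name__, exc)
```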

1 answer:

Answer 0 (score: 0)

Let's go over how Python runs this code:

np.where(is_numeric_dtype(train[row]), str(train['Dependent Var'].corr(train[row])), '')

where is a function, and Python evaluates all of a function's arguments before passing them in. So it evaluates:

is_numeric_dtype(train[row])
str(train['Dependent Var'].corr(train[row]))
''

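before it ever calls where. The corr call therefore runs (and raises) even for non-numeric columns; np.where only chooses between values that have already been computed.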
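A small sketch of this eager evaluation, using a hypothetical function corr_branch as a stand-in for the corr call: it records each time it runs, so you can see that np.where triggers it even when the condition is False, while a Python conditional expression only evaluates the branch it selects.

```python
import numpy as np

calls = []

def corr_branch():
    # Hypothetical stand-in for train['Dependent Var'].corr(train[row]):
    # it records that it ran, then returns a placeholder string
    calls.append("corr_branch")
    return "0.5"

# The condition is False, but corr_branch() still runs, because Python
# evaluates every argument before np.where is called
result = np.where(False, corr_branch(), "")
print(calls)          # ['corr_branch']
print(result.item())  # prints an empty line: the '' branch was selected

# A conditional expression is lazy: only the chosen branch is evaluated
calls.clear()
value = corr_branch() if False else ""
print(calls)          # []
```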

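If corr can only be run on certain types of values, np.where is not the right tool. I think you need a plain if/else, which evaluates only the branch that is taken: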

for row in null_cols.index:
    # Only numeric columns reach the corr call
    if is_numeric_dtype(train[row]):
        print(row, train['Dependent Var'].corr(train[row]))
    else:
        print(row, '')
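If the goal is to store the correlation in the null_cols table rather than print it, the same if/else can be wrapped in a helper and applied per feature. This is a sketch under assumptions: the toy train data, the feature names, the helper corr_with_target and the Corr column name are all hypothetical, and non-numeric features get NaN instead of an empty string so the column stays numeric.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical toy data standing in for the real training set
train = pd.DataFrame({
    "Dependent Var": [1.0, 2.0, 3.0, 4.0],
    "num_feature": [2.0, 4.0, np.nan, 8.0],
    "cat_feature": ["a", None, "b", "c"],
})

# Null count and share per feature, as in the original code
null_cols = train.isnull().sum().sort_values(ascending=False).to_frame("NullCount")
null_cols.index.name = "Features"
null_cols["Share"] = np.round(100 * null_cols["NullCount"] / len(train), decimals=2)

def corr_with_target(col):
    # Hypothetical helper: the if/else means corr is only called for numeric columns
    if is_numeric_dtype(train[col]):
        return train["Dependent Var"].corr(train[col])
    return np.nan

null_cols["Corr"] = [corr_with_target(col) for col in null_cols.index]
print(null_cols)
```

Series.corr drops rows where either value is missing, so the NaN in num_feature does not propagate into its correlation.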