I am trying to create a pandas dataframe that describes the NULL value percentage for each feature in my training dataset and also gives a correlation coefficient for each numeric feature with respect to the dependent variable. Here is my code:
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

#Count nulls and compute share
null_cols = pd.DataFrame(train.isnull().sum().sort_values(ascending = False))
null_cols.columns = ['NullCount']
null_cols.index.name = 'Features'
null_cols['Share'] = np.round(100 * null_cols['NullCount'] / len(train), decimals=2)
#Compute correlation of each numeric feature with respect to the dependent variable
for row in null_cols.index:
    print(row, np.where(is_numeric_dtype(train[row]), str(train['Dependent Var'].corr(train[row])), ''))
    #print(row, np.where(is_numeric_dtype(train[row]), str(train[row].isnull().sum()), ''))
On running this, I get TypeError: unsupported operand type(s) for /: 'str' and 'int'. I think the error comes from the corr function, but why is corr being run inside the 'where' call for a non-numeric column? Shouldn't that case fall into the else branch?
The commented-out line of code, i.e.
print(row, np.where(is_numeric_dtype(train[row]),str(train[row].isnull().sum()),''))
runs fine without an error and the 'where' function works as expected.
Answer 0 (score: 0)
Let's review how Python runs this code:
np.where(is_numeric_dtype(train[row]), str(train['Dependent Var'].corr(train[row])), '')
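np.where is an ordinary function, not a control-flow construct. Python evaluates all of a function's arguments before passing them to the function. So it evaluates: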
is_numeric_dtype(train[row])
str(train['Dependent Var'].corr(train[row]))
''
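before calling where. The corr call therefore runs for every column, including the non-numeric ones, and that is where the TypeError is raised.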
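The eager evaluation described above can be seen in a minimal sketch. The helper `expensive` and the labels are hypothetical names made up for this illustration; only `np.where` comes from the question.

```python
import numpy as np

calls = []

def expensive(label):
    # Hypothetical helper: records that its body actually ran.
    calls.append(label)
    return label

# The condition is False, yet both argument expressions are evaluated
# before np.where is called at all.
result = np.where(False, expensive("true-branch"), expensive("false-branch"))

print(calls)   # both labels were recorded
print(result)  # only the value picked by the condition is returned
```

np.where only chooses between values that have already been computed, so it cannot protect an argument expression from raising.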
If corr can only be run on certain types of values, then np.where is not the tool to use. I think you need:
for row in null_cols.index:
    if is_numeric_dtype(train[row]):
        print(row, str(train['Dependent Var'].corr(train[row])))
    else:
        print('')
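Since the stated goal is a dataframe rather than printed output, one possible way to store the correlations as a column is sketched below. This is an assumption about what is wanted, not part of the original answer: the small `train` frame and the `Corr` column name are made up for illustration, and `select_dtypes` plus `corrwith` are swapped in for the explicit loop.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real training data.
train = pd.DataFrame({
    "Dependent Var": [1.0, 2.0, 3.0, 4.0],
    "num_feature": [2.0, 4.0, 6.0, 8.0],
    "num_with_null": [4.0, np.nan, 2.0, 1.0],
    "text_feature": ["a", "b", None, "d"],
})

# Same null-count table as in the question.
null_cols = pd.DataFrame(train.isnull().sum().sort_values(ascending=False))
null_cols.columns = ["NullCount"]
null_cols.index.name = "Features"
null_cols["Share"] = np.round(100 * null_cols["NullCount"] / len(train), decimals=2)

# Restrict to numeric columns first, so corr is never called on strings.
numeric = train.select_dtypes(include="number")

# Assignment aligns on the index; non-numeric features end up as NaN.
null_cols["Corr"] = numeric.corrwith(train["Dependent Var"])

print(null_cols)
```

Filtering the columns before computing anything avoids the per-row type check entirely, and the NaN entries mark the features for which no correlation was computed.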