I need to change my code so that it works with NumPy 2D arrays instead of pandas DataFrames:
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[np.nan, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=["col1", "col2", "col3"])
list_of_NA_features = ["col1"]
for feature in list_of_NA_features:
    for index, row in df.iterrows():
        if pd.isnull(row[feature]):
            missing_value = 5  # for simplicity, let's put 5 instead of a function
            df.loc[index, feature] = missing_value  # originally df.ix, which pandas 1.0 removed
What is the correct way to do `for index,row in df.iterrows():`, `pd.isnull(row[feature]) == True`, and `df.ix[index,feature] = missing_val` with a NumPy array?
This is what I have done so far:
np_arr = df.values  # originally df.as_matrix, which was missing its call parentheses and was removed in pandas 1.0
for feature in list_of_NA_features:
    for feature in range(np_arr.shape[1]):
        # ???
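How do I get the row index so that I can write `np_arr[irow,feature]`? Also, what is the correct way to assign a value to a specific row and column of a NumPy array, i.e. the equivalent of `df.ix[index,feature] = missing_val`?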
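Update

I simplified the code by removing the function `fill_missing_values` and replacing it with the value 5. In my real case, however, I need to estimate the missing values.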
Answer 0 (score: -1)
Setup
import numpy as np

# Set up a NumPy array with the same contents as your DataFrame
a = np.array([[np.nan, 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
# list_of_NA_features now contains column indices into the NumPy array
list_of_NA_features = [0]
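If the array comes from your existing DataFrame rather than being typed in by hand, something like the following sketch may help. It assumes pandas 0.24 or later for `to_numpy()` (on older versions `df.values` should give the same array), and the names `feature_names` and `feature_indices` are made up for illustration; they are not from your code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=np.array([[np.nan, 2, 3], [4, 5, 6], [7, 8, 9]]),
    columns=["col1", "col2", "col3"],
)

# Convert the DataFrame to a plain 2D float array.
np_arr = df.to_numpy()

# Translate column names into positional indices for the array.
# get_loc returns an int here because the column labels are unique.
feature_names = ["col1"]  # hypothetical name, stands in for list_of_NA_features
feature_indices = [df.columns.get_loc(name) for name in feature_names]
```

The array itself no longer knows the column names, so the name-to-position mapping has to be taken from the DataFrame before you drop it.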
Solution:
# The same operations can be carried out on a NumPy array in the way you did them on the DataFrame.
# This only mirrors your loop; I'm not saying it is the best way of doing what you are trying to do.
for feature in list_of_NA_features:
    for index, row in enumerate(a):  # enumerate yields the row index and the row itself
        if np.isnan(row[feature]):
            missing_value = 5
            a[index, feature] = missing_value  # assign to a specific row and column
Out[167]:
array([[ 5.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.]])
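Since you mention that the real code has to estimate the missing values, here is a sketch of a loop-free variant using a boolean mask. The estimator is an assumption on my part: I use the mean of the non-missing values in the column (`np.nanmean`) purely as a placeholder for whatever `fill_missing_values` actually computes.

```python
import numpy as np

a = np.array([[np.nan, 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
list_of_NA_features = [0]

for feature in list_of_NA_features:
    # Boolean mask over the rows: True where the value in this column is missing.
    mask = np.isnan(a[:, feature])
    # Placeholder estimator (an assumption): mean of the non-missing values.
    estimate = np.nanmean(a[:, feature])
    # Assign to every masked row of this column in one step.
    a[mask, feature] = estimate
```

With this data the only missing cell, `a[0, 0]`, is filled with 5.5 (the mean of 4 and 7). The mask replaces both the row loop and the `isnan` check, which tends to matter once the array has many rows.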