Question

I need to change my code so that it works with a NumPy 2D array instead of a pandas DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[np.nan, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=["col1", "col2", "col3"])

list_of_NA_features = ["col1"]

for feature in list_of_NA_features:
    for index,row in df.iterrows():
        if (pd.isnull(row[feature]) == True):
            missing_val = 5 # for simplicity, let's put 5 instead of a function
            df.ix[index,feature] = missing_val

What is the proper way to do for index,row in df.iterrows():, pd.isnull(row[feature]) == True and df.ix[index,feature] = missing_val with a NumPy array?

This is what I have done so far:

np_arr = df.as_matrix()  # as_matrix is a method and must be called; newer pandas uses df.values or df.to_numpy()

for feature in list_of_NA_features:
    for feature in xrange(np_arr.shape[1]):
        # ???

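How do I get the row index so that I can do np_arr[irow,feature]? Also, what is the proper way to assign a value to a specific row and column in a NumPy array, i.e. the equivalent of df.ix[index,feature] = missing_val?

Update

I simplified the code by removing the function fill_missing_values and replacing it with the value 5. However, in my real case I need to estimate the missing values.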

Answer 1

Setup

import numpy as np

# Set up a NumPy array with the same contents as your DataFrame
a = np.array([[np.nan,   2.,   3.],
       [  4.,   5.,   6.],
       [  7.,   8.,   9.]])

#list_of_NA_features now contains the column index in the numpy array
list_of_NA_features = [0]
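
If the column names come from an existing DataFrame, the integer positions can be derived rather than hard-coded. The following is a minimal sketch, assuming the DataFrame from the question; na_column_names is an illustrative name that does not appear in the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=np.array([[np.nan, 2, 3], [4, 5, 6], [7, 8, 9]]),
    columns=["col1", "col2", "col3"],
)

# Hypothetical name for the question's list of column labels
na_column_names = ["col1"]

# Convert to a plain 2D float array (DataFrame.values works on old and new pandas)
a = df.values

# Map each column label to its integer position in the array
list_of_NA_features = [df.columns.get_loc(name) for name in na_column_names]
```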

Solution:

# Now you can see how those operations can be carried out on a NumPy array.
# I'm only saying you can do this on a NumPy array in the way you did it on a DataFrame.
# I'm not saying this is the best way of doing what you are trying to do.
for feature in list_of_NA_features:
    for index, row in enumerate(a):
        if np.isnan(row[feature]):
            missing_value = 5
            a[index,feature] = missing_value 

Out[167]: 
array([[ 5.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.]])
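
The same replacement can also be written without explicit Python loops over rows by using a boolean mask. This is a sketch rather than part of the original answer. It assumes the array a and the index list from the setup, and the column mean computed with np.nanmean is only a stand-in for whatever estimator the real code uses.

```python
import numpy as np

a = np.array([[np.nan, 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
list_of_NA_features = [0]

for feature in list_of_NA_features:
    # Boolean mask of the rows where this column is NaN
    missing = np.isnan(a[:, feature])
    # Assumed estimator: mean of the non-missing values in the column
    estimate = np.nanmean(a[:, feature])
    # Assign to every masked row of the column in one step
    a[missing, feature] = estimate
```

Replacing estimate with the constant 5 reproduces the output shown above.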
