Question: Fill a matrix from a pandas.DataFrame, skipping NaN

I want to fill a matrix, ref, from the pd.DataFrame xxx, but skip the NaN values.

print(xxx)
OUT >> 
   intensity name  rowtype1  rowtype2
0        100    A         1       4.0
1        200    A         2       NaN
2        300    B         3       5.0

Then I fill the matrix as ref[rowtype, col] = intensity, where col is the DataFrame row index. I have two rowtype columns.

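ref = np.zeros(shape=(7, 4))
# each row writes its intensity at (rowtype1, row index) and (rowtype2, row index)
for row in xxx.itertuples():
    ref[int(row.rowtype1), row.Index] = row.intensity
    ref[int(row.rowtype2), row.Index] = row.intensity  # error: int(NaN) fails for the NaN in rowtype2

print(ref)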

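How can I skip the NaN here? I know one way is to use dropna(), but then I have to create a new DataFrame containing only rowtype2 and intensity. I would like a simple, fast way that, when it reaches the NaN (the row with intensity = 200), just moves on to the next rowtype2 value (5, with intensity = 300), and so on.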
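
A minimal sketch of the loop-level skip described above, assuming the sample xxx from the question. It tests each row type with pd.isna before indexing and casts the float row types (4.0, 5.0) to int, since recent NumPy versions reject float indices.

```python
import numpy as np
import pandas as pd

# Sample data from the question (columns in the printed order).
xxx = pd.DataFrame({
    'intensity': [100, 200, 300],
    'name': ['A', 'A', 'B'],
    'rowtype1': [1, 2, 3],
    'rowtype2': [4, np.nan, 5],
})

ref = np.zeros(shape=(7, 4))
for row in xxx.itertuples():
    for r in (row.rowtype1, row.rowtype2):
        if pd.isna(r):
            continue  # missing row type: skip it and move on
        # cast to int so the float row type is a valid index
        ref[int(r), row.Index] = row.intensity

print(ref)
```

The inner loop treats both rowtype columns the same way, so adding a third rowtype column only means adding it to the tuple.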

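Additional information:

1) Here is how xxx is created:

import numpy as np
import pandas as pd

prot = ['A', 'A', 'B']
calc_m = [1, 2, 3]
calc_m2 = [4, np.nan, 5]
inte = [100, 200, 300]
# keys listed in the printed column order: pandas < 0.23 sorted dict keys, newer versions keep insertion order
xxx = pd.DataFrame({'intensity': pd.Series(inte),
                    'name': pd.Series(prot),
                    'rowtype1': pd.Series(calc_m),
                    'rowtype2': pd.Series(calc_m2)})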

Answer 1

You can use melt for this, and then use NumPy indexing instead of a for loop to set ref:

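# long format: one row per (row index, rowtype column); dropna() removes the NaN rowtype2 entry
long_df = xxx.reset_index().melt(['intensity', 'index'], ['rowtype1', 'rowtype2']).dropna()
# NumPy fancy indexing: row = rowtype value, column = original row index
ref[long_df['value'].astype(int).values, long_df['index'].values] = long_df['intensity'].values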

This gives:

array([[   0.,    0.,    0.,    0.],
       [ 100.,    0.,    0.,    0.],
       [   0.,  200.,    0.,    0.],
       [   0.,    0.,  300.,    0.],
       [ 100.,    0.,    0.,    0.],
       [   0.,    0.,  300.,    0.],
       [   0.,    0.,    0.,    0.]])
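
To see why the melt approach works, here is a sketch that prints the long-format frame before the NaN row is dropped and then fills ref with a single assignment. It assumes the sample xxx from the question; the name long_df is illustrative (the answer's original name, set, shadows the Python built-in).

```python
import numpy as np
import pandas as pd

xxx = pd.DataFrame({
    'intensity': [100, 200, 300],
    'name': ['A', 'A', 'B'],
    'rowtype1': [1, 2, 3],
    'rowtype2': [4, np.nan, 5],
})

# One row per (original index, rowtype column); 'value' holds the row type.
long_df = xxx.reset_index().melt(id_vars=['intensity', 'index'],
                                 value_vars=['rowtype1', 'rowtype2'])
print(long_df)  # six rows, one of them with value NaN

# Drop only the entries whose row type is missing.
long_df = long_df.dropna(subset=['value'])

ref = np.zeros(shape=(7, 4))
# Paired integer arrays: ref[value[i], index[i]] = intensity[i] for every i.
ref[long_df['value'].astype(int).to_numpy(),
    long_df['index'].to_numpy()] = long_df['intensity'].to_numpy()
print(ref)
```

Because the two index arrays are paired element by element, NumPy writes all five cells in one vectorized step, which is what replaces the Python loop.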

Answer 2

I'm not sure I fully understand the behavior you are looking for, but the pandas dropna() method has a "subset" argument. For example, dropping all rows with NaN in the rowtype2 column could be done with

xxx.dropna(subset=['rowtype2'],inplace=True)

That way, you would drop only rows with NaN in the rowtype2 column.
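
A sketch of how subset could be combined with the fill without modifying xxx, assuming xxx keeps its default 0..n-1 index so the index labels double as column positions; the name valid is illustrative. Without inplace=True, dropna returns a filtered copy.

```python
import numpy as np
import pandas as pd

xxx = pd.DataFrame({
    'intensity': [100, 200, 300],
    'name': ['A', 'A', 'B'],
    'rowtype1': [1, 2, 3],
    'rowtype2': [4, np.nan, 5],
})

ref = np.zeros(shape=(7, 4))

# rowtype1 has no missing values, so every row can be used directly.
ref[xxx['rowtype1'].to_numpy(), xxx.index.to_numpy()] = xxx['intensity'].to_numpy()

# Filtered copy: only rows with a rowtype2 value; xxx itself is unchanged.
valid = xxx.dropna(subset=['rowtype2'])
# valid.index keeps the original labels (0 and 2), i.e. the original columns.
ref[valid['rowtype2'].astype(int).to_numpy(),
    valid.index.to_numpy()] = valid['intensity'].to_numpy()

print(ref)
```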
