How to fill NaN in one pandas column using values from other columns

Time: 2020-02-21 20:00:58

Tags: python pandas dataframe

I need to fill NaN values in one column with values from other columns. For example, I have a df like this:

col1, col2, col3, col4
1     nan    nan   nan
2     3      nan   nan
4     nan    5     nan
6     8      nan   9 

I need to turn the df above into:

col1, col2, col3, col4
1     nan    nan     1
2       3    nan     2
4     nan      5     4
6     8      nan     9 

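I want to go through col1 to col3, take the first available value in each row, and use it to replace the NaN in col4. If a row already has a value in col4, that row should be left alone.

I have been told that iterating over a DataFrame is not ideal. What other options do I have?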
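For anyone who wants to reproduce this, here is a minimal sketch that builds the frame shown above. It assumes the values are numeric and that nan means a missing value (np.nan); the name expected_col4 is only there for illustration.

```python
import numpy as np
import pandas as pd

# The example frame from the question; nan is assumed to mean np.nan.
df = pd.DataFrame({
    "col1": [1.0, 2.0, 4.0, 6.0],
    "col2": [np.nan, 3.0, np.nan, 8.0],
    "col3": [np.nan, np.nan, 5.0, np.nan],
    "col4": [np.nan, np.nan, np.nan, 9.0],
})

# The col4 values asked for in the question (hypothetical helper name).
expected_col4 = [1.0, 2.0, 4.0, 9.0]
```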

3 Answers:

Answer 0 (score: 1)

Try:

df.assign(col4 = df.apply(lambda row: row[row.first_valid_index()], axis=1))

Output:

   col1,col2,col3,col4
0   1.0 NaN NaN 1.0
1   NaN 3.0 NaN 3.0
2   4.0 NaN 5.0 4.0
3   6.0 8.0 NaN 6.0

df.assign(col4 = df.apply(lambda row: row.first_valid_index(), axis=1))

This gives you the label of the first valid column in each row:

   col1,col2,col3,col4
0   1.0 NaN NaN col1,
1   NaN 3.0 NaN col2,
2   4.0 NaN 5.0 col1,
3   6.0 8.0 NaN col1,

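Using this information, you can assign the values.

Better to use the following, which only touches rows where col4 is NaN:

import numpy as np

# Keep an existing col4 value; otherwise take the row's first non-NaN value.
df['col4'] = df.apply(
    lambda row: row[row.first_valid_index()] if np.isnan(row['col4']) else row['col4'],
    axis=1
)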

This gives you the desired result (since we only need to fill the NaN values in col4):

   col1,col2,col3,col4
0   1.0 NaN NaN 1.0
1   NaN 3.0 NaN 3.0
2   4.0 NaN 5.0 4.0
3   6.0 8.0 NaN 9.0
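A self-contained sketch of the same row-wise idea, run on the question's data. It assumes the source columns are named col1, col2 and col3; the helper first_valid is a name introduced here for illustration. It also returns NaN for a row with no valid value at all, since first_valid_index() returns None in that case.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1.0, 2.0, 4.0, 6.0],
    "col2": [np.nan, 3.0, np.nan, 8.0],
    "col3": [np.nan, np.nan, 5.0, np.nan],
    "col4": [np.nan, np.nan, np.nan, 9.0],
})

def first_valid(row):
    """Return the first non-NaN value in the row, or NaN if there is none."""
    label = row.first_valid_index()
    return row[label] if label is not None else np.nan

# Only look at col1..col3 when searching for a replacement value.
source_cols = ["col1", "col2", "col3"]
fallback = df[source_cols].apply(first_valid, axis=1)

# Keep col4 where it already has a value, otherwise use the fallback.
df["col4"] = df["col4"].where(df["col4"].notna(), fallback)
```

Note that apply with axis=1 still calls a Python function once per row, so on large frames it will probably be slower than a column-wise approach.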

Answer 1 (score: 1)

Use bfill and fillna:

df['col4'] = df['col4'].fillna(df.bfill(axis=1)['col1'])

Out[833]:
   col1  col2  col3  col4
0     1   NaN   NaN   1.0
1     2   3.0   NaN   2.0
2     4   NaN   5.0   4.0
3     6   8.0   NaN   9.0
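A runnable sketch of this bfill/fillna approach on the question's data. Restricting the backfill to col1, col2 and col3 is an assumption made here so that col4 itself never takes part in the lookup; the variable name first_available is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1.0, 2.0, 4.0, 6.0],
    "col2": [np.nan, 3.0, np.nan, 8.0],
    "col3": [np.nan, np.nan, 5.0, np.nan],
    "col4": [np.nan, np.nan, np.nan, 9.0],
})

source_cols = ["col1", "col2", "col3"]

# Backfill along each row: after this, the first column holds the first
# non-NaN value found in col1..col3 for that row.
first_available = df[source_cols].bfill(axis=1).iloc[:, 0]

# fillna only replaces the NaN entries, so existing col4 values are kept.
df["col4"] = df["col4"].fillna(first_available)
```

This avoids a Python-level loop over rows, which is what the question asked for.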

Answer 2 (score: 0)

You can just use fillna and loop over the column names.
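A minimal sketch of the fillna approach, looping over the column names. This is one plausible reading of the idea rather than the answerer's exact code, and the column order col1, col2, col3 is assumed from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1.0, 2.0, 4.0, 6.0],
    "col2": [np.nan, 3.0, np.nan, 8.0],
    "col3": [np.nan, np.nan, 5.0, np.nan],
    "col4": [np.nan, np.nan, np.nan, 9.0],
})

# Each pass fills only the col4 entries that are still NaN, so earlier
# columns take priority and existing col4 values are never overwritten.
for col in ["col1", "col2", "col3"]:
    df["col4"] = df["col4"].fillna(df[col])
```

The loop runs once per column, not once per row, so each step is still a vectorized operation.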