For example, I need to fill NaN values using values from other columns.
I have a df like this:
col1  col2  col3  col4
1     nan   nan   nan
2     3     nan   nan
4     nan   5     nan
6     8     nan   9
I need to turn the df above into:
col1  col2  col3  col4
1     nan   nan   1
2     3     nan   2
4     nan   5     4
6     8     nan   9
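I want to go through col1 to col3, take the first available value in each row, and use it to replace the NaN in col4. However, if a row already has a value in col4, that row should be left alone.
I have been told that iterating over a DataFrame is not ideal. What other options do I have?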
Answer 0 (score: 1)
Try:
df.assign(col4 = df.apply(lambda row: row[row.first_valid_index()], axis=1))
Output:
col1,col2,col3,col4
0 1.0 NaN NaN 1.0
1 NaN 3.0 NaN 3.0
2 4.0 NaN 5.0 4.0
3 6.0 8.0 NaN 6.0
df.assign(col4 = df.apply(lambda row: row.first_valid_index(), axis=1))
Returning only the index gives you the name of the column that each value came from:
col1,col2,col3,col4
0 1.0 NaN NaN col1
1 NaN 3.0 NaN col2
2 4.0 NaN 5.0 col1
3 6.0 8.0 NaN col1
With this information you can assign the values.
Better, use:
import numpy as np

# Fill col4 only where it is NaN; otherwise keep the existing value.
df['col4'] = df.apply(
    lambda row: row[row.first_valid_index()] if np.isnan(row['col4']) else row['col4'],
    axis=1
)
This gives you the desired result (since only the NaN values in col4 need to be filled):
col1,col2,col3,col4
0 1.0 NaN NaN 1.0
1 NaN 3.0 NaN 3.0
2 4.0 NaN 5.0 4.0
3 6.0 8.0 NaN 9.0
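One edge case worth noting: first_valid_index() returns None when every value in a row is NaN, so indexing the row with it would fail. Below is a minimal sketch that guards against this, assuming the source columns are exactly col1 to col3. The helper name fill_from_first_valid and the extra all-NaN row are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data; the last row is entirely NaN to exercise the edge case.
df = pd.DataFrame({
    "col1": [1.0, np.nan, 4.0, 6.0, np.nan],
    "col2": [np.nan, 3.0, np.nan, 8.0, np.nan],
    "col3": [np.nan, np.nan, 5.0, np.nan, np.nan],
    "col4": [np.nan, np.nan, np.nan, 9.0, np.nan],
})

def fill_from_first_valid(row):
    """Return col4 if present, else the first non-NaN value in col1..col3."""
    if pd.notna(row["col4"]):
        return row["col4"]
    source = row[["col1", "col2", "col3"]]
    idx = source.first_valid_index()  # None when all three are NaN
    return source[idx] if idx is not None else np.nan

result = df.assign(col4=df.apply(fill_from_first_valid, axis=1))
print(result)
```

This is still a row-wise apply, so it will be slower than a vectorized approach on large frames.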
Answer 1 (score: 1)
Use bfill and fillna:
df['col4'] = df['col4'].fillna(df.bfill(axis=1)['col1'])
Out[833]:
col1 col2 col3 col4
0 1 NaN NaN 1.0
1 2 3.0 NaN 2.0
2 4 NaN 5.0 4.0
3 6 8.0 NaN 9.0
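For anyone who wants to run this end to end, here is a self-contained sketch of the same bfill and fillna idea using the data from the question. It restricts the back-fill to col1 to col3, which is my reading of the question rather than something this answer states, and the variable name first_available is made up for the example.

```python
import numpy as np
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "col1": [1.0, 2.0, 4.0, 6.0],
    "col2": [np.nan, 3.0, np.nan, 8.0],
    "col3": [np.nan, np.nan, 5.0, np.nan],
    "col4": [np.nan, np.nan, np.nan, 9.0],
})

# Back-fill along each row (axis=1) so the leftmost column ends up holding
# the first non-NaN value found in col1..col3 for that row.
first_available = df[["col1", "col2", "col3"]].bfill(axis=1).iloc[:, 0]

# Only the NaN entries of col4 are replaced; existing values are kept.
df["col4"] = df["col4"].fillna(first_available)
print(df)
```

No Python-level loop over rows is involved, which is what the question asked for.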
Answer 2 (score: 0)
You can just use fillna and iterate over the column names.
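The following is a sketch of what iterating over the column names with fillna might look like. It is not the code originally posted in this answer, and the sample data (with a NaN in col1) is an assumption chosen to show that the order of the columns matters.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1.0, np.nan, 4.0, 6.0],
    "col2": [np.nan, 3.0, np.nan, 8.0],
    "col3": [np.nan, np.nan, 5.0, np.nan],
    "col4": [np.nan, np.nan, np.nan, 9.0],
})

# Loop over columns (not rows), left to right. Each pass fills only the
# entries of col4 that are still NaN, so earlier columns take priority.
for col in ["col1", "col2", "col3"]:
    df["col4"] = df["col4"].fillna(df[col])

print(df)
```

The loop runs once per column rather than once per row, so each step is still a vectorized operation.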