Question

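I am trying to fill NaN values with the last good observation.

I am not using a DataFrame; I just read a .dat file with the SparkContext.

In my example, every NaN value should end up as 104.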

Answer 1

You can use the DataFrameNaFunctions object to filter out (or replace) NaN values in your dataset:

Example:

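import org.apache.spark.sql.DataFrameNaFunctions;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row> yourDataSet = sparkSession.createDataFrame(yourJavaRDDCollection, yourSchema);
// DataFrameNaFunctions is obtained through Dataset.na(); it is not itself a Dataset<Row>.
DataFrameNaFunctions dfNaNFilter = yourDataSet.na();

// If you want to remove every row that contains a null or NaN value:
Dataset<Row> nonNaNValues = dfNaNFilter.drop();

// If you want to replace them with a numeric value (e.g. 104):
Dataset<Row> replacedNaNValues = dfNaNFilter.fill(104);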
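
Note that fill(104) writes the same constant everywhere, which matches the question's example only because the last good value there happens to be 104. Carrying the last good observation forward is a different operation. The following is a minimal sketch of that logic in plain Java, assuming the values have already been collected in their original order into a local array. The class name ForwardFill and the method forwardFill are hypothetical helpers for illustration, not part of Spark.

```java
// Hypothetical helper, not a Spark API: forward-fills NaN values in a local array.
final class ForwardFill {
    private ForwardFill() {}

    /**
     * Returns a copy of the input in which each NaN is replaced by the nearest
     * preceding non-NaN value. Leading NaNs have no earlier observation to copy,
     * so they are left as NaN.
     */
    static double[] forwardFill(double[] values) {
        double[] filled = new double[values.length];
        double lastGood = Double.NaN; // no good observation seen yet
        for (int i = 0; i < values.length; i++) {
            if (!Double.isNaN(values[i])) {
                lastGood = values[i]; // remember the most recent good observation
            }
            filled[i] = lastGood;
        }
        return filled;
    }
}
```

For example, forwardFill(new double[]{104.0, Double.NaN, Double.NaN}) yields three values of 104.0. On a distributed RDD the same idea needs extra care, because the row order must be well defined and a run of NaN values can cross a partition boundary. This sketch does not handle that case.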

How can I fill NaN values with the last good observation using Java Spark?
