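I'm trying to incorporate a Try().getOrElse() statement into the select statement of a Spark DataFrame. The project I'm working on will be deployed to several environments, but the raw data names one particular field slightly differently in each environment. I don't want to write several different functions just to handle each variant of that field. Is there an elegant way to handle exceptions in a DataFrame select statement, like the following?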
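import scala.util.Try
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell

val dfFilter = dfRaw
  .select(
    Try($"some.field.nameOption1").getOrElse($"some.field.nameOption2"),
    $"some.field.abc",
    $"some.field.def"
  )
dfFilter.show(33, false)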
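However, I keep getting the following error. That makes sense, because the field does not exist in the raw data in this environment, but I expected getOrElse to catch the exception.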
org.apache.spark.sql.AnalysisException: No such struct field nameOption1 in...
Is there a good way to handle exceptions inside a select statement in Scala Spark, or do I need to write a separate function for each case?
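The getOrElse branch never runs because `$"some.field.nameOption1"` only builds an unresolved Column; nothing is checked until `select` analyzes the plan, and that happens outside the Try. In classic (non-Connect) Spark, `df(name)` (that is, `Dataset.col`) resolves the name immediately and throws the AnalysisException on the spot, so wrapping that call in Try does catch it. A minimal sketch that uses this to pick the first candidate path that exists follows. The object `FirstExistingField`, the helper `firstExisting`, the output alias `"name"`, and the one-row JSON sample are all hypothetical; under Spark Connect, `df(name)` may be resolved lazily, in which case this trick would not apply.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import scala.util.{Success, Try}

object FirstExistingField {
  // Hypothetical helper: returns the first candidate path that resolves against df,
  // aliased to a stable output name. df(path) calls Dataset.col, which (in classic Spark)
  // resolves the path immediately and throws AnalysisException if it is missing.
  def firstExisting(df: DataFrame, alias: String, candidates: String*): Column =
    candidates.iterator
      .map(path => Try(df(path)))
      .collectFirst { case Success(c) => c.as(alias) }
      .getOrElse(throw new IllegalArgumentException(
        s"None of [${candidates.mkString(", ")}] exists in the DataFrame"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("first-existing").getOrCreate()
    import spark.implicits._

    // Assumed sample data: in this "environment" only nameOption2 exists under some.field
    val dfRaw = spark.read.json(Seq(
      """{"some": {"field": {"nameOption2": "x", "abc": 1, "def": 2}}}"""
    ).toDS())

    // $"..." only builds an unresolved Column, so this Try succeeds even though the field is missing
    assert(Try($"some.field.nameOption1").isSuccess)
    // dfRaw("...") resolves eagerly, so the same lookup fails inside the Try
    assert(Try(dfRaw("some.field.nameOption1")).isFailure)

    val dfFilter = dfRaw.select(
      firstExisting(dfRaw, "name", "some.field.nameOption1", "some.field.nameOption2"),
      $"some.field.abc",
      $"some.field.def"
    )
    dfFilter.show(false)
    spark.stop()
  }
}
```

Aliasing the chosen column keeps the output name identical in every environment, so downstream code does not need to know which variant was found.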
Answer 1 (score: 0)
So, a year later, I'm revisiting this question. I believe the following solution is more elegant to implement. Let me know what others think:
import scala.util.Try
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell

// Generate a fake DataFrame
val df = Seq(
  ("1234", "A", "AAA"),
  ("1134", "B", "BBB"),
  ("2353", "C", "CCC")
).toDF("id", "name", "nameAlt")
// Extract the column names
val columns = df.columns
// Add a "new" column name that is NOT present in the above DataFrame
val columnsAdd = columns ++ Array("someNewColumn")
// "Try" to select every column: df(c) resolves c immediately and throws if it is missing;
// toOption turns that failure into None, which flatMap drops
df.select(columnsAdd.flatMap(c => Try(df(c)).toOption): _*).show(false)
// Drop nameAlt and repeat: the missing columns are skipped instead of raising an error
val dfNew = df.select("id", "name")
dfNew.select(columnsAdd.flatMap(c => Try(dfNew(c)).toOption): _*).show(false)
// Results
columns: Array[String] = Array(id, name, nameAlt)
columnsAdd: Array[String] = Array(id, name, nameAlt, someNewColumn)
+----+----+-------+
|id  |name|nameAlt|
+----+----+-------+
|1234|A   |AAA    |
|1134|B   |BBB    |
|2353|C   |CCC    |
+----+----+-------+
dfNew: org.apache.spark.sql.DataFrame = [id: string, name: string]
+----+----+
|id  |name|
+----+----+
|1234|A   |
|1134|B   |
|2353|C   |
+----+----+
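If you would rather not use exceptions for control flow, the same filtering can be sketched as a plain lookup against `df.columns`. This is an alternative to the Try approach above, not the answer's own method; `SelectExisting` and `selectExisting` are hypothetical names. Two caveats: `df.columns` lists only top-level columns, so nested paths such as some.field.nameOption1 are not handled, and `Set.contains` is case-sensitive while Spark's own name resolution is case-insensitive by default (`spark.sql.caseSensitive=false`).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SelectExisting {
  // Keeps only the requested names that appear among df's top-level columns,
  // preserving the requested order, then selects them. No exception is thrown or caught.
  def selectExisting(df: DataFrame, wanted: Seq[String]): DataFrame = {
    val present = df.columns.toSet
    df.select(wanted.filter(present.contains).map(name => col(name)): _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("select-existing").getOrCreate()
    import spark.implicits._

    // Same fake data as in the answer above
    val df = Seq(
      ("1234", "A", "AAA"),
      ("1134", "B", "BBB"),
      ("2353", "C", "CCC")
    ).toDF("id", "name", "nameAlt")
    // Requested names, including one that is never present
    val columnsAdd = df.columns.toSeq :+ "someNewColumn"

    selectExisting(df, columnsAdd).show(false)                      // columns: id, name, nameAlt
    selectExisting(df.select("id", "name"), columnsAdd).show(false) // columns: id, name
    spark.stop()
  }
}
```

Checking the schema up front makes the intent explicit and avoids building and discarding exceptions, at the cost of reimplementing part of Spark's name resolution yourself.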