Filtering a DataFrame on a dynamic value in Spark / Scala

Date: 2016-11-17 07:56:49

Tags: scala apache-spark spark-dataframe

I have JSON in the following format:

{"Request": {"TrancheList": {"Tranche": [{"TrancheId": "500192163","OwnedAmt": "26500000",    "Curr": "USD" }, {  "TrancheId": "500213369", "OwnedAmt": "41000000","Curr": "USD"}]},"FxRatesList": {"FxRatesContract": [{"Currency": "CHF","FxRate": "0.97919983706115"},{"Currency": "AUD", "FxRate": "1.2966804979253"},{ "Currency": "USD","FxRate": "1"},{"Currency": "SEK","FxRate": "8.1561012531034"},{"Currency": "NOK", "FxRate": "8.2454981641398"},{"Currency": "JPY","FxRate": "111.79999785344"},{"Currency": "HKD","FxRate": "7.7568025218916"},{"Currency": "GBP","FxRate": "0.69425159677867"}, {"Currency": "EUR","FxRate": "0.88991723769689"},{"Currency": "DKK", "FxRate": "6.629598372301"}]},"isExcludeDeals": "true","baseCurrency": "USD"}}

I am trying to get the FxRate value for the Currency that equals the baseCurrency tag.

I am reading the JSON from the HDFS cluster:

val hdfsRequest = spark.read.json("localhost/user/request.json")
val baseCurrency = hdfsRequest.select("Request.baseCurrency")
var fxRates = hdfsRequest.select("Request.FxRatesList.FxRatesContract")
val fxRatesDF = fxRates.select(explode(fxRates("FxRatesContract")))
  .toDF("FxRatesContract")
  .select("FxRatesContract.Currency", "FxRatesContract.FxRate")
  .filter($"Currency=baseCurrency")

The error I get when running this code is:

org.apache.spark.sql.AnalysisException: cannot resolve '`Currency=baseCurrency`' given input columns: [Currency, FxRate];

How do I reference the baseCurrency variable in a DataFrame filter expression in Scala / Spark?

Thanks

1 answer:

Answer 0 (score: 4)

If the base currency is just a single value, what you can do is:

import org.apache.spark.sql.functions.explode
import spark.implicits._  // needed for $"..." and the Dataset encoder in .map

val hdfsRequest = spark.read.json("localhost/user/request.json")

// Collect baseCurrency to the driver as a plain Option[String]
val baseCurrency = hdfsRequest.select("Request.baseCurrency")
  .map(_.getString(0)).collect.headOption

val fxRates = hdfsRequest.select("Request.FxRatesList.FxRatesContract")

// Explode the contracts array into one row per currency, then compare the
// column against the collected value; if baseCurrency is absent, the -1D
// fallback is a sentinel that matches no currency
val fxRatesDF = fxRates.select(explode(fxRates("FxRatesContract")))
  .toDF("FxRatesContract")
  .select("FxRatesContract.Currency", "FxRatesContract.FxRate")
  .filter($"Currency" === baseCurrency.fold(-1D)(identity))