Get the first value from spark.sql.Row

Date: 2016-11-17 17:56:26

Tags: apache-spark apache-spark-sql

I have the following JSON:

{"Request": {"TrancheList": {"Tranche": [{"TrancheId": "500192163","OwnedAmt": "26500000",    "Curr": "USD" }, {  "TrancheId": "500213369", "OwnedAmt": "41000000","Curr": "USD"}]},"FxRatesList": {"FxRatesContract": [{"Currency": "CHF","FxRate": "0.97919983706115"},{"Currency": "AUD", "FxRate": "1.2966804979253"},{ "Currency": "USD","FxRate": "1"},{"Currency": "SEK","FxRate": "8.1561012531034"},{"Currency": "NOK", "FxRate": "8.2454981641398"},{"Currency": "JPY","FxRate": "111.79999785344"},{"Currency": "HKD","FxRate": "7.7568025218916"},{"Currency": "GBP","FxRate": "0.69425159677867"}, {"Currency": "EUR","FxRate": "0.88991723769689"},{"Currency": "DKK", "FxRate": "6.629598372301"}]},"isExcludeDeals": "true","baseCurrency": "USD"}}

I read the JSON from HDFS:

import org.apache.spark.sql.functions.explode
import spark.implicits._  // already in scope in spark-shell; needed for $"..." and map

val hdfsRequest = spark.read.json("hdfs://localhost/user/request.json")
val baseCurrency = hdfsRequest.select("Request.baseCurrency").map(_.getString(0)).collect.headOption
val fxRates = hdfsRequest.select("Request.FxRatesList.FxRatesContract")
val fxRatesDF = fxRates.select(explode(fxRates("FxRatesContract"))).toDF("FxRatesContract").select("FxRatesContract.Currency", "FxRatesContract.FxRate").filter($"Currency" === baseCurrency.get)
fxRatesDF.show()

The output I get for fxRatesDF is:

fxRatesDF: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [Currency: string, FxRate: string]
+--------+------+
|Currency|FxRate|
+--------+------+
|     USD|     1|
+--------+------+

How do I get the value in the first row of the FxRate column?

8 answers:

Answer 0 (score: 16)

You can use

import org.apache.spark.sql.functions.col
fxRatesDF.select(col("FxRate")).first.getString(0)

Answer 1 (score: 9)

first() is the function you need.

Use it like this (PySpark):

fxRatesDF.first().FxRate

Answer 2 (score: 1)

Maybe something like this:

fxRatesDF.take(1)[0][1]

fxRatesDF.collect()[0][1]

fxRatesDF.first()[1]
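
One caveat with the index-based lookups above: take(1)[0] and collect()[0] raise an IndexError when the filter matches no rows, while first() returns None in PySpark. Below is a minimal sketch of a guarded lookup. first_value is a hypothetical helper, not a Spark API, and the list of tuples only stands in for what fxRatesDF.take(1) would return.

```python
from typing import Any, Optional, Sequence


def first_value(rows: Sequence[Sequence[Any]], column_index: int) -> Optional[Any]:
    """Return one column of the first row, or None when there are no rows."""
    if not rows:
        return None
    return rows[0][column_index]


# Stand-in for fxRatesDF.take(1): PySpark Row objects are tuple subclasses,
# so plain tuples index the same way. Column 0 is Currency, column 1 is FxRate.
rows = [("USD", "1")]
fx_rate = first_value(rows, 1)

# Stand-in for a filter that matched nothing.
empty_result = first_value([], 1)

print(fx_rate)
print(empty_result)
```

With a real DataFrame the call would presumably look like first_value(fxRatesDF.take(1), 1).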

Answer 3 (score: 1)

I know this is an old post, but I got it working this way: fxRatesDF.first()[0]

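Answer 4 (score: 0)

You can try the following (FxRate is a string column, so the pattern must match a String; matching on Int would throw a MatchError):

import org.apache.spark.sql.Row
fxRatesDF.select("FxRate").rdd.map { case Row(s: String) => s }.first()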

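Answer 5 (score: 0)

It should be as simple as this (display is a notebook helper, e.g. in Databricks, and shows the value rather than returning it):

display(fxRatesDF.select($"FxRate").limit(1))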

Answer 6 (score: 0)

This can be solved in a single line.

fxRatesDF.first()(1)

Or, to get it typed as a String:

fxRatesDF.first().getString(1)

Answer 7 (score: 0)

A simple way is to select the row and column by index. Input DataFrame:

+-----+
|count|
+-----+
|    0|
+-----+

Code:

count = df.collect()[0][0]
print(count)
if count == 0:
    print("First row and First column value is 0")

Output:

0
First row and First column value is 0
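
Note that in the question's DataFrame FxRate is a string column, so a numeric comparison like the one above would be False there until the value is converted. A small sketch of that conversion follows, assuming the value was fetched by index as shown; the literal is only a stand-in for the fetched value.

```python
from decimal import Decimal

# Stand-in for fxRatesDF.collect()[0][1]; the schema in the question
# shows FxRate as a string column.
raw_fx_rate = "1"

# Comparing the raw string with a number is always False in Python.
naive_match = raw_fx_rate == 1

# Decimal parses the text exactly, which suits rates such as
# "0.97919983706115" better than a binary float would.
fx_rate = Decimal(raw_fx_rate)
converted_match = fx_rate == 1

print(naive_match)
print(converted_match)
```

Casting the column to a numeric type in Spark before collecting it is another option.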