Scala Databricks: cast all bigint columns to double

Asked: 2019-06-28 01:12:58

Tags: scala apache-spark types

I am referring to this question: Cast multiple columns in a DataFrame

I have a DataFrame with many columns. The first few columns (say, the first 5) should not be touched, since they are IDs, names, and so on.

Starting from the 6th column, whenever a column's data type is bigint, I want to cast that column to double.

Currently, I am using something like:

df.withColumn("columnName", df("columnName").cast("double"))   // repeated once per bigint column; "columnName" stands in for the actual column name

Doing this once per column is really time-consuming.
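To make the setup concrete, here is a minimal sketch of the kind of DataFrame in question; the column names and values are hypothetical:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("bigint-to-double").getOrCreate()
import spark.implicits._

// The first 5 columns are IDs/names and must stay untouched;
// every Long-typed (bigint) column after them should become double.
val df = Seq(
  ("id1", "name1", "a", "b", "c", 1L, 2L, 3L),
  ("id2", "name2", "d", "e", "f", 4L, 5L, 6L)
).toDF("id", "name", "key1", "key2", "key3", "metric1", "metric2", "metric3")

df.printSchema()   // metric1, metric2, metric3 appear as long (bigint)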

1 Answer:

Answer 0 (score: 1)

1 - Drop the first 5 columns from the schema, then collect the names of all BigInt/Long-typed columns.

2 - Fold over that list of BigInt columns, casting each one to Double, as in the snippet below.

import org.apache.spark.sql.types.DataTypes

val df2 = df.schema
  .drop(5)                                                            // skip the first 5 protected columns
  .collect { case c if c.dataType == DataTypes.LongType => c.name }   // keep only bigint/long column names
  .foldLeft(df) { (acc, nxt) =>                                       // cast each of those columns to double
    acc.withColumn(nxt, acc.col(nxt).cast(DataTypes.DoubleType))
  }
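As a side note, the same transformation can be expressed as a single select instead of a chain of withColumn calls, which keeps the query plan flatter when many columns are involved; a sketch under the same assumptions (first 5 columns protected, df as above):

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, LongType}

// Build one projection: pass the first 5 columns through unchanged,
// cast every later LongType (bigint) column to double, keep everything else as is.
val df3 = df.select(df.schema.fields.zipWithIndex.map { case (f, i) =>
  if (i >= 5 && f.dataType == LongType) col(f.name).cast(DoubleType).as(f.name)
  else col(f.name)
}: _*)

df3.printSchema()   // bigint columns after the 5th now show up as double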