Scala Databricks: cast all bigint columns to double

Asked: 2019-06-28 01:12:58

Tags: scala apache-spark types

I am referring to this question: Cast multiple columns in a DataFrame

I have a DataFrame with many columns. The first few columns (say, the first 5) should not be touched, since they are IDs, names, and so on.

Starting from the 6th column, whenever a column's data type is bigint, I want to cast that column to double.

Currently, I am using something like:

df.withColumn("columnName", df("columnName").cast("double"))   // repeated once per bigint column; "columnName" stands in for the actual column name

Doing this once per column is really time-consuming.
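To make the setup concrete, here is a minimal sketch of the kind of DataFrame in question; the column names and values are hypothetical:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("bigint-to-double").getOrCreate()
import spark.implicits._

// The first 5 columns are IDs/names and must stay untouched;
// every Long-typed (bigint) column after them should become double.
val df = Seq(
  ("id1", "name1", "a", "b", "c", 1L, 2L, 3L),
  ("id2", "name2", "d", "e", "f", 4L, 5L, 6L)
).toDF("id", "name", "key1", "key2", "key3", "metric1", "metric2", "metric3")

df.printSchema()   // metric1, metric2, metric3 appear as long (bigint)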

1 Answer:

Answer 0 (score: 1)

1 - Drop the first 5 columns from the schema, then collect the names of all BigInt/Long-typed columns.

2 - Fold over that list of BigInt columns, casting each one to Double, as in the snippet below.

import org.apache.spark.sql.types.DataTypes

val df2 = df.schema
  .drop(5)                                                            // skip the first 5 protected columns
  .collect { case c if c.dataType == DataTypes.LongType => c.name }   // keep only bigint/long column names
  .foldLeft(df) { (acc, nxt) =>                                       // cast each of those columns to double
    acc.withColumn(nxt, acc.col(nxt).cast(DataTypes.DoubleType))
  }
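As a side note, the same transformation can be expressed as a single select instead of a chain of withColumn calls, which keeps the query plan flatter when many columns are involved; a sketch under the same assumptions (first 5 columns protected, df as above):

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, LongType}

// Build one projection: pass the first 5 columns through unchanged,
// cast every later LongType (bigint) column to double, keep everything else as is.
val df3 = df.select(df.schema.fields.zipWithIndex.map { case (f, i) =>
  if (i >= 5 && f.dataType == LongType) col(f.name).cast(DoubleType).as(f.name)
  else col(f.name)
}: _*)

df3.printSchema()   // bigint columns after the 5th now show up as double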