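I am referring to this question: Cast multiples columns in a DataFrame
I have a DataFrame with many columns. The first few columns (say, 5) should not be touched, since they are IDs, names, and so on.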
From the 6th column onward, I want to cast a column to double if its data type is bigint.
Currently, I am applying the cast to each column separately, which is really time-consuming.
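Answer 0 (score: 1)
1- Exclude the first 5 columns, then find all remaining columns of type BigInt / Long
2- Fold over that list of BigInt columns, casting each one to Double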
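import org.apache.spark.sql.types.DataTypes

val df2 = df.schema
  .drop(5) // skip the first 5 columns (IDs, names, etc.)
  .collect { case c if c.dataType == DataTypes.LongType => c.name } // names of the Long columns
  .foldLeft(df) { (acc, nxt) =>
    acc.withColumn(nxt, acc.col(nxt).cast(DataTypes.DoubleType)) // cast each one to Double
  }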
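For reference, the same idea can also be written as a single select instead of a chain of withColumn calls. This is only a sketch of an alternative, not the answer author's method. The object name CastLongColumns, the helper castLongsToDouble, the skip parameter, and the toy column names are all made up for illustration, and it assumes a local Spark session and column names without dots.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, LongType}

object CastLongColumns {
  // Hypothetical helper: cast every LongType column located after the first
  // `skip` columns to DoubleType, applying all casts in one select.
  // Assumes column names contain no dots (col would treat them as nested fields).
  def castLongsToDouble(df: DataFrame, skip: Int): DataFrame = {
    val projected = df.schema.fields.zipWithIndex.map {
      case (f, i) if i >= skip && f.dataType == LongType =>
        col(f.name).cast(DoubleType).as(f.name) // keep the original column name
      case (f, _) =>
        col(f.name) // leave every other column untouched
    }
    df.select(projected: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("cast-sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy data: two leading columns to leave alone, then a mix of types.
    val df = Seq((1L, "a", 10L, "x", 3L)).toDF("id", "name", "m1", "label", "m2")

    val out = castLongsToDouble(df, skip = 2)
    out.printSchema() // id stays long; m1 and m2 become double

    spark.stop()
  }
}
```

In the question's case this would be called as castLongsToDouble(df, skip = 5). Building one projection keeps the column order unchanged and avoids adding one withColumn step per column, which may matter when the DataFrame has a very large number of columns.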