I have a column that I am casting from string to double, but I get the following error.
An error occurred while calling o2564.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 619.0 failed 4 times, most recent failure: Lost task 0.3 in stage 619.0 org.apache.spark.SparkException: Failed to execute user defined function($anonfun$1: (double) => double)
train_with_summary.select('cd_val').show(10)
+-------------------+
|             cd_val|
+-------------------+
|                  1|
|                  9|
|                  9|
|                  0|
|                  1|
|                  3|
|                  3|
|                  0|
|                  1|
|                  2|
+-------------------+
from pyspark.ml.feature import Bucketizer
from pyspark.sql.types import DoubleType

bucket_cols = ['cd_val']
for bucket_col in bucket_cols:
    train_with_summary = train_with_summary.withColumn(bucket_col, train_with_summary[bucket_col].cast(DoubleType()))
    bucketizer = Bucketizer(splits=[-float("inf"), 4, 9, 14, 19], inputCol=bucket_col, outputCol=bucket_col + "_buckets")
    train_with_summary = bucketizer.setHandleInvalid("keep").transform(train_with_summary)
    print(bucket_col)
    train_with_summary.select([bucket_col, bucket_col + '_buckets']).show(10)
The error occurs on the last line, and the column contains no null values.
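For reference, Spark's Bucketizer maps a value x to bucket i when splits[i] <= x < splits[i+1], with the final split treated as an inclusive upper bound; with handleInvalid="keep", out-of-range values go into an extra bucket. A minimal pure-Python sketch of that mapping, using the same splits and sample values as above (not Spark's actual implementation):

```python
import bisect
import math

def bucketize(x, splits):
    """Mimic Bucketizer's mapping: bucket i covers [splits[i], splits[i+1]),
    except the last bucket, whose upper bound is inclusive."""
    if x == splits[-1]:
        return len(splits) - 2          # last split is inclusive
    if not (splits[0] <= x < splits[-1]):
        return None                     # invalid; handleInvalid="keep" would add an extra bucket
    return bisect.bisect_right(splits, x) - 1

splits = [-math.inf, 4, 9, 14, 19]
print([bucketize(v, splits) for v in [1, 9, 9, 0, 1, 3, 3, 0, 1, 2]])
# [0, 2, 2, 0, 0, 0, 0, 0, 0, 0]
```

So every sample value except 9 falls into bucket 0 ([-inf, 4)), and 9 lands in bucket 2 ([9, 14)).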
Answer 0 (score: 0)
I figured it out myself: the error arose because the code was trying to cast a column that was already of type double to double again.
Since I had run the code twice, the first run had already converted the column.
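One way to make a cell like this safe to re-run is to cast only when the column is still a string; in PySpark, `df.dtypes` returns a list of (name, type) pairs that can be checked first. A minimal sketch of the guard logic, where the `dtypes` lists below are stand-ins for a real DataFrame's `df.dtypes`:

```python
def needs_cast(dtypes, col, target="double"):
    """Return True when `col` is not yet of `target` type, so the cast
    (and the Bucketizer step) should run; False on a re-run."""
    return dict(dtypes).get(col) != target

# First run: the column is still a string, so cast it.
print(needs_cast([("cd_val", "string")], "cd_val"))   # True
# Second run: already double, so skip the cast.
print(needs_cast([("cd_val", "double")], "cd_val"))   # False
```

In the loop above, the cast would then be wrapped as `if needs_cast(train_with_summary.dtypes, bucket_col): ...`, so re-running the notebook cell no longer retriggers the conversion.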