Question

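I'm running Java Spark code that reads some JSON data and uses a UDF to convert one of the fields to uppercase. The code works fine in local mode, but when I run it on a cluster (under Kubernetes) I get a ClassCastException like this: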


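import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

import com.google.common.collect.ImmutableList;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// UdfUppercase is our own UDF1<String, String> implementation (not shown)
UDF1<String, String> uppercase = new UdfUppercase();
session.udf().register("uppercasefunction", uppercase, DataTypes.StringType);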

StructField[] structFields = new StructField[]{ 
        new StructField("intColumn", DataTypes.IntegerType, true, Metadata.empty()), 
        new StructField("stringColumn", DataTypes.StringType, true, Metadata.empty()) 
}; 
StructType structType = new StructType(structFields); 

List<String> jsonData = ImmutableList.of( 
        "{\"intColumn\":1,\"stringColumn\":\"Miami\"}"); 

Dataset<String> anotherPeopleDataset = session.createDataset(jsonData, Encoders.STRING()); 
Dataset<Row> anotherPeople = session.read().schema(structType).json(anotherPeopleDataset);            
anotherPeople.show(false); 

// Apply the registered UDF to stringColumn
Dataset<Row> dfUppercase = anotherPeople.select(callUDF("uppercasefunction", col("stringColumn")));
dfUppercase.show(false);
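A symptom like this (fine locally, ClassCastException on Kubernetes) invites suspicion of serialization. Spark's UDF1 interface extends java.io.Serializable, so one quick way to confirm that the function object itself is not the problem is to round-trip an equivalent object through Java serialization outside Spark. This is a JDK-only sketch under stated assumptions: UppercaseFn is a hypothetical stand-in for UdfUppercase (whose source isn't shown) and implements java.util.function.Function rather than UDF1, and passing nulls through is an assumed choice for missing values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Locale;
import java.util.function.Function;

public class SerializationCheck {

    // Hypothetical stand-in for UdfUppercase: same idea, but without the Spark
    // UDF1 interface so the check runs with the JDK alone.
    static class UppercaseFn implements Function<String, String>, Serializable {
        private static final long serialVersionUID = 1L;

        @Override
        public String apply(String s) {
            // Assumption: a null column value is passed through instead of throwing
            return s == null ? null : s.toUpperCase(Locale.ROOT);
        }
    }

    // Writes the object with Java serialization and reads it back,
    // roughly what happens when a function is shipped to an executor.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        UppercaseFn copy = roundTrip(new UppercaseFn());
        System.out.println(copy.apply("Miami")); // prints MIAMI
    }
}
```

If the round trip succeeds, as it should for a plain class like this, the exception on the cluster is more likely to come from the runtime classpath than from the UDF code, which matches what the answer below found.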

Any help would be greatly appreciated.

Answer 1

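Problem solved: it was caused by a conflict with some Spring Boot jars. The ClassCastException is misleading, because it gives the impression that our DataFrame cannot be serialized properly (hence the difference between local and cluster mode), when in fact it has nothing to do with the code itself and everything to do with the jar conflict.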
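The answer doesn't say how the conflicting jars were found. One generic way to check for this kind of conflict, offered as an assumption rather than the method the answerer used, is to ask the class loader which classpath entries contain a given class and which one it actually loaded. The class name ClasspathCheck is hypothetical; the calls it relies on (ClassLoader.getResources and Class.getProtectionDomain().getCodeSource()) are standard JDK APIs. To be meaningful it has to run with the same classpath as the failing job, for example inside the driver or executor image, since the conflict did not show up locally.

```java
import java.io.IOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

public class ClasspathCheck {

    // Lists every classpath location that provides the given class file.
    // Two or more entries for the same class usually mean several jars ship
    // their own copy of it, which is what a jar conflict looks like.
    static List<URL> locationsOf(String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathCheck.class.getClassLoader();
        }
        return Collections.list(loader.getResources(resource));
    }

    // Reports the jar or directory a loaded class actually came from,
    // or null for classes from the bootstrap loader (such as java.lang.String).
    static URL loadedFrom(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return source == null ? null : source.getLocation();
    }

    public static void main(String[] args) throws Exception {
        // Pass the fully qualified class name from the ClassCastException message;
        // with no argument this just inspects ClasspathCheck itself.
        String name = args.length > 0 ? args[0] : ClasspathCheck.class.getName();
        List<URL> found = locationsOf(name);
        System.out.println(name + " found in " + found.size() + " location(s):");
        for (URL url : found) {
            System.out.println("  " + url);
        }
        // Load without running static initializers, then report where it came from
        Class<?> cls = Class.forName(name, false, ClasspathCheck.class.getClassLoader());
        System.out.println("loaded from: " + loadedFrom(cls));
    }
}
```

A class that appears in more than one location, for example once in a Spark jar and once in a jar pulled in by the Spring Boot build, is a likely candidate for the conflict.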

Spark UDF: ClassCastException with JSON
