I have a simple JSON like the one below. The value node sometimes holds a STRING and sometimes a DOUBLE. I want the value to always be treated as a STRING, but when Spark sees that the field is a double, it writes it out in scientific (E) notation.
Input JSON
{"key" : "k1", "value": "86093351508521808.0"}
{"key" : "k2", "value": 86093351508521808.0}
Spark output CSV
k1,86093351508521808.0
k2,8.6093351508521808E16
Expected output
k1,86093351508521808.0
k2,86093351508521808.0
Please advise how to achieve the expected output. We never read the values from these fields, so we have no way of knowing their precision or other details in advance.
Below is the sample code:
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

import org.apache.spark.SparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class TestSpark {

    public static void main(String[] args) {
        SparkSession sparkSession = SparkSession
                .builder()
                .appName(TestSpark.class.getName())
                .master("local[*]").getOrCreate();
        SparkContext context = sparkSession.sparkContext();
        context.setLogLevel("ERROR");
        SQLContext sqlCtx = sparkSession.sqlContext();
        System.out.println("Spark context established");

        // Explicit schema: both key and value are declared as strings.
        List<StructField> kvFields = new ArrayList<>();
        kvFields.add(DataTypes.createStructField("key", DataTypes.StringType, true));
        kvFields.add(DataTypes.createStructField("value", DataTypes.StringType, true));
        StructType employeeSchema = DataTypes.createStructType(kvFields);

        Dataset<Row> dataset = sparkSession.read()
                .option("inferSchema", false)
                .format("json")
                .schema(employeeSchema)
                .load("D:\\dev\\workspace\\java\\simple-kafka\\key_value.json");

        dataset.createOrReplaceTempView("sourceView");
        sqlCtx.sql("select * from sourceView ")
                .write()
                .format("csv")
                .save("D:\\dev\\workspace\\java\\simple-kafka\\output\\" + UUID.randomUUID().toString());
        sparkSession.close();
    }
}
Answer (score: 1)
We can cast that column to DecimalType, as shown below:
scala> import org.apache.spark.sql.types.DecimalType;
import org.apache.spark.sql.types.DecimalType
scala> spark.read.json(sc.parallelize(Seq("""{"key" : "k1", "value": "86093351508521808.0"}""","""{"key" : "k2", "value": 86093351508521808.0}"""))).select(col("value").cast(DecimalType(28, 1))).show
+-------------------+
| value|
+-------------------+
|86093351508521808.0|
|86093351508521808.0|
+-------------------+
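Since the question's code is in Java, here is a minimal sketch of applying the same cast there before writing the CSV. It assumes the dataset variable and output path from the question, and reuses the precision and scale (28, 1) from the Scala example above, which may need adjusting for your data:

import static org.apache.spark.sql.functions.col;

// Cast the string "value" column to a decimal wide enough to hold the number,
// so Spark writes it in plain notation rather than scientific (E) notation.
Dataset<Row> casted = dataset.withColumn(
        "value", col("value").cast(DataTypes.createDecimalType(28, 1)));

casted.write()
        .format("csv")
        .save("D:\\dev\\workspace\\java\\simple-kafka\\output\\" + UUID.randomUUID().toString());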