Spark Java: schema comparison fails for identical datasets

Time: 2018-10-22 15:44:27

Tags: java apache-spark

I run the following code:

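        import java.util.ArrayList;
        import java.util.List;
        import org.apache.spark.sql.Dataset;
        import org.apache.spark.sql.Row;
        import org.apache.spark.sql.types.DataTypes;
        import org.apache.spark.sql.types.StructField;
        import org.apache.spark.sql.types.StructType;

        // `spark` is assumed to be an existing SparkSession.
        List<StructField> fields = new ArrayList<>();
        fields.add(DataTypes.createStructField("A", DataTypes.LongType, true));
        fields.add(DataTypes.createStructField("B", DataTypes.DoubleType, true));
        StructType schema = DataTypes.createStructType(fields);
        Dataset<Row> df1 = spark.sql("select 1 as A, 2.2 as B");
        Dataset<Row> finalDf1 = spark.createDataFrame(df1.javaRDD(), schema);
        Dataset<Row> df2 = spark.sql("select 1 as A, 2.2 as B");
        Dataset<Row> finalDf2 = spark.createDataFrame(df2.javaRDD(), schema);
        finalDf1.printSchema();
        finalDf2.printSchema();
        System.out.println(finalDf1.schema());
        System.out.println(finalDf2.schema());
        // Compare the two schemas.
        System.out.println(finalDf1.schema() == finalDf2.schema());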

I expected the schema comparison to print true, but it does not, even though the two schemas are identical. The output is as follows:

root
 |-- A: long (nullable = true)
 |-- B: double (nullable = true)

root
 |-- A: long (nullable = true)
 |-- B: double (nullable = true)

StructType(StructField(A,LongType,true), StructField(B,DoubleType,true))
StructType(StructField(A,LongType,true), StructField(B,DoubleType,true))
false

What am I missing?
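One likely explanation, offered as a sketch rather than a confirmed diagnosis: in Java, `==` on objects compares references, not contents, so two separately built schema objects can print identically and still compare as `false`. The example below illustrates this with plain Java only. The `Field` class and the `buildSchema`, `sameReference`, and `sameContent` helpers are hypothetical stand-ins invented for this illustration; they are not Spark classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class SchemaEqualitySketch {

    // Hypothetical stand-in for a schema field (name, type name, nullable).
    // This is NOT Spark's StructField; it only mimics value-based equals().
    static final class Field {
        final String name;
        final String type;
        final boolean nullable;

        Field(String name, String type, boolean nullable) {
            this.name = name;
            this.type = type;
            this.nullable = nullable;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Field)) return false;
            Field other = (Field) o;
            return nullable == other.nullable
                    && name.equals(other.name)
                    && type.equals(other.type);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, type, nullable);
        }
    }

    // Builds a fresh list on every call, so two calls return distinct objects.
    static List<Field> buildSchema() {
        List<Field> fields = new ArrayList<>();
        fields.add(new Field("A", "LongType", true));
        fields.add(new Field("B", "DoubleType", true));
        return fields;
    }

    // Reference comparison: true only if both variables point to the same object.
    static boolean sameReference(Object a, Object b) {
        return a == b;
    }

    // Content comparison: delegates to equals().
    static boolean sameContent(Object a, Object b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        List<Field> schema1 = buildSchema();
        List<Field> schema2 = buildSchema();
        System.out.println(sameReference(schema1, schema2)); // false
        System.out.println(sameContent(schema1, schema2));   // true
    }
}
```

Applied to the code in the question, the analogous change would be `finalDf1.schema().equals(finalDf2.schema())`. Spark's `StructType` defines `equals` over its fields, so that call should print `true` for the two schemas shown above.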

0 answers:

No answers yet