Programmatically defining a schema in Java returns null values?

Time: 2018-04-11 10:50:05

Tags: apache-spark

I have a JSON data file, and I want to programmatically apply a schema to its columns.

pets.json

{"id":"311","species":"canine","color":"golden","weight":"75","name":"Captain"}
{"id":"928","species":"feline","color":"gray","weight":"8","name":"Oscar"}


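    import static org.apache.spark.sql.functions.col;

    import org.apache.spark.sql.DataFrameReader;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    SparkSession session = SparkSession.builder().appName("SparkSQLTests").master("local[*]").getOrCreate();
    DataFrameReader dataFrameReader = session.read();

    // Create the DataFrame, applying the schema returned by buildSchema() (not shown here)
    Dataset<Row> pets = dataFrameReader.schema(buildSchema()).json("input/pets.json");

    // Print the schema and the first 10 rows
    pets.printSchema();
    pets.show(10);

    // SELECT *
    // FROM pets
    // WHERE species='canine'
    System.out.println("=== Display Canines ===");
    pets.filter(col("species").equalTo("canine")).show();

    session.stop();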

When I run the program, all of my columns come back null. What am I doing wrong? Thanks.


    root
     |-- id: integer (nullable = true)
     |-- species: string (nullable = true)
     |-- color: string (nullable = true)
     |-- weight: double (nullable = true)
     |-- name: string (nullable = true)

    +----+-------+-----+------+----+
    |  id|species|color|weight|name|
    +----+-------+-----+------+----+
    |null|   null| null|  null|null|
    |null|   null| null|  null|null|
    +----+-------+-----+------+----+

    === Display Canines ===
    +---+-------+-----+------+----+
    | id|species|color|weight|name|
    +---+-------+-----+------+----+
    +---+-------+-----+------+----+

1 answer:

Answer 0 (score: 0)

It turns out I had put quotes around the numeric values in my JSON data. It worked once I changed the data to:

{"id":311,"species":"canine","color":"golden","weight":75,"name":"Captain"}
{"id":928,"species":"feline","color":"gray","weight":8,"name":"Oscar"}
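
If the file is too large to edit by hand, the same change (removing the quotes around numeric values) could be scripted before the file is handed to Spark. The following is only a minimal sketch using the Java standard library: the class `UnquoteNumericFields` and its `unquote` method are hypothetical names invented for this illustration, not part of Spark. It also assumes flat, one-record-per-line JSON like pets.json; a regex-based rewrite like this is not a general JSON parser and could misfire on nested or escaped content.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Spark.
class UnquoteNumericFields {

    // For each listed key, rewrites "key":"123" as "key":123.
    // Values that are not plain numbers are left untouched.
    static String unquote(String jsonLine, List<String> numericKeys) {
        String result = jsonLine;
        for (String key : numericKeys) {
            // Group 1 captures the quoted key plus the colon; group 2 captures the number.
            Pattern quotedNumber = Pattern.compile(
                "(\"" + Pattern.quote(key) + "\"\\s*:\\s*)\"(-?\\d+(?:\\.\\d+)?)\"");
            result = quotedNumber.matcher(result).replaceAll("$1$2");
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "{\"id\":\"311\",\"species\":\"canine\",\"color\":\"golden\","
            + "\"weight\":\"75\",\"name\":\"Captain\"}";
        // prints {"id":311,"species":"canine","color":"golden","weight":75,"name":"Captain"}
        System.out.println(unquote(line, Arrays.asList("id", "weight")));
    }
}
```

Another option, if the data file cannot be changed at all, would be to declare `id` and `weight` as string columns in the schema and cast them to numeric types after loading.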