I have a JSON data file, and I want to programmatically apply a schema to its columns.
pets.json
{"id":"311","species":"canine","color":"golden","weight":"75","name":"Captain"}
{"id":"928","species":"feline","color":"gray","weight":"8","name":"Oscar"}
SparkSession session = SparkSession.builder().appName("SparkSQLTests").master("local[*]").getOrCreate();
DataFrameReader dataFrameReader = session.read();
// Create Data Frame
Dataset<Row> pets = dataFrameReader.schema(buildSchema()).json("input/pets.json");
// Schema
pets.printSchema();
pets.show(10);
// SELECT *
// FROM pets
// WHERE species='canine'
System.out.println("=== Display Canines ===");
pets.filter(col("species").equalTo("canine")).show();
session.stop();
When I run the program, all of my columns are null. What am I doing wrong? Thanks.
root
 |-- id: integer (nullable = true)
 |-- species: string (nullable = true)
 |-- color: string (nullable = true)
 |-- weight: double (nullable = true)
 |-- name: string (nullable = true)

+----+-------+-----+------+----+
|  id|species|color|weight|name|
+----+-------+-----+------+----+
|null|   null| null|  null|null|
|null|   null| null|  null|null|
+----+-------+-----+------+----+

=== Display Canines ===
+---+-------+-----+------+----+
| id|species|color|weight|name|
+---+-------+-----+------+----+
+---+-------+-----+------+----+
Answer 0 (score: 0)
It turns out I had quotes around the numeric values in my JSON data. It worked once I changed the data to:
{"id":311,"species":"canine","color":"golden","weight":75,"name":"Captain"}
{"id":928,"species":"feline","color":"gray","weight":8,"name":"Oscar"}
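If editing the file by hand is not practical, the same fix can be applied in code before the data reaches Spark. The following is only a minimal sketch using the Java standard library. The class name `UnquoteNumericFields` and the method `unquoteNumbers` are hypothetical. It assumes that `id` and `weight` are the only numeric fields, as the printed schema suggests, and that every record sits on a single line. It rewrites the text with a regular expression and does not parse JSON, so it is not safe for arbitrary input.

```java
import java.util.List;
import java.util.regex.Pattern;

class UnquoteNumericFields {
    // Matches a known numeric field whose value is a quoted number,
    // e.g. "id":"311" or "weight":"7.5". The field list is an assumption.
    private static final Pattern QUOTED_NUMBER =
            Pattern.compile("\"(id|weight)\"\\s*:\\s*\"(-?\\d+(?:\\.\\d+)?)\"");

    // Returns the line with the quotes removed from the numeric values only.
    static String unquoteNumbers(String jsonLine) {
        return QUOTED_NUMBER.matcher(jsonLine).replaceAll("\"$1\":$2");
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "{\"id\":\"311\",\"species\":\"canine\",\"color\":\"golden\",\"weight\":\"75\",\"name\":\"Captain\"}",
                "{\"id\":\"928\",\"species\":\"feline\",\"color\":\"gray\",\"weight\":\"8\",\"name\":\"Oscar\"}");
        for (String line : lines) {
            System.out.println(unquoteNumbers(line));
        }
        // prints:
        // {"id":311,"species":"canine","color":"golden","weight":75,"name":"Captain"}
        // {"id":928,"species":"feline","color":"gray","weight":8,"name":"Oscar"}
    }
}
```

String fields such as `name` are left untouched because only the listed field names are matched. The cleaned lines could then be written back to `input/pets.json` before it is read with the schema.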