I have some JSON files in this format:
{"_t":1480647647,"_p":"rattenbt@test.com","_n":"app_loaded","device_type":"desktop"}
{"_t":1480647676,"_p":"rattenbt@test.com","_n":"app_loaded","device_type":"desktop"}
{"_t":1483161958,"_p":"rattenbt@test.com","_n":"app_loaded","device_type":"desktop"}
{"_t":1483162393,"_p":"rattenbt@test.com","_n":"app_loaded","device_type":"desktop"}
{"_t":1483499947,"_p":"rattenbt@test.com","_n":"app_loaded","device_type":"desktop"}
{"_t":1505361824,"_p":"pfitza@test.com","_n":"added_to_team","account":"1234"}
{"_t":1505362047,"_p":"konit@test.com","_n":"added_to_team","account":"1234"}
{"_t":1505362372,"_p":"oechslin@test.com","_n":"added_to_team","account":"1234"}
{"_t":1505362854,"_p":"corrada@test.com","_n":"added_to_team","account":"1234"}
{"_t":1505366071,"_p":"vertigo@test.com","_n":"added_to_team","account":"1234"}
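I am using Apache Spark in my Java application to read this JSON file and save it in Parquet format.
If I do not use a schema definition, the file is parsed without any problems. Here is my code sample: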
Dataset<Row> dataset = spark.read().json(pathToFile);
dataset.show(100);
And here is my console output:
+-------------+------------------+----------+-------+-------+-----------+
|           _n|                _p|        _t|account|channel|device_type|
+-------------+------------------+----------+-------+-------+-----------+
|   app_loaded| rattenbt@test.com|1480647647|   null|   null|    desktop|
|   app_loaded| rattenbt@test.com|1480647676|   null|   null|    desktop|
|   app_loaded| rattenbt@test.com|1483161958|   null|   null|    desktop|
|   app_loaded| rattenbt@test.com|1483162393|   null|   null|    desktop|
|   app_loaded| rattenbt@test.com|1483499947|   null|   null|    desktop|
|added_to_team|   pfitza@test.com|1505361824|   1234|   null|       null|
|added_to_team|    konit@test.com|1505362047|   1234|   null|       null|
...
When I use a schema definition like this:
StructType schema = new StructType();
schema.add("_n", StringType, true);
schema.add("_p", StringType, true);
schema.add("_t", TimestampType, true);
schema.add("account", StringType, true);
schema.add("channel", StringType, true);
schema.add("device_type", StringType, true);
// Read data from file
Dataset<Row> dataset = spark.read().schema(schema).json(pathToFile);
dataset.show(100);
I get this console output:
++
||
++
||
||
||
||
...
What is wrong with the schema definition?
Answer 0: (score: 2)
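StructType is immutable, so all of your additions are simply discarded: each add call returns a new StructType and leaves the original object untouched. If you print it with
schema.printTreeString();
you will see that it does not contain any fields:
root
You should use:
StructType schema = new StructType()
    .add("_n", StringType, true)
    .add("_p", StringType, true)
    .add("_t", TimestampType, true)
    .add("account", StringType, true)
    .add("channel", StringType, true)
    .add("device_type", StringType, true);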
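To make the pitfall easier to see without a Spark installation, here is a minimal, Spark-free sketch. ImmutableSchema is a hypothetical stand-in written only for this illustration (it is not part of Spark); the one assumption it carries over is that, like StructType.add, its add method returns a new object instead of modifying the receiver. The helper method names are likewise made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ImmutableSchemaDemo {

    // Hypothetical stand-in for StructType: add() never mutates, it returns a copy.
    public static final class ImmutableSchema {
        private final List<String> fields;

        public ImmutableSchema() {
            this(Collections.<String>emptyList());
        }

        private ImmutableSchema(List<String> fields) {
            this.fields = fields;
        }

        public ImmutableSchema add(String name) {
            List<String> copy = new ArrayList<>(fields);
            copy.add(name);
            return new ImmutableSchema(Collections.unmodifiableList(copy));
        }

        public int size() {
            return fields.size();
        }
    }

    // Mirrors the code in the question: the results of add() are thrown away.
    public static int fieldsWhenResultsDiscarded() {
        ImmutableSchema schema = new ImmutableSchema();
        schema.add("_n");
        schema.add("_p");
        return schema.size();
    }

    // Mirrors the fix: chain the calls so each add() builds on the previous result.
    public static int fieldsWhenChained() {
        ImmutableSchema schema = new ImmutableSchema().add("_n").add("_p");
        return schema.size();
    }

    // Equivalent alternative: reassign the variable after every add().
    public static int fieldsWhenReassigned() {
        ImmutableSchema schema = new ImmutableSchema();
        schema = schema.add("_n");
        schema = schema.add("_p");
        return schema.size();
    }

    public static void main(String[] args) {
        System.out.println(fieldsWhenResultsDiscarded()); // prints 0
        System.out.println(fieldsWhenChained());          // prints 2
        System.out.println(fieldsWhenReassigned());       // prints 2
    }
}
```

The same reasoning suggests that reassigning after each call (schema = schema.add(...)) should work with the real StructType as well, if you prefer not to chain the calls.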