Question

I can read the JSON and call printSchema, but running any action fails with "No input paths specified in job".

import org.apache.spark.sql.SQLContext

val sc = new org.apache.spark.SparkContext("local[*]", "shell")
val sqlCtx = new SQLContext(sc)
val input = sqlCtx.jsonFile("../data/tweets/")
input.printSchema

root
 |-- contributorsIDs: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- createdAt: string (nullable = true)
 ...

input.first
java.io.IOException: No input paths specified in job

The folder structure looks like this:

tweets
- tweets_1444576960000
  - _SUCCESS
  - part-00000
- tweets_1444577070000
  - _SUCCESS
  - part-00000

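Notes:

I am using Spark and Spark SQL version 1.5.0.
The executors run on the same machine (local[*]).
I tried replacing the file path with an absolute path. Same error.
I am using Databricks' example app here.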

Answer 1

OK, I fixed the issue by specifying the path with a glob that reaches the files inside each subdirectory, like this:

val input = sqlCtx.jsonFile("../data/tweets/tweets_*/*")
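
For completeness, here is a minimal sketch of the same fix as a standalone program, assuming Spark 1.5 is on the classpath and the directory layout from the question. The object name ReadTweets is hypothetical. It uses sqlCtx.read.json, which replaced the deprecated jsonFile as of Spark 1.4; the glob itself is unchanged from the line above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical wrapper object; only the path glob comes from the answer above.
object ReadTweets {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("shell")
    val sc = new SparkContext(conf)
    val sqlCtx = new SQLContext(sc)

    // Point at the files inside every tweets_<timestamp> directory,
    // not at the parent directory that holds only subdirectories.
    val input = sqlCtx.read.json("../data/tweets/tweets_*/*")

    input.printSchema()
    // first() is an action, so this is where a bad input path would fail.
    println(input.first())

    sc.stop()
  }
}
```

If the program is launched from a different working directory, the relative path will need adjusting, since "../data/tweets" is resolved against the directory the driver starts in.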
