I have a simple Spark job that splits the words in a file and loads them into a table in Hive.
public static void wordCountJava7() {
    // Define a configuration to use to interact with Spark
    SparkConf conf = new SparkConf().setMaster("local[4]").setAppName("Work Count App");
    SparkContext sc = new SparkContext(conf);
    // Create a Java version of the Spark Context from the configuration
    JavaSparkContext jsc = new JavaSparkContext(sc);
    // Load the input data, which is a text file read from the command line
    JavaRDD<String> input = jsc.textFile("file:///home/priyanka/workspace/ZTA/spark/src/main/java/sample.txt");
    // Java 7 and earlier
    JavaRDD<String> words = input.flatMap(new FlatMapFunction<String, String>() {
        public Iterable<String> call(String s) {
            return Arrays.asList(s.split(" "));
        }
    });
    // Java 7 and earlier: transform the collection of words into pairs (word and 1)
    JavaPairRDD<String, Integer> counts = words.mapToPair(new PairFunction<String, String, Integer>() {
        public Tuple2<String, Integer> call(String s) {
            return new Tuple2<String, Integer>(s, 1);
        }
    });
    // Java 7 and earlier: count the words
    JavaPairRDD<String, Integer> reducedCounts = counts.reduceByKey(new Function2<Integer, Integer, Integer>() {
        public Integer call(Integer x, Integer y) {
            return x + y;
        }
    });
    HiveContext hiveContext = new HiveContext(sc);
    DataFrame dataFrame = hiveContext.createDataFrame(words, SampleBean.class);
    dataFrame.write().saveAsTable("Sample");
    words.saveAsTextFile("output");
    jsc.close();
}
The Spark job fails with the following stack trace:
16/04/29 15:41:21 WARN HiveContext$$anon$2: Could not persist `sample` in a Hive compatible way. Persisting it into Hive metastore in Spark SQL specific format.
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: at least one column must be specified for the table
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:720)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:677)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$createTable$1.apply$mcV$sp(ClientWrapper.scala:424)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$createTable$1.apply(ClientWrapper.scala:422)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$createTable$1.apply(ClientWrapper.scala:422)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:290)
at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:237)
at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:236)
at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:279)
at org.apache.spark.sql.hive.client.ClientWrapper.createTable(ClientWrapper.scala:422)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.createDataSourceTable(HiveMetastoreCatalog.scala:358)
at org.apache.spark.sql.hive.execution.CreateMetastoreDataSourceAsSelect.run(commands.scala:280)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
I checked the table sample in Hive. It does have one column:
hive> desc sample;
OK
word string None
Time taken: 0.218 seconds, Fetched: 1 row(s)
This error is thrown when I try to save it as a table. Any help is appreciated.
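For reference, `createDataFrame(rdd, beanClass)` derives the DataFrame schema from the bean class's readable JavaBean properties, so a bean class that exposes no getters yields a table with no columns, which is one way to hit "at least one column must be specified". A minimal sketch of that introspection, where the `SampleBean` body is an assumption written to match the one-column `word string` table shown below:

```java
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;

public class BeanSchemaSketch {
    // Assumed shape of SampleBean, matching the "word string" column in Hive.
    public static class SampleBean implements Serializable {
        private String word;
        public String getWord() { return word; }
        public void setWord(String word) { this.word = word; }
    }

    // Count readable JavaBean properties: roughly what Spark's bean-based
    // schema inference looks at when building the DataFrame schema.
    static int readableProperties(Class<?> cls) throws Exception {
        int n = 0;
        for (PropertyDescriptor pd
                : Introspector.getBeanInfo(cls, Object.class).getPropertyDescriptors()) {
            if (pd.getReadMethod() != null) n++;
        }
        return n;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readableProperties(SampleBean.class)); // 1 -> one "word" column
    }
}
```

Note also that the question's code builds the DataFrame from `words`, a `JavaRDD<String>`, while telling Spark the rows are `SampleBean` instances; mapping `words` into actual `SampleBean` objects first is likely what was intended.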
Answer 0 (score: 0):
This means a column data type is incorrect. I ran into the same error while using an Avro schema: the data type in the DataFrame was decimal(20,2), and in the Avro schema I had declared the type as decimal(20,2), which produced the same error.
I later changed the data type in the Avro schema to string, and it worked for me, since Avro converts decimals to strings internally (changed schema).
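A sketch of the kind of schema change described, with an assumed record and field name (`Sample`/`amount`); in Avro, decimal is a logical type layered on `bytes` or `fixed`:

```json
{
  "type": "record",
  "name": "Sample",
  "fields": [
    {
      "name": "amount",
      "type": {"type": "bytes", "logicalType": "decimal", "precision": 20, "scale": 2}
    }
  ]
}
```

The fix described above amounts to replacing that field declaration with a plain `{"name": "amount", "type": "string"}`.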