I have a simple Spark job that splits the words in a file and loads them into a table in Hive.
public static void wordCountJava7() {
    // Define a configuration to use to interact with Spark
    SparkConf conf = new SparkConf().setMaster("local[4]").setAppName("Work Count App");
    SparkContext sc = new SparkContext(conf);
    // Create a Java version of the Spark Context from the configuration
    JavaSparkContext jsc = new JavaSparkContext(sc);
    // Load the input data, which is a text file read from the command line
    JavaRDD<String> input = jsc.textFile("file:///home/priyanka/workspace/ZTA/spark/src/main/java/sample.txt");
    // Java 7 and earlier
    JavaRDD<String> words = input.flatMap(new FlatMapFunction<String, String>() {
        public Iterable<String> call(String s) {
            return Arrays.asList(s.split(" "));
        }
    });
    // Java 7 and earlier: transform the collection of words into pairs (word and 1)
    JavaPairRDD<String, Integer> counts = words.mapToPair(new PairFunction<String, String, Integer>() {
        public Tuple2<String, Integer> call(String s) {
            return new Tuple2<String, Integer>(s, 1);
        }
    });
    // Java 7 and earlier: count the words
    JavaPairRDD<String, Integer> reducedCounts = counts.reduceByKey(new Function2<Integer, Integer, Integer>() {
        public Integer call(Integer x, Integer y) {
            return x + y;
        }
    });
    HiveContext hiveContext = new HiveContext(sc);
    DataFrame dataFrame = hiveContext.createDataFrame(words, SampleBean.class);
    dataFrame.write().saveAsTable("Sample");
    words.saveAsTextFile("output");
    jsc.close();
}
The Spark job fails with the following stack trace:
16/04/29 15:41:21 WARN HiveContext$$anon$2: Could not persist `sample` in a Hive compatible way. Persisting it into Hive metastore in Spark SQL specific format.
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: at least one column must be specified for the table
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:720)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:677)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$createTable$1.apply$mcV$sp(ClientWrapper.scala:424)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$createTable$1.apply(ClientWrapper.scala:422)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$createTable$1.apply(ClientWrapper.scala:422)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:290)
at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:237)
at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:236)
at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:279)
at org.apache.spark.sql.hive.client.ClientWrapper.createTable(ClientWrapper.scala:422)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.createDataSourceTable(HiveMetastoreCatalog.scala:358)
at org.apache.spark.sql.hive.execution.CreateMetastoreDataSourceAsSelect.run(commands.scala:280)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
I checked the table sample in Hive. It does have one column:
hive> desc sample;
OK
word string None
Time taken: 0.218 seconds, Fetched: 1 row(s)
This error is thrown when I try to save it as a table. Any help is appreciated.
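For reference, `createDataFrame(rdd, beanClass)` derives the DataFrame schema from the bean class's readable JavaBean properties, so a bean class that exposes no getters yields a table with no columns, which is one way to hit "at least one column must be specified". A minimal sketch of that introspection, where the `SampleBean` body is an assumption written to match the one-column `word string` table shown below:

```java
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;

public class BeanSchemaSketch {
    // Assumed shape of SampleBean, matching the "word string" column in Hive.
    public static class SampleBean implements Serializable {
        private String word;
        public String getWord() { return word; }
        public void setWord(String word) { this.word = word; }
    }

    // Count readable JavaBean properties: roughly what Spark's bean-based
    // schema inference looks at when building the DataFrame schema.
    static int readableProperties(Class<?> cls) throws Exception {
        int n = 0;
        for (PropertyDescriptor pd
                : Introspector.getBeanInfo(cls, Object.class).getPropertyDescriptors()) {
            if (pd.getReadMethod() != null) n++;
        }
        return n;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readableProperties(SampleBean.class)); // 1 -> one "word" column
    }
}
```

Note also that the question's code builds the DataFrame from `words`, a `JavaRDD<String>`, while telling Spark the rows are `SampleBean` instances; mapping `words` into actual `SampleBean` objects first is likely what was intended.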
Answer 0 (score: 0):
This means a column data type is incorrect. I ran into the same error while using an Avro schema: the data type in the DataFrame was decimal(20,2), and in the Avro schema I had declared the type as decimal(20,2), which produced the same error.
I later changed the data type in the Avro schema to string, and it worked for me, since Avro converts decimals to strings internally (changed schema).
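A sketch of the kind of schema change described, with an assumed record and field name (`Sample`/`amount`); in Avro, decimal is a logical type layered on `bytes` or `fixed`:

```json
{
  "type": "record",
  "name": "Sample",
  "fields": [
    {
      "name": "amount",
      "type": {"type": "bytes", "logicalType": "decimal", "precision": 20, "scale": 2}
    }
  ]
}
```

The fix described above amounts to replacing that field declaration with a plain `{"name": "amount", "type": "string"}`.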