Writing a DataFrame to a Hive table in Apache Spark with Java

Date: 2017-07-24 18:53:42

Tags: apache-spark hive apache-spark-sql spark-dataframe apache-spark-dataset

I am trying to do a simple thing: write a DataFrame to a Hive table. Below is the code, written in Java. I am not using the Cloudera VM.

    import org.apache.spark.SparkContext;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SQLContext;
    import org.apache.spark.sql.SparkSession;

    public class JsonToHive {

        public static void main(String[] args) {
            String master = "local[*]";

            SparkSession sparkSession = SparkSession
                    .builder().appName(JsonToHive.class.getName())
                    //.config("spark.sql.warehouse.dir", "hdfs://localhost:50070/user/hive/warehouse/")
                    .enableHiveSupport().master(master).getOrCreate();

            SparkContext context = sparkSession.sparkContext();
            context.setLogLevel("ERROR");

            SQLContext sqlCtx = sparkSession.sqlContext();
            // load the JSON file and expose it as a temporary table
            Dataset<Row> rowDataset = sqlCtx.jsonFile("employees.json");
            rowDataset.printSchema();
            rowDataset.registerTempTable("employeesData");

            Dataset<Row> firstRow = sqlCtx.sql("select employee.firstName, employee.addresses from employeesData");
            firstRow.show();

            sparkSession.catalog().listTables().select("*").show();

            // fails with "Table default.employee already exists" (the default save mode is ErrorIfExists)
            firstRow.write().saveAsTable("default.employee");
            sparkSession.close();
        }
    }
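
(Side note: jsonFile and registerTempTable are the older SQLContext API; a minimal sketch of what the Spark 2.x equivalents would look like, reusing the same sparkSession:)

    Dataset<Row> rowDataset = sparkSession.read().json("employees.json");
    rowDataset.createOrReplaceTempView("employeesData");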

I created the managed table in Hive with the following HQL:

    CREATE TABLE employee (firstName STRING, lastName STRING, addresses ARRAY<STRUCT<street:STRING, city:STRING, state:STRING>>) STORED AS PARQUET;

I am reading the data from a simple JSON file, "employees.json":
{"employee":{"firstName":"Neil","lastName":"Irani","addresses":[{"street":"36th","city":"NYC","state":"Ny"},{"street":"37th","city":"NYC","state":"Ny"},{"street":"38th","city":"NYC","state":"Ny"}]}}

It fails with "Table default.employee already exists." and does not append the content. How can I append content to the Hive table?

If I set mode("append") it does not complain, but it does not write the content either.

firstRow.write().mode("append").saveAsTable("default.employee");

Any help would be much appreciated. Thanks.

+-------------+--------+-----------+---------+-----------+
|         name|database|description|tableType|isTemporary|
+-------------+--------+-----------+---------+-----------+
|     employee| default|       null|  MANAGED|      false|
|employeesdata|    null|       null|TEMPORARY|       true|
+-------------+--------+-----------+---------+-----------+

UPDATE

/usr/lib/hive/conf/hive-site.xml was not on the classpath, so Spark was not talking to the Hive metastore and could not see the table; after adding it to the classpath it works fine. I only hit this because I was running from IntelliJ; in production the Spark conf folder is linked to hive-site.xml, so it gets picked up automatically.
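
An alternative to putting hive-site.xml on the classpath when running from an IDE is to configure the session explicitly. A minimal sketch; the metastore URI and warehouse path below are assumptions and should be copied from the real hive-site.xml:

    SparkSession sparkSession = SparkSession.builder()
            .appName("JsonToHive")
            .master("local[*]")
            // assumed metastore URI -- copy the real value from hive-site.xml
            .config("hive.metastore.uris", "thrift://localhost:9083")
            // assumed warehouse location -- copy the real value from hive-site.xml
            .config("spark.sql.warehouse.dir", "hdfs://localhost:8020/user/hive/warehouse")
            .enableHiveSupport()
            .getOrCreate();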

1 answer:

Answer 0 (score: 1):

It looks like you should use insertInto(String tableName) instead of saveAsTable(String tableName):

firstRow.write().mode("append").insertInto("default.employee");
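
One thing to watch with insertInto: it resolves columns by position, not by name, so the Dataset must produce columns in the same order as the Hive table (firstName, lastName, addresses in the DDL above). A sketch against the same employeesData temp table:

    Dataset<Row> toInsert = sqlCtx.sql(
            "select employee.firstName, employee.lastName, employee.addresses from employeesData");
    toInsert.write().mode("append").insertInto("default.employee");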