Question

I am trying to load a file and write it as a parquet file to my HDFS path. However, whenever I run the code below, no values are inserted.

Here is my code:

(CatUpdate & {kind: "CatUpdate"}) | (DogUpdate & {kind: "DogUpdate"})

When I run a select statement against the table, it shows the following:

Any ideas why this is happening?

Answer 1

Have you tried writing the data directly into the directory hdfs://hadoop_data/path/mx_test/ (since the table points to this directory) and then checking whether you can see the data in the Hive table?

df.write.save('hdfs://hadoop_data/path/mx_test/', format="parquet")

`UPDATE:`

Please compare the column names in the parquet file with the column names in the Hive table.

If the column names differ, the Hive parquet table displays NULL values for those columns.
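As a rough sketch of that comparison in plain Python: the two column lists below are hypothetical placeholders, not taken from the question. In practice you could take one list from `df.columns` in PySpark and the other from the output of `DESCRIBE` on the Hive table. The helper name `find_column_mismatches` is made up for illustration.

```python
def find_column_mismatches(parquet_columns, hive_columns):
    """Return the column names that appear on only one side.

    Names are compared exactly, so a difference in case or an
    underscore is reported as a mismatch.
    """
    parquet_set = set(parquet_columns)
    hive_set = set(hive_columns)
    return {
        "only_in_parquet": sorted(parquet_set - hive_set),
        "only_in_hive": sorted(hive_set - parquet_set),
    }


# Hypothetical column lists; replace them with your real ones.
parquet_columns = ["id", "userName", "amount"]
hive_columns = ["id", "user_name", "amount"]

result = find_column_mismatches(parquet_columns, hive_columns)
print(result)
```

Any name listed under `only_in_hive` is a column the table expects but the file does not provide, which is a likely candidate for the NULL values.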

How to check column names, types in parquet file?

Use parquet-tools to check the schema for the parquet file:

bash$ parquet-tools meta hdfs://<namenode_address:8020><hdfs_path_to_parquet_file>

(or)

Copy the parquet file to the local filesystem, then check the schema:

bash$ parquet-tools meta <local_path_to_parquet_file>

Now create a Hive table schema that matches the parquet file, then check whether you get data instead of NULL.
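If it helps, here is a minimal sketch that builds such a Hive DDL statement from a list of column names and types. The table name `mx_test`, the columns, and the helper `build_hive_ddl` are assumptions for illustration only; use the names and types that parquet-tools reports for your file.

```python
def build_hive_ddl(table_name, columns, location):
    """Build a CREATE EXTERNAL TABLE statement for a parquet-backed table.

    columns is a list of (column_name, hive_type) pairs, in file order.
    """
    column_sql = ",\n  ".join(
        f"`{name}` {hive_type}" for name, hive_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE {table_name} (\n  {column_sql}\n)\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table name and columns; replace with the real schema.
columns = [("id", "BIGINT"), ("name", "STRING")]
ddl = build_hive_ddl("mx_test", columns, "hdfs://hadoop_data/path/mx_test/")
print(ddl)
```

You can paste the printed statement into the Hive shell, then run a select against the new table to confirm the values come back.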
