Question

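I ask because I'm getting an error when loading a BigQuery table from Parquet files, which makes me think BigQuery is reading the schema of some fields incorrectly.

I'm trying to load Parquet files into BigQuery from Cloud Shell: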

loc1=gs://our-data/thisTable/model=firstmodel

bq --location=US load --noreplace --source_format=PARQUET our-data:theSchema.theTable $loc1/*.parquet ./ourSchema.json

There are about 30 Parquet files in the directory referenced by loc1. I get an error pointing at one specific file:

    BigQuery error in load operation: Error processing job 'our-data:bqjob_re73397ea395b9fd_0000016ae66ab746_1': Error while reading
    data, error message: Provided schema is not compatible with the file 'part-00000-20b9e343-460b-44a8-b083-4437284d6771.c000.snappy.parquet'.
    Field 'dataend' is specified as NULLABLE in provided schema which does not match REQUIRED as specified in the file.

However, when I open the Parquet file in Spark and run printSchema(), the field shows as nullable:

root
 |-- row_id: long (nullable = true)
 |-- row_name: string (nullable = true)
 |-- dataend: string (nullable = true)

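The schema on the BigQuery table is also NULLABLE for this field, as is the corresponding part of the schema JSON.

Thanks for any insight into next steps.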

Answer 1

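When Spark SQL writes Parquet files, it automatically converts all columns to NULLABLE for compatibility reasons.

You can inspect the Parquet file itself with parquet-tools to double-check whether the field is marked REQUIRED in the original file.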
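Once you have the file's schema as text, a small script can show where it disagrees with the schema JSON. The following is only a rough sketch under a few assumptions: that your parquet-tools build has a `schema` command printing the standard Parquet message-type text, that the schema is flat (no nested groups), and that ourSchema.json is the usual BigQuery list of field objects. The function names and the sample inputs are hypothetical, made up for illustration.

```python
import json
import re

# Maps Parquet repetition keywords to BigQuery field modes.
REPETITION_TO_MODE = {
    "required": "REQUIRED",
    "optional": "NULLABLE",
    "repeated": "REPEATED",
}

# Matches lines such as "  required binary dataend (UTF8);"
FIELD_LINE = re.compile(r"^\s*(required|optional|repeated)\s+\S+\s+(\w+)")


def parquet_modes(schema_text):
    """Return {field_name: BigQuery mode} from Parquet message-type text.

    Assumes a flat schema (no nested groups), as in the question.
    """
    modes = {}
    for line in schema_text.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            repetition, name = match.groups()
            modes[name] = REPETITION_TO_MODE[repetition]
    return modes


def mode_mismatches(schema_text, bq_schema):
    """List (field, mode in schema JSON, mode in file) for differing fields."""
    file_modes = parquet_modes(schema_text)
    mismatches = []
    for field in bq_schema:
        # BigQuery treats a missing "mode" as NULLABLE.
        declared = field.get("mode", "NULLABLE").upper()
        actual = file_modes.get(field["name"])
        if actual is not None and actual != declared:
            mismatches.append((field["name"], declared, actual))
    return mismatches


# Hypothetical inputs for illustration; replace them with your real tool
# output and the contents of ourSchema.json.
example_parquet_schema = """
message spark_schema {
  optional int64 row_id;
  optional binary row_name (UTF8);
  required binary dataend (UTF8);
}
"""
example_bq_schema = json.loads("""
[
  {"name": "row_id", "type": "INTEGER", "mode": "NULLABLE"},
  {"name": "row_name", "type": "STRING", "mode": "NULLABLE"},
  {"name": "dataend", "type": "STRING", "mode": "NULLABLE"}
]
""")

for name, declared, actual in mode_mismatches(example_parquet_schema, example_bq_schema):
    print(f"{name}: schema JSON says {declared}, file says {actual}")
```

Running this against the file named in the error message and against one of the files that loads cleanly should show whether only some of the ~30 files mark `dataend` as REQUIRED.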

How does BigQuery read the schema of Parquet files in Google Cloud Storage?
