Question

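I uploaded some files (CSV and JSON) to Google Cloud Storage.

I can create BigQuery tables, either native or external, that link to these files in Google Cloud Storage.

While creating the BigQuery tables, I can check "Schema Automatically detect".

"Schema Automatically detect" works well with newline-delimited JSON files. But for CSV files whose first row holds the column names, BigQuery fails to detect the schema: it treats the first row as data, and the schema it creates is string_field_1, string_field_2, and so on.

Is there anything I need to do to my CSV files to make BigQuery's "Schema Automatically detect" work?

The CSV files I have are of type "Microsoft Excel Comma Separated Values File".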

Update:

If the first column is empty, BigQuery autodetect does not detect the header:

custom id,asset id,related isrc,iswc,title,hfa song code,writers,match policy,publisher name,sync ownership share,sync ownership territory,sync ownership restriction
,A123,,,Medley of very old Viennese songs,,,,,,,
,A234,,,Suite de pièces No. 3 en Ré Mineur  HWV 428 - Allemande,,,,,,,

But if the first column is not empty, it works fine:

custom id,asset id,related isrc,iswc,title,hfa song code,writers,match policy,publisher name,sync ownership share,sync ownership territory,sync ownership restriction
1,A123,,,Medley of very old Viennese songs,,,,,,,
2,A234,,,Suite de pièces No. 3 en Ré Mineur  HWV 428 - Allemande,,,,,,,

Should this be filed as a feature improvement request for BigQuery?

Answer 1

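CSV autodetect does detect the header line in CSV files, so there must be something special about your data. It would help if you could provide a real data snippet and the actual commands you used. Here is an example that demonstrates how it works: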

~$ cat > /tmp/people.csv
Id,Name,DOB
1,Bill Gates,1955-10-28
2,Larry Page,1973-03-26
3,Mark Zuckerberg,1984-05-14
~$ bq load --source_format=CSV --autodetect dataset.people /tmp/people.csv
Upload complete.
Waiting on bqjob_r33dc9ca5653c4312_0000015af95f6209_1 ... (2s) Current status: DONE   
~$ bq show dataset.people
Table project:dataset.people

   Last modified        Schema        Total Rows   Total Bytes   Expiration   Labels  
 ----------------- ----------------- ------------ ------------- ------------ -------- 
  22 Mar 21:14:27   |- Id: integer    3            89                                 
                    |- Name: string                                                   
                    |- DOB: date
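
As far as I understand the documentation, autodetect recognizes a header by comparing the first row with the rows that follow: if the first row contains only strings and later rows contain other data types, the first row is treated as a header. In the example above, Id and DOB give it that contrast. The sketch below is only a rough approximation of that idea, not BigQuery's actual logic: it checks whether any column holds a number in every non-empty data cell. The function name `has_non_string_column` is made up for illustration, and dates, booleans, and quoted fields are deliberately ignored.

```shell
#!/bin/sh
# has_non_string_column FILE   (hypothetical helper, not part of bq)
# Prints "yes" if at least one column contains only numbers in its
# non-empty data cells (and has at least one such cell), otherwise "no".
# Assumption: plain comma splitting; quoted fields with commas are not handled.
has_non_string_column() {
  awk -F',' '
    { sub(/\r$/, "") }                  # tolerate CRLF line endings from Excel
    NR == 1 { next }                    # skip the presumed header row
    {
      if (NF > ncols) ncols = NF
      for (i = 1; i <= NF; i++) {
        if ($i == "") continue          # empty cells carry no type information
        seen[i] = 1
        if ($i !~ /^-?[0-9]+(\.[0-9]+)?$/) text[i] = 1
      }
    }
    END {
      for (i = 1; i <= ncols; i++)
        if (seen[i] && !text[i]) { print "yes"; exit }
      print "no"
    }
  ' "$1"
}
```

Running `has_non_string_column /tmp/people.csv` on the file above prints `yes` because of the Id column. A file where every data cell is either text or empty prints `no`, which is the situation where header detection has nothing to go on.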

Answer 2

custom id,asset id,related isrc,iswc,title,hfa song code,writers,match policy,publisher name,sync ownership share,sync ownership territory,sync ownership restriction
,A123,,,Medley of very old Viennese songs,,,,,,,
,A234,,,Suite de pièces No. 3 en Ré Mineur  HWV 428 - Allemande,,,,,,,

If the first column is empty, Google BigQuery cannot detect the schema.

custom id,asset id,related isrc,iswc,title,hfa song code,writers,match policy,publisher name,sync ownership share,sync ownership territory,sync ownership restriction
1,A123,,,Medley of very old Viennese songs,,,,,,,
2,A234,,,Suite de pièces No. 3 en Ré Mineur  HWV 428 - Allemande,,,,,,,

If I add values to the first column, Google BigQuery can detect the schema.

Should this be filed as a feature improvement request for BigQuery?
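
In the meantime, one possible workaround is to skip autodetect for such files and pass an explicit schema, telling BigQuery to ignore the header row with `--skip_leading_rows=1`. The sketch below builds an all-STRING inline schema from the header line. The helper name `schema_from_header`, the table `dataset.assets`, and the path `/tmp/assets.csv` are placeholders I made up, and typing every column as STRING is an assumption you may want to refine.

```shell
#!/bin/sh
# schema_from_header FILE   (hypothetical helper, not part of bq)
# Turns the first CSV line into an inline schema such as
# "custom_id:STRING,asset_id:STRING", replacing spaces with underscores.
# Assumption: header names contain no commas or quotes.
schema_from_header() {
  head -n 1 "$1" | tr -d '\r' | tr ' ' '_' | sed 's/,/:STRING,/g; s/$/:STRING/'
}

# Example invocation (placeholder table and file names, not run here):
# bq load --source_format=CSV --skip_leading_rows=1 \
#   dataset.assets /tmp/assets.csv "$(schema_from_header /tmp/assets.csv)"
```

With an explicit schema the header no longer needs to be detected, so the empty first column stops mattering.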

BigQuery: creating a table (native or external) linked to Google Cloud Storage
