'Missing close double quote (") character' error when loading data into BigQuery from a CSV file that contains newlines

Asked: 2015-11-13 14:09:24

Tags: google-bigquery

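The offending row is shown below. It should consist of 14 columns, one of which starts with 'Hi I'm Niger...' and spans multiple lines because it contains newline characters.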

17935,9a7105ee-30c8-4a6d-9374-10875b7d6288.jpg,"""top""=>""0"", ""left""=>""0"", ""width""=>""180"", ""height""=>""180""",,"",2015-07-26 19:33:57.292058,2015-07-26 20:25:30.068887,fe43876f-1b2c-464a-aa20-bf335ed3ff62,c68c8c70-bc2b-11e4-90a1-22000b21105f,{},2e790350-15fb-0133-2cb8-22000ba51078,"Hi I'm Nigerian so wish to study in sweden.
so I'm Undergraduate student I want study Engineering. 
Thanks.","",{}

When I load this CSV data into BigQuery with the command bq load --replace --source_format=CSV -F"," ..., the load fails with the errors below. Can anyone suggest a fix for the bq load command?

- File: 0 / Line:17192 / Field:12: Missing close double quote (")
character: field starts with: <Hi I'm N>
- File: 0 / Line:17193: Too few columns: expected 14 column(s) but
got 1 column(s). For additional help: http://goo.gl/RWuPQ
- File: 0 / Line:17194: Too few columns: expected 14 column(s) but
got 3 column(s). For additional help: http://goo.gl/RWuPQ

2 answers:

Answer 0 (score: 4)

If you are loading a CSV file that contains embedded newlines, you need to set allowQuotedNewlines on the load job:

https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.allowQuotedNewlines
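As a rough sketch of where that setting lives, the load job body described at the link above can be written as a plain Python dictionary. The bucket, project, dataset, and table names here are hypothetical placeholders, and the other fields are only a guess at what mirrors the question's command (WRITE_TRUNCATE standing in for --replace). For the bq command-line tool, the corresponding flag is, as far as I know, --allow_quoted_newlines.

```python
import json

# Hypothetical identifiers, for illustration only.
load_config = {
    "configuration": {
        "load": {
            "sourceUris": ["gs://example-bucket/data.csv"],
            "destinationTable": {
                "projectId": "example-project",
                "datasetId": "example_dataset",
                "tableId": "example_table",
            },
            "sourceFormat": "CSV",
            "fieldDelimiter": ",",
            # Assumed equivalent of the --replace flag in the question.
            "writeDisposition": "WRITE_TRUNCATE",
            # The setting this answer is about: accept newlines inside quoted fields.
            "allowQuotedNewlines": True,
        }
    }
}

# Print the body as JSON, the form the REST API expects.
print(json.dumps(load_config, indent=2))
```

Only allowQuotedNewlines matters for the error in the question; the rest is scaffolding to show the nesting.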

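By default, BigQuery assumes that CSV data does not contain embedded newlines. Because an input file can then be split at any newline, this allows higher parsing throughput on large data files. If your data contains newlines inside strings, each file has to be parsed linearly by a single machine.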
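The difference between splitting at every newline and parsing with quotes in mind can be seen locally with Python's standard csv module. This is only an illustration of the idea, not of BigQuery's actual parser, and the row is a shortened, made-up version of the one in the question (4 columns instead of 14).

```python
import csv
import io

# Shortened, made-up row: the third field is quoted and contains two newlines.
raw = (
    '17935,photo.jpg,"Hi I\'m Nigerian so wish to study in sweden.\n'
    'so I want study Engineering.\n'
    'Thanks.",{}\n'
)

# Naive approach: treat every newline as a record boundary.
naive_lines = raw.splitlines()

# Quote-aware approach: a newline inside double quotes stays part of the field.
records = list(csv.reader(io.StringIO(raw, newline="")))

print(len(naive_lines))  # physical lines in the file
print(len(records))      # logical CSV records
print(len(records[0]))   # columns in the single logical record
```

The naive split sees three separate lines, which matches the shape of the errors in the question: one line with an unclosed quote, followed by lines with too few columns. The quote-aware reader sees one record, but it has to walk through the text sequentially to know whether a given newline is inside quotes.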

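Answer 1 (score: 2)

If you are loading through the Python client library, make sure to set this on the job config before loading the data into BigQuery: 'job_config.allow_quoted_newlines = True'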

from google.cloud import bigquery

job_config = bigquery.LoadJobConfig()
job_config.allow_quoted_newlines = True  # accept newlines inside quoted CSV fields