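How do I correctly export string data containing quotes to BigQuery via CSV?

Time: 2017-06-08 17:58:59

Tags: mysql csv google-bigquery

I am running into problems in BigQuery when importing CSV files exported from MySQL. The files were exported with the following options: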

SELECT <some columns>
FROM <my table>
INTO OUTFILE 'export-20160411.csv'
CHARACTER SET 'utf8'
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '\"'
ESCAPED BY '\\';

The problem is how to escape " (double quotes) for BQ. Here is a sample of the CSV:

field01, field02, field03, field04
"xxx \" xxx \\", \N, "xxx", "xxx"

This causes problems. BQ reports an error like this:

BigQuery error in load operation: Error processing job
'<project>:bqjob_r7269aea2ac9eae3c_0000015c88a9049d_1': Too many
errors encountered.
Failure details:
- file-00000000: Too many values in row starting at position:
2052615.

and this one:

- mediaupload-snapshot: Error detected while parsing row starting at
position: 561497. Error: Missing close double quote (") character.

So my question is: what is the best way to export the CSV so that BQ can import it without problems?

Thanks in advance.

Update

The format:

field01, field02, field03, field04
"xxx "" xxx \\", \N, "xxx", "xxx"

uses "" instead of \" inside strings. However, I don't know how to export from MySQL in this way.
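For reference, the rewrite from \" to "" can also be done after the export, outside MySQL. The following is a minimal Python sketch of that idea, not a tested pipeline. The function names `parse_mysql_outfile` and `to_bq_csv` are hypothetical. It assumes the file was written with the options shown in the question (comma delimiter, `"` enclosure, `\` escape, default `\n` line terminator) and has no whitespace around the delimiters, so the spaced sample above would need stripping first. It also assumes the load job treats an empty unquoted field as NULL; adjust the writer if a different null marker is used.

```python
def parse_mysql_outfile(text, delim=",", quote='"', esc="\\"):
    """Parse SELECT ... INTO OUTFILE text into rows; the NULL marker \\N becomes None."""
    rows, row, buf = [], [], []
    in_quotes = False   # currently inside an enclosed field
    is_null = False     # current field is the unquoted marker \N
    pending = False     # current row has consumed at least one character

    def end_field():
        nonlocal buf, is_null
        row.append(None if is_null else "".join(buf))
        buf, is_null = [], False

    i = 0
    while i < len(text):
        ch = text[i]
        pending = True
        if ch == esc and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt == "N" and not in_quotes and not buf:
                is_null = True          # \N as a whole unquoted field means NULL
            elif nxt == "0":
                buf.append("\0")        # \0 stands for an ASCII NUL byte
            else:
                buf.append(nxt)         # \" -> ", \\ -> \, \, -> , and so on
            i += 2
            continue
        if ch == quote and in_quotes:
            in_quotes = False           # closing enclosure
        elif ch == quote and not buf:
            in_quotes = True            # opening enclosure
        elif ch == delim and not in_quotes:
            end_field()
        elif ch == "\n" and not in_quotes:
            end_field()
            rows.append(row)
            row = []
            pending = False
        else:
            buf.append(ch)
        i += 1
    if pending:                         # last row without a trailing newline
        end_field()
        rows.append(row)
    return rows


def to_bq_csv(rows, delim=","):
    """Write rows with doubled quotes; None becomes an empty unquoted field (assumed NULL)."""
    lines = []
    for row in rows:
        cells = []
        for value in row:
            if value is None:
                cells.append("")
            elif value == "" or any(c in value for c in (delim, '"', "\n", "\r")):
                # Quote when needed, and quote empty strings to keep them apart from NULL.
                cells.append('"' + value.replace('"', '""') + '"')
            else:
                cells.append(value)
        lines.append(delim.join(cells))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = '"xxx \\" xxx \\\\",\\N,"xxx","xxx"\n'
    print(to_bq_csv(parse_mysql_outfile(sample)), end="")
```

Note that the sketch also collapses `\\` to a single backslash, on the assumption that the target parser does not treat backslash as an escape character.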

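1 Answer:

Answer 0: (score: 1)

One approach is to load the raw CSV file as if it had only one column (each whole line becomes a single column value) and then do the parsing on the BigQuery side.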

In the example below, assume CSVtable is the table you loaded that CSV file into, and that it looks like this:

oneField     
"xxx "" xxx \\", \N, "xxx", "xxx"    

So the "parsing" can look like this:

#standardSQL
WITH CSVtable AS (
  SELECT '''"xxx "" xxx \\\\", \\N, "xxx", "xxx"''' AS oneField
)
SELECT 
  SPLIT(oneField)[OFFSET(0)] AS field01,
  SPLIT(oneField)[OFFSET(1)] AS field02,
  SPLIT(oneField)[OFFSET(2)] AS field03,
  SPLIT(oneField)[OFFSET(3)] AS field04  
FROM CSVtable

The output of this query is:

field01             field02     field03     field04  
"xxx "" xxx \\"     \N          "xxx"       "xxx"
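One caveat is worth checking before relying on this: a plain split on commas cuts at every comma, including commas inside quoted values, and it leaves the enclosing quotes and the doubled "" in the result, as the output above shows. The short Python sketch below illustrates the difference between a naive split and a quote-aware parse. Python's `csv` module stands in for a proper parser here and is not part of the original answer; the sample line with an embedded comma is made up for illustration.

```python
import csv

# Hypothetical line whose first field contains a comma inside the quotes.
line = '"xxx "" xxx, yyy",\\N,"xxx","xxx"'

naive = line.split(",")            # same idea as splitting on the delimiter only
aware = next(csv.reader([line]))   # honors quotes and the doubled "" convention

print(len(naive))   # 5 pieces: the quoted field was cut in two
print(aware)        # 4 fields, quotes removed
```

If the data can contain commas inside string values, the split-based query would need a quote-aware step of this kind before the fields are usable.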