BigQuery CSV load fails using the Python API but succeeds using the GUI

Time: 2019-01-16 03:39:42

Tags: python-2.7 google-bigquery

I am using Python to create a CSV file and load it into BigQuery. However, the load fails with the following error:

  

Error while reading data, error message: CSV table references column position 4, but line starting at position 4137 contains only 2 columns.

The configuration I am using is as follows:

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.ignore_unknown_values = False
job_config.autodetect = False
job_config.field_delimiter = '|'
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
job_config.max_bad_records = 0
job_config.skip_leading_rows = 0

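A sample record looks like this:

2c|Blackhawk gone wild|That's Just About Right|Good Times & Great Country;2019-01-16 14:22:07

The CSV contains 114 records in total. When I set job_config.allow_quoted_newlines = True, only 60 rows are loaded, or at any rate fewer than 114.

The CSV file is created as follows: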

f.write((str(callsign).split('_')[0]).lower().encode('utf-8') + '|' + artist.encode('utf-8') + '|' + song_title.encode('utf-8') + '|' + show_title.encode('utf-8') + '|' + str(time_bq).encode('utf-8') + '\n')

import logging
import os

from google.cloud import bigquery

# Assumed setup; the original does not show how logger is defined.
logger = logging.getLogger(__name__)


def bq_load():
    credential_path = "Key.json"
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credential_path

    bigquery_client = bigquery.Client()
    dataset_ref = bigquery_client.dataset('POC')
    table_ref = dataset_ref.table('data_poc')
    job_config = bigquery.LoadJobConfig()
    job_config.source_format = bigquery.SourceFormat.CSV
    job_config.ignore_unknown_values = False
    job_config.autodetect = False
    job_config.field_delimiter = '|'
    job_config.allow_quoted_newlines = True
    job_config.allow_jagged_rows = True
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
    job_config.max_bad_records = 0
    job_config.skip_leading_rows = 0

    with open('current_song.csv', 'rb') as source_file:
        job = bigquery_client.load_table_from_file(
            source_file,
            table_ref,
            location='US',  # Must match the destination dataset location.
            job_config=job_config)  # API request

    try:
        print(job.job_id)
        job.result()  # Waits for table load to complete.
        return 'Success'
    except Exception as ex:
        logger.error(
            "The following exception occurred in the table load of job {}: {}".format(job.job_id, ex))
        return 'Failure'

I need to load the entire contents of the CSV into BigQuery. Any help in finding the cause of this error would be greatly appreciated.

1 answer:

Answer 0: (score: 0)

You are using different delimiters in your BigQuery and Python examples.

Note: your sample uses the pipe delimiter and contains a semicolon inside a field:

2c|Blackhawk gone wild|That's Just About Right|Good Times & Great Country;2019-01-16 14:22:07
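
Counting the fields that each delimiter produces on this record makes the mismatch concrete. The sketch below is plain Python with no BigQuery calls; `count_fields` is a hypothetical helper, and it splits naively without honoring CSV quoting.

```python
# Sample record copied from the question.
record = "2c|Blackhawk gone wild|That's Just About Right|Good Times & Great Country;2019-01-16 14:22:07"


def count_fields(line, delimiter):
    """Return how many fields a naive split on `delimiter` yields (quoting is ignored)."""
    return len(line.split(delimiter))


pipe_fields = count_fields(record, "|")
semicolon_fields = count_fields(record, ";")

print("fields with '|':", pipe_fields)
print("fields with ';':", semicolon_fields)
```

Split on ';', the record has only 2 fields, which is consistent with the "contains only 2 columns" part of the error message. Split on '|', it has 4.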

Your Python code uses a semicolon delimiter:

job_config.field_delimiter = ';'

Your BigQuery configuration uses a pipe delimiter:

job_config.field_delimiter = '|'

Change your Python code to match your BigQuery example:

job_config.field_delimiter = '|'
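
One way to keep the two sides from drifting apart is to define the delimiter once and use it both when writing the file and when configuring the load job. The following is a minimal sketch, not the asker's code: `DELIMITER` and `write_rows` are hypothetical names, Python's standard `csv` module stands in for the manual string concatenation, and the BigQuery side appears only as a comment. It is written for Python 3; on Python 2.7 the buffer would need to be a byte stream instead.

```python
import csv
import io

DELIMITER = "|"  # hypothetical shared constant used by both writer and load job


def write_rows(rows, fileobj):
    """Write rows with the shared delimiter; csv.writer quotes any field that contains it."""
    writer = csv.writer(fileobj, delimiter=DELIMITER, lineterminator="\n")
    for row in rows:
        writer.writerow(row)


# An in-memory buffer stands in for current_song.csv.
buffer = io.StringIO()
write_rows(
    [["2c", "Blackhawk gone wild", "That's Just About Right",
      "Good Times & Great Country", "2019-01-16 14:22:07"]],
    buffer,
)
output = buffer.getvalue()
print(output)

# The load job would then reuse the same constant:
# job_config.field_delimiter = DELIMITER
```

With a single constant, changing the delimiter in one place cannot leave the file and the load configuration out of step.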

Note: I don't know why you included the first line of code:

f.write((str(callsign).split('_')[0]).lower().encode('utf-8') + ';' + artist.encode('utf-8') + ';' + song_title.encode('utf-8') + ';' + show_title.encode('utf-8') + ';' + str(time_bq).encode('utf-8') + '\n')
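
To check a file before loading it, a short scan can report every row whose field count differs from the schema, together with its byte offset, the same kind of position that the BigQuery error message reports. This is a rough sketch: `find_bad_rows` is a hypothetical helper, the expected count of 5 is inferred from the five values in the `f.write` call, and the naive split ignores quoted delimiters and quoted newlines.

```python
def find_bad_rows(data, delimiter=b"|", expected=5):
    """Return (byte_offset, field_count) for each line whose field count != expected."""
    bad = []
    offset = 0
    for line in data.splitlines(True):  # True keeps line endings so offsets stay exact
        fields = line.rstrip(b"\r\n").split(delimiter)
        if len(fields) != expected:
            bad.append((offset, len(fields)))
        offset += len(line)
    return bad


# Tiny in-memory example; for a real file, read it with open(path, "rb").read().
sample = b"a|b|c|d|e\nf|g|h|i;j\n"
bad_rows = find_bad_rows(sample)
print(bad_rows)
```

Running the same scan with the delimiter the load job is configured for shows which rows BigQuery would see as too short.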