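I am creating a CSV file with Python and loading it into BigQuery. The load fails with the following error:
Error while reading data, error message: CSV table references column position 4, but line starting at position:4137 contains only 2 columns.
The configuration I am using is: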
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.ignore_unknown_values = False
job_config.autodetect = False
job_config.field_delimiter = '|'
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
job_config.max_bad_records = 0
job_config.skip_leading_rows = 0
A sample record looks like this:
2c|Blackhawk gone wild|That's Just About Right|Good Times & Great Country;2019-01-16 14:22:07
The CSV has 114 records in total. When I set job_config.allow_quoted_newlines = True, it loads only 60 rows, or some other number below 114.
The CSV file is created as follows:
f.write((str(callsign).split('_')[0]).lower().encode('utf-8') + '|' + artist.encode('utf-8') + '|' + song_title.encode('utf-8') + '|' + show_title.encode('utf-8') + '|' + str(time_bq).encode('utf-8') + '\n')
import logging
import os

from google.cloud import bigquery

logger = logging.getLogger(__name__)


def bq_load():
    # Point the client library at the service account key file.
    credential_path = "Key.json"
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credential_path
    bigquery_client = bigquery.Client()
    dataset_ref = bigquery_client.dataset('POC')
    table_ref = dataset_ref.table('data_poc')

    job_config = bigquery.LoadJobConfig()
    job_config.source_format = bigquery.SourceFormat.CSV
    job_config.ignore_unknown_values = False
    job_config.autodetect = False
    job_config.field_delimiter = '|'
    job_config.allow_quoted_newlines = True
    job_config.allow_jagged_rows = True
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
    job_config.max_bad_records = 0
    job_config.skip_leading_rows = 0

    with open('current_song.csv', 'rb') as source_file:
        job = bigquery_client.load_table_from_file(
            source_file,
            table_ref,
            location='US',  # Must match the destination dataset location.
            job_config=job_config)  # API request

    try:
        print(job.job_id)
        job.result()  # Waits for table load to complete.
        return 'Success'
    except Exception as ex:
        logger.error(
            "The following exception occurred in the table load of job - {} {} ".format(ex, job.job_id))
        return 'Failure'
I need to load the entire contents of the CSV into BigQuery. Any help finding the cause of this error would be much appreciated.
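Answer 0 (score: 0)
Your input uses different delimiters in the BigQuery configuration and in the Python code that writes the file.
Note: your sample record uses the pipe delimiter and also contains a semicolon inside a field.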
2c|Blackhawk gone wild|That's Just About Right|Good Times & Great Country;2019-01-16 14:22:07
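To see which rows BigQuery would reject, a minimal sketch like the one below counts the pipe-separated fields on every line and reports the byte offset where each short row starts. The helper name `find_short_rows` and the expected count of 5 fields are assumptions for illustration, and the check is a plain split that ignores quoting, so it is only a rough diagnostic. The offset is reported on the assumption that the "position" in the error message is a byte offset into the file.

```python
import io


def find_short_rows(data, delimiter=b"|", expected_fields=5):
    """Return (byte_offset, line_number, field_count) for rows with the wrong field count.

    `data` is the raw CSV content as bytes. Quoting is ignored on purpose:
    this is a plain delimiter count, not a full CSV parse.
    """
    problems = []
    offset = 0
    for line_number, line in enumerate(io.BytesIO(data), start=1):
        # Count fields on the line without its trailing newline.
        field_count = line.rstrip(b"\r\n").count(delimiter) + 1
        if field_count != expected_fields:
            problems.append((offset, line_number, field_count))
        offset += len(line)
    return problems


# The sample record from the question: the last separator is ';' instead of '|'.
sample = b"2c|Blackhawk gone wild|That's Just About Right|Good Times & Great Country;2019-01-16 14:22:07\n"
print(find_short_rows(sample))  # [(0, 1, 4)]
```

Running it on the real file, for example with the bytes read from `open('current_song.csv', 'rb')`, lists every row whose field count differs from the table schema, which can then be compared with the position named in the load error.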
Your Python code uses a semicolon delimiter:
job_config.field_delimiter = ';'
Your BigQuery configuration uses a pipe delimiter:
job_config.field_delimiter = '|'
Based on your BigQuery example, change the Python code so that the two match:
job_config.field_delimiter = '|'
Note: I don't know why you included the first line of code:
f.write((str(callsign).split('_')[0]).lower().encode('utf-8') + ';' + artist.encode('utf-8') + ';' + song_title.encode('utf-8') + ';' + show_title.encode('utf-8') + ';' + str(time_bq).encode('utf-8') + '\n')
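One way to keep the delimiter consistent is to let Python's `csv` module build each line instead of concatenating strings by hand. The sketch below assumes Python 3 (the question's code looks like Python 2) and uses a hypothetical helper `write_rows`; it is an illustration, not the asker's actual code. With `QUOTE_MINIMAL`, any field that contains a pipe, a double quote, or a newline is wrapped in double quotes, which matches BigQuery's default quote character. As a guess only, unescaped double quotes in titles might also explain why fewer rows load when `allow_quoted_newlines` is enabled.

```python
import csv
import io


def write_rows(rows, out):
    """Write rows to the text stream `out` as pipe-delimited CSV."""
    writer = csv.writer(
        out,
        delimiter="|",               # must match job_config.field_delimiter
        quoting=csv.QUOTE_MINIMAL,   # quote only fields that need it
        lineterminator="\n",
    )
    for row in rows:
        writer.writerow(row)


# Example with the sample record, split into its five intended fields.
buffer = io.StringIO()
write_rows(
    [["2c", "Blackhawk gone wild", "That's Just About Right",
      "Good Times & Great Country", "2019-01-16 14:22:07"]],
    buffer,
)
print(buffer.getvalue())
```

When writing to a real file in Python 3, opening it with `open('current_song.csv', 'w', newline='', encoding='utf-8')` avoids extra newline translation.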