I'm running a simple loop that reads a set of 17 JSON files (each under 25 lines) and uploads them to a BigQuery table. Here is the code:
dataset_ref = bigquery_client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
job_config = bigquery.LoadJobConfig()
job_config.source_format = 'NEWLINE_DELIMITED_JSON'
job_config.autodetect = True
seq_months = ('201703', '201704', '201705', '201706', '201707', '201708', '201709', '201710', '201711', '201712',
              '201801', '201802', '201803', '201804', '201805', '201806', '201807')
y = 0
for y in seq_months:
    json_file = 'C:\\reviews_com.llollo.bipi_%s.json' % (y)
    print(json_file)
    with open(json_file, 'rb') as readable:
        job = bigquery_client.load_table_from_file(readable, table_ref, location='US', job_config=job_config)
    print(json_file)
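From the print output, I can see that the loop itself runs correctly.
However, only a few of the months end up in the table. Does anyone know what is happening? I'm losing data.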
Answer 0 (score: 1)
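Following Guillermo's comment, I found that my problem was the format of some columns: the values were FLOAT while the schema was INTEGER. Without job.result(), the loop never waits for the load jobs, so their failures go unnoticed. Here is the full code: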
# bigquery_client, table_ref, job_config, dataset_id and table_id are set up as in the question
seq_months = ('201703', '201704', '201705', '201706', '201707', '201708', '201709', '201710', '201711', '201712',
              '201801', '201802', '201803', '201804', '201805', '201806', '201807')
for y in seq_months:
    # type_data is defined elsewhere in the script (not shown in this post)
    json_file = 'C:\\Users\\lloll\\Desktop\\google_play\\retained_installers\\retained_installers_com.llollo.bipi_%s_%s.json' % (y, type_data)
    print(json_file)
    with open(json_file, 'rb') as readable:
        job = bigquery_client.load_table_from_file(readable, table_ref, location='US', job_config=job_config)
    # Wait for the load job to finish; this raises if the job failed
    job.result()
    print('Loaded {} rows into {}:{}.'.format(job.output_rows, dataset_id, table_id))
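
To find which files contain FLOAT values in columns that the table expects to be INTEGER, the files can be scanned locally before uploading. This is only a minimal sketch using the standard library: the helper name find_non_integer_values and the column names installers and retained_day_1 are hypothetical and not taken from the actual data.

```python
import json


def find_non_integer_values(lines, integer_fields):
    """Return (line_number, field, value) for values that are not Python ints.

    `lines` is any iterable of newline-delimited JSON strings, e.g. an open file.
    `integer_fields` holds the column names expected to be INTEGER
    (hypothetical names in this sketch).
    """
    problems = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for field in integer_fields:
            value = record.get(field)
            if value is None:
                continue  # missing or null values are not checked here
            # bool is a subclass of int in Python, so exclude it explicitly.
            # A value such as 80.0 is flagged too; whether BigQuery would
            # accept it is not checked by this sketch.
            if isinstance(value, bool) or not isinstance(value, int):
                problems.append((line_number, field, value))
    return problems


# Hypothetical sample rows; the second one has a float in an integer column
sample = [
    '{"installers": 120, "retained_day_1": 80}',
    '{"installers": 95, "retained_day_1": 61.5}',
]
for problem in find_non_integer_values(sample, {"installers", "retained_day_1"}):
    print(problem)
```

For a real file, the same function can be called on a file opened in text mode, for example with open(json_file) as f: find_non_integer_values(f, {...}), once per month in the loop.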