Question

I'm running a simple loop that reads a set of 17 JSON files (each under 25 lines) and uploads them to a BigQuery table. Here is the code:

dataset_ref = bigquery_client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
job_config = bigquery.LoadJobConfig()
job_config.source_format = 'NEWLINE_DELIMITED_JSON'
job_config.autodetect = True

seq_months =('201703','201704','201705','201706','201707','201708','201709','201710','201711','201712',
            '201801','201802','201803','201804','201805','201806','201807')
y=0
for y in seq_months:
    json_file= 'C:\\reviews_com.llollo.bipi_%s.json' % (y)
    print(json_file)
    with open(json_file,'rb') as readable:
        job = bigquery_client.load_table_from_file(readable, table_ref, location='US', job_config=job_config)
        print(json_file)

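From the print output I can see that the loop runs correctly. However, only a few of the months end up in the table. Does anyone know what is happening? I'm missing data.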

Answer 1

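After Guillermo's comment, I found that my problem was the format of some columns: the schema had them as INTEGER when the values were actually FLOAT. Here is the whole code: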

seq_months = ('201703','201704','201705','201706','201707','201708','201709','201710','201711','201712',
              '201801','201802','201803','201804','201805','201806','201807')
for y in seq_months:
    # type_data is defined elsewhere (not shown)
    json_file = 'C:\\Users\\lloll\\Desktop\\google_play\\retained_installers\\retained_installers_com.llollo.bipi_%s_%s.json' % (y, type_data)
    print(json_file)
    with open(json_file, 'rb') as readable:
        job = bigquery_client.load_table_from_file(readable, table_ref, location='US', job_config=job_config)
    # Wait for the load job to finish; this raises if the load failed
    job.result()
    print('Loaded {} rows into {}:{}.'.format(job.output_rows, dataset_id, table_id))
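To find which columns cause this kind of INTEGER/FLOAT mismatch before uploading, something like the following might help. It is only a rough sketch using the standard library, not part of my original script: scan_field_types and mixed_numeric_fields are helper names I made up, and the sample rows and field names are invented for illustration. It assumes flat newline-delimited JSON rows.

```python
import json
from collections import defaultdict


def scan_field_types(lines):
    """Map each field name to the set of Python type names seen in the rows.

    `lines` is any iterable of newline-delimited JSON strings, for example
    an open text file. Blank lines are skipped.
    """
    seen = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        for key, value in row.items():
            seen[key].add(type(value).__name__)
    return dict(seen)


def mixed_numeric_fields(types_by_field):
    """Return the fields that hold both int and float values, sorted by name."""
    return sorted(
        field
        for field, types in types_by_field.items()
        if {"int", "float"} <= types
    )


# Hypothetical sample rows: "rate" is an int in one row and a float in another.
sample = [
    '{"installs": 10, "rate": 1}',
    '{"installs": 12, "rate": 0.5}',
]
print(mixed_numeric_fields(scan_field_types(sample)))  # prints ['rate']
```

To check the real files, you could feed all of them through scan_field_types (for instance by chaining the open file objects) and look at what mixed_numeric_fields returns. Any field it lists is a candidate for being declared FLOAT in an explicit schema instead of relying on autodetect.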

Uploading files to BigQuery in a loop using Python. Missing data
