Uploading a CSV to BigQuery in C#

Asked: 2015-04-13 13:26:14

Tags: c#-4.0 google-bigquery

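Basically, what I want to do is submit a job to BigQuery (asynchronously), check the job status, and print the corresponding status or error information. I have put together the framework below, but I need help with the following: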

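  1. A GoogleApiException ("Job not found") is thrown when `BigQueryService.Jobs.Get(jobReference.ProjectId, jobReference.JobId).Execute()` is called. My hunch is that the job was not submitted correctly, but I don't know how to submit it properly.

  2. How should I handle GoogleApiExceptions?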

First step: create a Job (upload a CSV file into BigQuery) and return the JobReference:

            TableReference DestTable = new TableReference();
            DestTable.ProjectId = project;
            DestTable.DatasetId = dataset;
            DestTable.TableId = tableId;
    
            Job Job = new Job();
            JobConfiguration Config = new JobConfiguration();
            JobConfigurationLoad ConfigLoad = new JobConfigurationLoad();
    
    
            ConfigLoad.Schema = schema;
            ConfigLoad.DestinationTable = DestTable;
            ConfigLoad.Encoding = "ISO-8859-1";
            ConfigLoad.CreateDisposition = "CREATE_IF_NEEDED";
            ConfigLoad.WriteDisposition = createDisposition;
            ConfigLoad.FieldDelimiter = delimiter.ToString();
            ConfigLoad.AllowJaggedRows = true;
            Config.Load = ConfigLoad;
            Job.Configuration = Config;
    
            //set job reference (mainly job id)
            JobReference JobRef = new JobReference();
            JobRef.JobId = GenerateJobID("Upload");
            JobRef.ProjectId = project;
            Job.JobReference = JobRef;
    
            using(FileStream fileStream = new FileStream(filePath,FileMode.Open)){
                var JobInfo = BigQueryService.Jobs.Insert(Job,project,fileStream,"text/csv");//application/octet-stream
                JobInfo.UploadAsync();
                Console.WriteLine(JobInfo.GetProgress().Status.ToString());
            }
            return JobRef;
    

Then, using the projectId and jobId from the JobReference returned in the first step, poll the job status:

         while (true)
            {
                  pollJob = BigQueryService.Jobs.Get(jobReference.ProjectId, jobReference.JobId).Execute();
                    i = 0;
                    Console.WriteLine("Job status" + jobReference.JobId + ": " + pollJob.Status.State);
                    if (pollJob.Status.State.Equals("DONE"))
                    {
                        return pollJob;
                    }
                    // Pause execution for pauseSeconds before polling job status again,
                    // to reduce unnecessary calls to the BigQuery API and lower overall
                    // application bandwidth.
                    Thread.Sleep(pauseSeconds * 1000);
    
            }
    

1 Answer:

Answer 0 (score: 3)

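There is hardly any useful sample code showing how to upload a local CSV file to a BigQuery table. I eventually got something working. It may not be the best solution, but at least it works. Any improvements are welcome.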

private JobReference JobUpload(string project, string dataset, string tableId, string filePath, TableSchema schema, string createDisposition, char delimiter)
    {

        TableReference DestTable = new TableReference();
        DestTable.ProjectId = project;
        DestTable.DatasetId = dataset;
        DestTable.TableId = tableId;

        Job Job = new Job();
        JobConfiguration Config = new JobConfiguration();
        JobConfigurationLoad ConfigLoad = new JobConfigurationLoad();


        ConfigLoad.Schema = schema;
        ConfigLoad.DestinationTable = DestTable;
        ConfigLoad.Encoding = "ISO-8859-1";
        ConfigLoad.CreateDisposition = "CREATE_IF_NEEDED";
        ConfigLoad.WriteDisposition = createDisposition;
        ConfigLoad.FieldDelimiter = delimiter.ToString();
        ConfigLoad.AllowJaggedRows = true;
        ConfigLoad.SourceFormat = "CSV";
        Config.Load = ConfigLoad;
        Job.Configuration = Config;

        //set job reference (mainly job id)
        JobReference JobRef = new JobReference();
        JobRef.JobId = GenerateJobID("Upload");
        JobRef.ProjectId = project;
        Job.JobReference = JobRef;

        using(FileStream fileStream = new FileStream(filePath,FileMode.Open)){
            JobsResource.InsertMediaUpload InsertMediaUpload = new  JobsResource.InsertMediaUpload(BigQueryService,Job,Job.JobReference.ProjectId,fileStream,"application/octet-stream");
            var JobInfo = InsertMediaUpload.UploadAsync();
            Console.WriteLine(JobInfo.Status);
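            while (!JobInfo.IsCompleted)
            {
                // Wait for the upload task to finish before the stream is disposed;
                // sleep briefly instead of spinning and flooding the console.
                Console.WriteLine(JobInfo.Status);
                Thread.Sleep(500);
            }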
        }
        return JobRef;
    }
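
The `GenerateJobID` helper called above is not shown in the post. A minimal sketch of what it might look like follows; the class name `JobIdHelper` and the ID layout are assumptions for illustration, not the original implementation. BigQuery job IDs may contain only letters, digits, underscores, and dashes, so the sketch sticks to those characters (the prefix is assumed to do so as well).

```csharp
using System;
using System.Globalization;

public static class JobIdHelper
{
    // Hypothetical stand-in for the GenerateJobID helper used above;
    // the original implementation is not shown in the post.
    public static string GenerateJobID(string prefix)
    {
        // A UTC timestamp keeps IDs roughly sortable; the invariant culture
        // avoids calendar-specific digits or year numbering.
        string timestamp = DateTime.UtcNow.ToString("yyyyMMdd_HHmmss", CultureInfo.InvariantCulture);

        // A GUID without dashes makes collisions between concurrent uploads very unlikely.
        string unique = Guid.NewGuid().ToString("N");

        return prefix + "_" + timestamp + "_" + unique;
    }
}
```

Generating the ID on the client is what lets the method return a usable JobReference before the service has responded.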

After this, you can use the returned JobRef to poll the job status, much as we would with the Java API:

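while (true)
{
     PollJob = BigQueryService.Jobs.Get(jobReference.ProjectId, jobReference.JobId).Execute();

     Console.WriteLine("Job status " + jobReference.JobId + ": " + PollJob.Status.State);
     if (PollJob.Status.State.Equals("DONE"))
     {
       return PollJob;
     }
     // Pause between polls, as in the question's loop, to avoid hammering the BigQuery API.
     Thread.Sleep(pauseSeconds * 1000);
}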
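
The loop above never gives up if the job stays pending. One possible variant with a timeout is sketched below. `JobPoller`, `WaitUntilDone`, and the `getState` delegate are hypothetical names introduced here; in real use the delegate would presumably wrap `BigQueryService.Jobs.Get(projectId, jobId).Execute().Status.State`. Note also that "DONE" only means the job has finished: a failed job reports its error through `Status.ErrorResult`, so that should be checked before treating the load as successful.

```csharp
using System;
using System.Threading;

public static class JobPoller
{
    // Calls getState repeatedly until it returns "DONE" or the timeout elapses.
    // Returns true if "DONE" was observed, false if the deadline would be exceeded.
    // getState is a hypothetical delegate standing in for the Jobs.Get call above.
    public static bool WaitUntilDone(Func<string> getState, TimeSpan pause, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            string state = getState();
            if (state == "DONE")
            {
                return true;
            }
            if (DateTime.UtcNow + pause > deadline)
            {
                // Give up rather than loop forever; the caller decides what to do next.
                return false;
            }
            Thread.Sleep(pause);
        }
    }
}
```

Passing the state lookup as a delegate keeps the waiting logic independent of the BigQuery client, which also makes it easy to exercise without network access.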