Here is the blob info:
Blob{bucket=some_bucket, name=somefile-000000000001.json.gz, generation=1539720839099466, size=42455994, content-type=application/octet-stream, metadata=null}
The somefile-...json.gz files are a BigQuery dump (around 4 GB in total when you add up all the files).
As you can see, this blob is about 42 MB. But when I call blob.downloadTo(... file), it runs and runs, the local file easily grows past 300 GB, and it seems to go on forever. That strikes me as odd, because the code is almost identical to Google's example.
Fun fact: nothing I try makes any difference.
Does anyone have an idea?
Code sample for dumping to the bucket:
String bucketUrl = "gs://" + BUCKET_NAME + "/" + table.getDataset() + "/" + filename + "-*." + EXPORT_EXTENSION;
log.info("Exporting table " + table.getTable() + " to " + bucketUrl);
ExtractJobConfiguration extractConfiguration = ExtractJobConfiguration.newBuilder(table, bucketUrl)
        .setCompression(EXPORT_COMPRESSION)
        .setFormat(EXPORT_FORMAT)
        .build();
Job job = bigquery.create(JobInfo.of(extractConfiguration));
try {
    // Wait for the job to complete; waitFor returns null if the timeout elapses first
    Job completedJob = job.waitFor(RetryOption.initialRetryDelay(Duration.ofSeconds(1)),
            RetryOption.totalTimeout(Duration.ofMinutes(3)));
    if (completedJob != null && completedJob.getStatus().getError() == null) {
        return true;
    } else if (completedJob != null) {
        log.error(completedJob.getStatus().getError());
        throw new BigQueryException(1, "Unable to complete the export", completedJob.getStatus().getError());
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}
return false;
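As a sanity check, the extract job's statistics report how many files the wildcard URI was sharded into. A minimal sketch, assuming the completedJob from above (JobStatistics.ExtractStatistics and getDestinationUriFileCounts() are part of the google-cloud-bigquery API):

import com.google.cloud.bigquery.JobStatistics;
import java.util.List;

// How many files did the extract shard the table into per destination URI?
JobStatistics.ExtractStatistics stats = completedJob.getStatistics();
List<Long> fileCounts = stats.getDestinationUriFileCounts();
log.info("Export wrote " + fileCounts + " file(s) for " + bucketUrl);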
And the code to download (where blob = Blob{bucket=some_bucket, name=somefile-000000000001.json.gz, generation=1539720839099466, size=42455994, content-type=application/octet-stream, metadata=null}):
Blob blob = storage.get(BlobId.of(bucketName, srcFilename));
blob.downloadTo(destFilePath);
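For what it's worth, a streaming variant shows how many bytes actually arrive while the download runs. This is only a sketch (not the original code), using the client's ReadChannel API and assuming the same storage, bucketName, srcFilename, and destFilePath as above:

import com.google.cloud.ReadChannel;
import com.google.cloud.storage.BlobId;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.StandardOpenOption;

// Stream the object and count bytes, so runaway growth is visible early.
long total = 0;
try (ReadChannel reader = storage.reader(BlobId.of(bucketName, srcFilename));
     FileChannel out = FileChannel.open(destFilePath,
             StandardOpenOption.CREATE, StandardOpenOption.WRITE,
             StandardOpenOption.TRUNCATE_EXISTING)) {
    ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
    while (reader.read(buffer) != -1) {
        buffer.flip();
        total += out.write(buffer);
        buffer.clear();
    }
}
log.info("Downloaded " + total + " bytes (blob reports size " + blob.getSize() + ")");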
Answer 0 (score: 2)
I used the following code; the export completed successfully and I was able to download the compressed files:
import com.google.api.gax.paging.Page;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.ExtractJobConfiguration;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.Bucket;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.nio.file.Path;
import java.nio.file.Paths;

public class QuickstartSample {
    public static void main(String... args) throws Exception {
        // Instantiate the clients
        Storage storage = StorageOptions.getDefaultInstance().getService();
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        TableId table = TableId.of("dataset", "table");
        // The name of the destination bucket
        String bucketName = "bucket";

        ExtractJobConfiguration extractConfiguration = ExtractJobConfiguration
                .newBuilder(table, "gs://" + bucketName + "/somefile-*.json.gz")
                .setCompression("GZIP")
                .setFormat("NEWLINE_DELIMITED_JSON")
                .build();
        Job startedJob = bigquery.create(JobInfo.of(extractConfiguration));

        // Wait for the job to complete
        while (!startedJob.isDone()) {
            System.out.println("Waiting for job " + startedJob.getJobId().getJob() + " to complete");
            Thread.sleep(1000L);
        }
        if (startedJob.getStatus().getError() == null) {
            System.out.println("Job " + startedJob.getJobId().getJob() + " succeeded");
        } else {
            System.out.println("Job " + startedJob.getJobId().getJob() + " failed");
            System.out.println("Error: " + startedJob.getStatus().getError());
        }

        // Download every object the export produced
        Bucket bucket = storage.get(bucketName);
        Page<Blob> blobs = bucket.list();
        System.out.println("Downloading");
        for (Blob blob : blobs.iterateAll()) {
            System.out.println("Name: " + blob.getName());
            System.out.println("Size: " + blob.getSize());
            Path destFilePath = Paths.get(blob.getName());
            blob.downloadTo(destFilePath);
        }
    }
}
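One caveat with the download loop: if the export pattern includes a path prefix (as in the question's gs://bucket/dataset/filename-*), blob.getName() contains slashes, so Paths.get(blob.getName()) points into a local subdirectory that may not exist yet. A small, hedged adjustment that creates the parent directories first:

import java.nio.file.Files;

// Blob names like "dataset/somefile-000000000001.json.gz" map to subpaths,
// so make sure the local parent directory exists before downloading.
Path destFilePath = Paths.get(blob.getName());
if (destFilePath.getParent() != null) {
    Files.createDirectories(destFilePath.getParent());
}
blob.downloadTo(destFilePath);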
The pom.xml dependencies I used are as follows:
<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-storage</artifactId>
    <version>1.38.0</version>
</dependency>
<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-bigquery</artifactId>
    <version>1.48.0</version>
</dependency>
Hope that helps.