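I want to copy a set of tables from one dataset to another within the same project. I am running the code in an IPython notebook.
I use the following code to collect the names of the tables to copy in the variable "value":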
import re

dataset = bq.DataSet('test:TestDataset')
for x in dataset.tables():
    if re.match('table1(.*)', x.name.table_id):
        value = 'test:TestDataset.' + x.name.table_id
I then tried to use the "bq cp" command to copy the tables from one dataset to the other, but I cannot get the bq command to run in the notebook.
!bq cp $value proj1:test1.table1_20162020
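Note:
I looked through the bigquery commands to see whether there is a copy command among them, but could not find one.
Any help would be greatly appreciated!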
Answer 0 (score: 3)
If you are using the BigQuery API with Python, you can run a copy job:
https://cloud.google.com/bigquery/docs/tables#copyingtable
The Python example, copied from the documentation:
import pprint
import time

from googleapiclient.errors import HttpError


def copyTable(service):
    try:
        sourceProjectId = input("What is your source project? ")
        sourceDatasetId = input("What is your source dataset? ")
        sourceTableId = input("What is your source table? ")
        targetProjectId = input("What is your target project? ")
        targetDatasetId = input("What is your target dataset? ")
        targetTableId = input("What is your target table? ")

        jobCollection = service.jobs()
        jobData = {
            "projectId": sourceProjectId,
            "configuration": {
                "copy": {
                    "sourceTable": {
                        "projectId": sourceProjectId,
                        "datasetId": sourceDatasetId,
                        "tableId": sourceTableId,
                    },
                    "destinationTable": {
                        "projectId": targetProjectId,
                        "datasetId": targetDatasetId,
                        "tableId": targetTableId,
                    },
                    "createDisposition": "CREATE_IF_NEEDED",
                    "writeDisposition": "WRITE_TRUNCATE"
                }
            }
        }

        insertResponse = jobCollection.insert(projectId=targetProjectId, body=jobData).execute()

        # Ping for status until it is done, with a short pause between calls.
        while True:
            status = jobCollection.get(projectId=targetProjectId,
                                       jobId=insertResponse['jobReference']['jobId']).execute()
            if 'DONE' == status['status']['state']:
                break
            print('Waiting for the copy to complete...')
            time.sleep(10)

        if 'errors' in status['status']:
            print('Error copying table:')
            pprint.pprint(status)
            return

        print('Copied the table:')
        pprint.pprint(status)

        # Now query and print out the generated results table.
        # queryTableData is a helper defined elsewhere in the documentation sample.
        queryTableData(service, targetProjectId, targetDatasetId, targetTableId)

    except HttpError as err:
        print('Error in copyTable:')
        pprint.pprint(err.resp)
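The bq cp command does essentially the same thing internally (you could also call that function directly, depending on which bq module you are importing).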
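To copy several tables, the request body from the sample can be built by a small helper and submitted once per table. The following is only a minimal sketch under that assumption: `build_copy_job_body` and its tuple arguments are hypothetical names introduced for illustration, not part of the API client, and the body keeps only the "configuration" part of the sample.

```python
def build_copy_job_body(source, destination, write_disposition="WRITE_TRUNCATE"):
    """Return a jobs.insert request body that copies one table.

    source and destination are (project_id, dataset_id, table_id) tuples.
    """
    def table_ref(parts):
        project_id, dataset_id, table_id = parts
        return {"projectId": project_id, "datasetId": dataset_id, "tableId": table_id}

    return {
        "configuration": {
            "copy": {
                "sourceTable": table_ref(source),
                "destinationTable": table_ref(destination),
                "createDisposition": "CREATE_IF_NEEDED",
                "writeDisposition": write_disposition,
            }
        }
    }


# Example with the table names from the question (illustrative values).
body = build_copy_job_body(
    ("test", "TestDataset", "table1_a"),
    ("proj1", "test1", "table1_a"),
)
```

The resulting dictionary would be passed as body= to service.jobs().insert(...), in the same way as jobData in the sample.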
Answer 1 (score: 0)
I'm not sure why it doesn't work for you, because it works fine for me.
projectFrom = 'project1'
datasetFrom = 'dataset1'
tableSearchString = 'test1'
projectTo = 'project2'
datasetTo = 'dataset2'
tables = bq.DataSet(projectFrom + ':' + datasetFrom).tables()
for table in tables:
    if tableSearchString in table.name.table_id:
        tableFrom = projectFrom + ':' + datasetFrom + '.' + table.name.table_id
        tableTo = projectTo + ':' + datasetTo + '.' + table.name.table_id
        !bq cp $tableFrom $tableTo
Try this in your notebook, since it works for me.
Just wondering: what error code does the script return?
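If the "!" shell escape is the part that fails, the same commands can be run through Python's subprocess module, which also exposes the exit code and error output asked about above. This is only a sketch: `build_copy_commands` and `run_command` are hypothetical helper names, the table IDs are made-up examples, and it assumes the bq executable is on the PATH of the notebook kernel.

```python
import subprocess


def build_copy_commands(table_ids, search, source_dataset, target_dataset):
    """Build one `bq cp` argument list per table ID that contains `search`."""
    return [
        ["bq", "cp", f"{source_dataset}.{table_id}", f"{target_dataset}.{table_id}"]
        for table_id in table_ids
        if search in table_id
    ]


def run_command(command):
    """Run a command and return (exit code, stdout, stderr)."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode, completed.stdout, completed.stderr


commands = build_copy_commands(
    ["test1_a", "other", "test1_b"], "test1", "project1:dataset1", "project2:dataset2"
)
for command in commands:
    # Only prints the commands; pass each one to run_command() to actually copy.
    print(" ".join(command))
```

When a destination table already exists, bq cp may ask for confirmation, so a non-interactive run would probably need the -f flag added to each command.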
Answer 2 (score: 0)
I think this will help you.
import uuid

import google.cloud.bigquery

# source_dataset, dest_dataset and self.bigquery_client are assumed to be
# created elsewhere in the surrounding class.
tables = source_dataset.list_tables()
for table in tables:
    # print table.name
    job_id = str(uuid.uuid4())
    dest_table = dest_dataset.table(table.name)
    source_table = source_dataset.table(table.name)
    if not dest_table.exists():
        job = self.bigquery_client.copy_table(job_id, dest_table, source_table)
        job.create_disposition = google.cloud.bigquery.job.CreateDisposition.CREATE_IF_NEEDED
        job.begin()
        job.result()
Answer 3 (score: 0)
I created the following script, which copies all tables from one dataset to another and runs a couple of validations.
from google.cloud import bigquery

client = bigquery.Client()
projectFrom = 'source_project_id'
datasetFrom = 'source_dataset'
projectTo = 'destination_project_id'
datasetTo = 'destination_dataset'

# Creating dataset references from the Google BigQuery client
dataset_from = client.dataset(dataset_id=datasetFrom, project=projectFrom)
dataset_to = client.dataset(dataset_id=datasetTo, project=projectTo)

for source_table_ref in client.list_dataset_tables(dataset=dataset_from):
    # Destination table reference
    destination_table_ref = dataset_to.table(source_table_ref.table_id)
    job = client.copy_table(
        source_table_ref,
        destination_table_ref)
    job.result()
    assert job.state == 'DONE'
    dest_table = client.get_table(destination_table_ref)
    source_table = client.get_table(source_table_ref)
    assert dest_table.num_rows > 0  # validation 1
    assert dest_table.num_rows == source_table.num_rows  # validation 2
    print("Source - table: {} row count {}".format(source_table.table_id, source_table.num_rows))
    print("Destination - table: {} row count {}".format(dest_table.table_id, dest_table.num_rows))
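The question asks for only the tables whose names match a pattern, not the whole dataset. A minimal sketch of that variant follows, assuming a recent google-cloud-bigquery release in which Client.list_tables and Client.copy_table accept string IDs. `copy_matching_tables` is a hypothetical helper name, and the client is passed in as a parameter so the function itself does not depend on how it was created.

```python
import re


def copy_matching_tables(client, source_dataset, target_dataset, pattern):
    """Copy every table in source_dataset whose ID matches pattern.

    client is expected to behave like google.cloud.bigquery.Client, and the
    datasets are "project.dataset" strings. Returns the copied table IDs.
    """
    matcher = re.compile(pattern)
    copied = []
    for table in client.list_tables(source_dataset):
        # re.match anchors at the start of the table ID, as in the question.
        if not matcher.match(table.table_id):
            continue
        job = client.copy_table(
            f"{source_dataset}.{table.table_id}",
            f"{target_dataset}.{table.table_id}",
        )
        job.result()  # block until the copy job has finished
        copied.append(table.table_id)
    return copied
```

With the library installed, a call would look something like copy_matching_tables(bigquery.Client(), "test.TestDataset", "proj1.test1", "table1"), using the names from the question.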
Answer 4 (score: 0)
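Assuming you want to copy most of the tables, you can first copy the entire BigQuery dataset and then delete the tables you do not want.
The copy-dataset UI is similar to the copy-table one. Just click the "Copy Dataset" button in the source dataset and specify the destination dataset in the pop-up form. You can copy a dataset to another project or another region. See the screenshots below for how to copy a dataset.
Screenshot: Copy dataset button
Screenshot: Copy dataset form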
Answer 5 (score: 0)
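A copy dataset feature is now available in the BigQuery Data Transfer Service. Select the transfer service in the BigQuery web console, fill in the source and destination details, and then either run it on demand or schedule it at a specified interval.
Alternatively, run the following bq command: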
bq mk --transfer_config --project_id=[PROJECT_ID] --data_source=[DATA_SOURCE] --target_dataset=[DATASET] --display_name=[NAME] --params='[PARAMETERS]'
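The placeholders in that command can be filled in programmatically. The sketch below assembles the argument list for a dataset copy. The data source name cross_region_copy and the parameter keys source_project_id and source_dataset_id are my reading of the dataset copy documentation and should be checked against it; `build_dataset_copy_command` and all example values are hypothetical.

```python
import json
import shlex


def build_dataset_copy_command(target_project, target_dataset, display_name,
                               source_project, source_dataset):
    """Build the `bq mk --transfer_config` argument list for a dataset copy."""
    # Assumed parameter keys for the dataset copy data source.
    params = {
        "source_project_id": source_project,
        "source_dataset_id": source_dataset,
    }
    return [
        "bq", "mk", "--transfer_config",
        f"--project_id={target_project}",
        "--data_source=cross_region_copy",  # assumed data source name
        f"--target_dataset={target_dataset}",
        f"--display_name={display_name}",
        f"--params={json.dumps(params)}",
    ]


command = build_dataset_copy_command(
    "proj1", "test1", "copy TestDataset", "test", "TestDataset"
)
# Print a shell-quoted version that could be pasted into a terminal.
print(shlex.join(command))
```

Building the JSON with json.dumps avoids hand-quoting the --params value, which is the part of the command that is easiest to get wrong.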