Question

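When I use the embulk tool to import data from Redshift into Google BigQuery, data is lost as soon as I add the property is_skip_job_result_check: true! (Each import into BigQuery is allowed to have up to 1000 bad records.) Here is my config.yml file.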

in:
  type: redshift
  host: ...
  port: 5439
  user: my_user
  password: password
  database: my_database
  schema: public
  fetch_rows: 1000

  query: |
    SELECT * FROM app140681.events140681_5747135
    WHERE TO_CHAR(event_time, 'YYYYMMDD') = '20160707'

out:
  type: bigquery
  auth_method: json_key  
  json_keyfile:
    content: |
      {
          "private_key_id": "...",
           "private_key": "-----BEGIN PRIVATE KEY------END PRIVATE KEY-----\n",
           "client_email": "..."
      }
  project: my_project
  dataset: testdataset
  auto_create_table: true
  table: test_redshift
  template_table: test_redshift_schema.json
  #schema_file: ./schema.json
  max_bad_records: 1000
  abort_on_error: false
  compression: NONE
  is_skip_job_result_check: true
  job_status_polling_interval: 5
  source_format: CSV
  default_timezone: 'UTC'

Answer 1

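If is_skip_job_result_check is true, embulk-output-bigquery skips waiting for the BigQuery load jobs to complete, so it cannot do anything further once the jobs have been submitted. If is_skip_job_result_check is false, embulk-output-bigquery can get the result status of the load jobs and retry automatically if necessary.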

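With is_skip_job_result_check: true, you have to check manually on the BigQuery console whether any of the embulk load jobs were aborted, and re-run embulk if necessary. Please check your BigQuery console.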
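As a minimal sketch of what this answer suggests, assuming that waiting for the load job is acceptable for this import, the out section from the question could be changed as follows. Only options that already appear in the question are used; the comments are my reading of them rather than text from the plugin documentation.

```yaml
out:
  type: bigquery
  # ... auth, project, dataset, and table settings unchanged from the question ...
  max_bad_records: 1000            # BigQuery may still skip up to 1000 invalid rows per load job
  is_skip_job_result_check: false  # wait for the load job so its result status can be checked
  job_status_polling_interval: 5   # seconds between status checks (value taken from the question)
```

Note that even with the result check enabled, rows rejected within the max_bad_records limit are dropped without failing the job, so lowering that value may be worth trying if rows are still missing.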

embulk input redshift to bigquery data loss
