I am trying to run a data pipeline using Dataduct. Below is the configuration I have defined in dataduct.cfg under ~/.dataduct/:
ec2:
    INSTANCE_TYPE: t2.micro
    ETL_AMI: ****
    SECURITY_GROUP: ****
mysql:
    host_alias_1:
        HOST: jdbc:mysql://****.amazonaws.com:3306
        PASSWORD: ****
        USERNAME: ****
etl:
    CONNECTION_RETRIES: 2
    NAME_PREFIX: dataduct-
    RESOURCE_ROLE: EC2_ROLE_FOR_DATA_PIPELINE
    RETRY_DELAY: 10 Minutes
    REGION: us-east-1
    ROLE: DataPipelineDefaultRole
    S3_BASE_PATH: aws-dataduct-base-path
    S3_ETL_BUCKET: aws-dataduct-etl-bucket
emr:
    CLUSTER_AMI: 3.1.0
    CLUSTER_TIMEOUT: 1 Hours
    CORE_INSTANCE_TYPE: m1.large
    NUM_CORE_INSTANCES: 1
    HADOOP_VERSION: 2.4.0
    HIVE_VERSION: null
    MASTER_INSTANCE_TYPE: m3.xlarge
    PIG_VERSION: null
    TASK_INSTANCE_BID_PRICE: null
    TASK_INSTANCE_TYPE: m1.large
redshift:
    CLUSTER_ID:
    DATABASE_NAME:
    HOST:
    PASSWORD:
    USERNAME:
    PORT:
logging:
    CONSOLE_DEBUG_LEVEL: INFO
    FILE_DEBUG_LEVEL: DEBUG
    LOG_DIR: ~/.dataduct
    LOG_FILE: dataduct.log
Below is my pipeline definition:
# HEADER INFORMATION
name : dataduct_copy_s3_to_rds
frequency : one-time

# DESCRIPTION
description : Example - copying s3 data to rds

# PIPELINE STEPS
steps:
-   step_type: extract-s3
    name: s3extract
    file_uri: s3://aws-dataduct-etl-bucket/employees.csv

-   step_type: create-update-sql
    table_definition: /home/ec2-user/dp/testdb.employee.sql
    command: INSERT INTO testdb.employee (id,firstname,lastname,email,phone,salary,designation,manager,manageremail,department,date) VALUES(?,?,?,?,?,?,?,?,?,?,?)
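When I try to activate the pipeline with the command dataduct pipeline activate copys3tords.yaml, I get the following error:
Traceback (most recent call last):
  File "", line 1, in
ImportError: No module named dataduct.steps.executors.runner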
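To narrow down where the import fails, a minimal diagnostic sketch along these lines might help. It assumes it is run with the same Python interpreter that the dataduct command uses, which may not hold if several environments are installed. The helper name can_import is hypothetical and not part of Dataduct; it only walks the dotted module path from the error message and reports which level cannot be imported.

```python
import importlib


def can_import(name):
    """Return True if the module `name` can be imported, False otherwise."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# Module path taken from the ImportError message above.
target = "dataduct.steps.executors.runner"
parts = target.split(".")

# Check each prefix of the dotted path, from the top-level package down.
for i in range(1, len(parts) + 1):
    prefix = ".".join(parts[:i])
    print("%s importable: %s" % (prefix, can_import(prefix)))
```

If the top-level dataduct package is reported as not importable, the package is probably not installed for this interpreter. If only the deeper levels fail, the installed version may simply not ship that module.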
Is there a dependency I need to install? How can I resolve this? Please let me know. Thanks in advance.