Question

I am trying to run a data pipeline using Dataduct. Below is the configuration defined in dataduct.cfg in ~/.dataduct/:

ec2:
    INSTANCE_TYPE: t2.micro
    ETL_AMI: ****
    SECURITY_GROUP: ****

mysql:
    host_alias_1:
        HOST: jdbc:mysql://****.amazonaws.com:3306
        PASSWORD: ****
        USERNAME: ****

etl:
    CONNECTION_RETRIES: 2
    NAME_PREFIX: dataduct-
    RESOURCE_ROLE: EC2_ROLE_FOR_DATA_PIPELINE
    RETRY_DELAY: 10 Minutes
    REGION: us-east-1
    ROLE: DataPipelineDefaultRole
    S3_BASE_PATH: aws-dataduct-base-path
    S3_ETL_BUCKET: aws-dataduct-etl-bucket

emr:
    CLUSTER_AMI: 3.1.0
    CLUSTER_TIMEOUT: 1 Hours
    CORE_INSTANCE_TYPE: m1.large
    NUM_CORE_INSTANCES: 1
    HADOOP_VERSION: 2.4.0
    HIVE_VERSION: null
    MASTER_INSTANCE_TYPE: m3.xlarge
    PIG_VERSION: null
    TASK_INSTANCE_BID_PRICE: null
    TASK_INSTANCE_TYPE: m1.large

redshift:
    CLUSTER_ID:
    DATABASE_NAME:
    HOST:
    PASSWORD:
    USERNAME:
    PORT:

logging:
    CONSOLE_DEBUG_LEVEL: INFO
    FILE_DEBUG_LEVEL: DEBUG
    LOG_DIR: ~/.dataduct
    LOG_FILE: dataduct.log

Below is my data pipeline definition:

# HEADER INFORMATION
name : dataduct_copy_s3_to_rds
frequency : one-time

# DESCRIPTION
description : Example - copying s3 data to rds

# PIPELINE STEPS
steps:
-   step_type: extract-s3
    name: s3extract
    file_uri: s3://aws-dataduct-etl-bucket/employees.csv

-   step_type: create-update-sql
    table_definition: /home/ec2-user/dp/testdb.employee.sql
    command: INSERT INTO testdb.employee (id,firstname,lastname,email,phone,salary,designation,manager,manageremail,department,date) VALUES(?,?,?,?,?,?,?,?,?,?,?)

When I try to activate it with the command dataduct pipeline activate copys3tords.yaml, I get the following error:

Traceback (most recent call last):
  File "", line 1, in
ImportError: No module named dataduct.steps.executors.runner

Is there a dependency I need to install? How can I resolve this? Please let me know. Thanks in advance.
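For reference, a minimal sketch of how one might check whether the module named in the error can be located in a given Python environment. It assumes Python 3 and uses only the standard library. The helper name module_available is hypothetical and not part of Dataduct. The check only reports on the interpreter it runs in, so it says nothing about any other machine involved in the pipeline.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if the interpreter can locate the given module.

    Note: for a dotted name, find_spec imports the parent packages
    in order to search inside them.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError,
        # which is a subclass of ImportError.
        return False


if __name__ == "__main__":
    # Module name taken from the ImportError in the traceback.
    name = "dataduct.steps.executors.runner"
    print(name, "importable:", module_available(name))
```

If this reports False in the environment where dataduct is installed, the package installation itself is likely incomplete or on a different interpreter's path.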

Dataduct - Unable to configure and run a successful AWS Data Pipeline

0 answers: