I am trying to create a "Step" that merges many small files into one, so that I can split them out by day. The problem is that it fails whenever I try to run it.
Running it from the command line works fine for me:
hadoop distcp s3n://buket-name/output-files-hive/* s3n://buket-name/files-hive/test
But if I add the "groupBy" or "srcPattern" options to the command, it gives me nothing.
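For reference, "groupBy" and "srcPattern" are S3DistCp options rather than plain hadoop distcp options: the --groupBy regular expression must match the file path and contain a capture group, and all files whose captured value is the same are concatenated into a single output file. A minimal sketch of such an invocation, run on the master node of an EMR 4.x+ cluster where s3-dist-cp is available as a command; the regex assumes hypothetical daily-dated file names such as 2016-07-13-part-0001, which is an illustrative assumption, not something taken from this job:

s3-dist-cp --src=s3n://buket-name/output-files-hive/ --dest=s3n://buket-name/files-hive/test/ --srcPattern='.*part.*' --groupBy='.*/(\d{4}-\d{2}-\d{2})-part-.*'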
创建"步骤"在Amazon EMR控制台中,给我所有时间错误。你指出了文件
Command:
aws emr add-steps --cluster-id j-XXXXXXX --steps Name="S3DistCp step",Jar="command-runner.jar",Args=["spark-submit","--src=s3n://buket-name/output-files-hive/*","--dest=s3n://buket-name/files-hive/test/"]
Error:
2016-07-13T15:06:27.677Z INFO Ensure step 3 jar file command-runner.jar
2016-07-13T15:06:27.678Z INFO StepRunner: Created Runner for step 3
INFO startExec 'hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar spark-submit --src=s3n://buket-name/output-files-hive/* --dest=s3n://buket-name/files-hive/test/'
INFO Environment:
TERM=linux
CONSOLETYPE=serial
SHLVL=5
JAVA_HOME=/etc/alternatives/jre
HADOOP_IDENT_STRING=hadoop
LANGSH_SOURCED=1
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
HADOOP_ROOT_LOGGER=INFO,DRFA
AWS_CLOUDWATCH_HOME=/opt/aws/apitools/mon
UPSTART_JOB=rc
MAIL=/var/spool/mail/hadoop
EC2_AMITOOL_HOME=/opt/aws/amitools/ec2
PWD=/
HOSTNAME=ip-172-31-21-173
LESS_TERMCAP_se=[0m
LOGNAME=hadoop
UPSTART_INSTANCE=
AWS_PATH=/opt/aws
LESS_TERMCAP_mb=[01;31m
_=/etc/alternatives/jre/bin/java
LESS_TERMCAP_me=[0m
NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat
LESS_TERMCAP_md=[01;38;5;208m
runlevel=3
AWS_AUTO_SCALING_HOME=/opt/aws/apitools/as
UPSTART_EVENTS=runlevel
HISTSIZE=1000
previous=N
HADOOP_LOGFILE=syslog
PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin:/opt/aws/bin
EC2_HOME=/opt/aws/apitools/ec2
HADOOP_LOG_DIR=/mnt/var/log/hadoop/steps/s-2SKUUYYPQ4KKK
LESS_TERMCAP_ue=[0m
AWS_ELB_HOME=/opt/aws/apitools/elb
RUNLEVEL=3
USER=hadoop
HADOOP_CLIENT_OPTS=-Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/s-2SKUUYYPQ4KKK/tmp
PREVLEVEL=N
HOME=/home/hadoop
HISTCONTROL=ignoredups
LESSOPEN=||/usr/bin/lesspipe.sh %s
AWS_DEFAULT_REGION=eu-west-1
LANG=en_US.UTF-8
LESS_TERMCAP_us=[04;38;5;111m
INFO redirectOutput to /mnt/var/log/hadoop/steps/s-2SKUUYYPQ4KKK/stdout
INFO redirectError to /mnt/var/log/hadoop/steps/s-2SKUUYYPQ4KKK/stderr
INFO Working dir /mnt/var/lib/hadoop/steps/s-2SKUUYYPQ4KKK
INFO ProcessRunner started child process 7836 :
hadoop 7836 2229 0 15:06 ? 00:00:00 bash /usr/lib/hadoop/bin/hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar spark-submit --src=s3n://buket-name/output-files-hive/* --dest=s3n://buket-name/files-hive/test/
2016-07-13T15:06:31.724Z INFO HadoopJarStepRunner.Runner: startRun() called for s-2SKUUYYPQ4KKK Child Pid: 7836
INFO Synchronously wait child process to complete : hadoop jar /var/lib/aws/emr/step-runner/hadoop-...
INFO waitProcessCompletion ended with exit code 1 : hadoop jar /var/lib/aws/emr/step-runner/hadoop-...
INFO total process run time: 2 seconds
2016-07-13T15:06:31.991Z INFO Step created jobs:
2016-07-13T15:06:31.992Z WARN Step failed with exitCode 1 and took 2 seconds
Answer 0 (score: 0)
The step fails because the arguments are passed to spark-submit, which does not understand --src or --dest. In newer versions of Amazon EMR it is no longer necessary to reference the s3distcp .jar file directly to define the S3DistCp parameters: run the s3-dist-cp command through command-runner.jar instead.
aws emr add-steps --cluster-id j-XXXXXX --steps Name="S3DistCp step V3",Jar="command-runner.jar",Args=["s3-dist-cp","--src=s3n://buket-name/output-files-hive/","--dest=s3n://buket-name/files-hive/test/"]
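Since the original goal was to merge the small files by day, the same step can also carry the grouping option as an extra argument. The regex below is an illustrative assumption about the file naming (daily-dated names such as 2016-07-13-part-0001), not something taken from the actual job, and depending on your shell the pattern may need extra quoting or escaping:

aws emr add-steps --cluster-id j-XXXXXX --steps Name="S3DistCp step groupBy",Jar="command-runner.jar",Args=["s3-dist-cp","--src=s3n://buket-name/output-files-hive/","--dest=s3n://buket-name/files-hive/test/","--groupBy=.*/(\d{4}-\d{2}-\d{2})-part-.*"]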