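I am trying to use the Segue package (https://code.google.com/p/segue) to run my R code on EMR.
I am able to connect, but when I try to execute my function I get a 404 error related to S3 (below). Can anyone guess what this means or how I can fix it? I don't know S3, and I have very little EMR experience.
Thanks!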
#Setup R Environment
setwd("/home/jmiller/")
install.packages("rJava")
install.packages("caTools")
install.packages("segue_0.05.tar.gz", repos = NULL, type="source")
install.packages("Matching")
library(rJava)
library(caTools)
library(segue)
library(Matching)
#Import raw data
data <- read.delim("STUFF GOES HERE ")
#Write the Function
jdm <- function (data) {STUFF GOES HERE }
#Setup EMR
setCredentials("STUFF GOES HERE ", "STUFF GOES HERE ")
> emr.test <- createCluster(numInstances=2 )
STARTING - 2013-10-30 13:50:33
STARTING - 2013-10-30 13:51:05
STARTING - 2013-10-30 13:51:36
STARTING - 2013-10-30 13:52:07
STARTING - 2013-10-30 13:52:38
BOOTSTRAPPING - 2013-10-30 13:53:09
BOOTSTRAPPING - 2013-10-30 13:53:40
BOOTSTRAPPING - 2013-10-30 13:54:12
WAITING - 2013-10-30 13:54:43
Your Amazon EMR Hadoop Cluster is ready for action.
Remember to terminate your cluster with stopCluster().
Amazon is billing you!
> emr.result <- emrlapply(emr.test, data, jdm, taskTimeout=10)
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
Status Code: 404, AWS Service: Amazon S3, AWS Request ID: F39B3FDE8682AF39, AWS Error Code: NoSuchBucket, AWS Error Message: The specified bucket does not exist, S3 Extended Request ID: g9XDhofkpgux2/mBR4t8FhY3u9G85ZxsvXZkr1SZ2a0bA871LJKNSqtgeAfaFEG0
> stopCluster(emr.test)
So, after shutting down that cluster, I spun up another one and tried running Jeff Breen's example. To my surprise, it actually worked on my first attempt.
> outputEmr <- emrlapply(myCluster, myList, mean, na.rm=T)
RUNNING - 2013-10-30 16:22:03
RUNNING - 2013-10-30 16:22:34
RUNNING - 2013-10-30 16:23:06
WAITING - 2013-10-30 16:23:37
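But then I tried to use my own function on the same cluster, and it failed with the same error message as before. I then deleted the results from the example and ran the example again. That also produced the same 404 error message.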
> outputEmr <- emrlapply(myCluster, myList, mean, na.rm=T)
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
Status Code: 404, AWS Service: Amazon S3, AWS Request ID: 8379F458DD96EC9B, AWS Error Code: NoSuchBucket, AWS Error Message: The specified bucket does not exist, S3 Extended Request ID: 1hjGApzfy5rd5JaM+mhhg35C/DUJ0qSa5V2uGXLjCV3tjTLfSUrM7zqsUCFKHCFH
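So I shut down the cluster and spun up another one, again running only the example code. This gave me the 404 error again. I tried two more times and got the same error.
My understanding from the Segue Google group is that the author, JD Long, knows that several other users and I are having this problem and is looking into it, but as of now we still don't know what is broken or how to fix it.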
Answer 0 (score: 0)
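An HTTP 404 suggests that the AWS connection string is incorrect; the NoSuchBucket error code in your output says the request pointed at an S3 bucket that does not exist. It is hard to tell from your post which connection URL you supplied.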
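To see which part of the message actually matters, it can help to split the error line into its fields. The following is only a minimal sketch in base R: `parse_aws_error` is a hypothetical helper name, not part of segue, and it assumes the fields are separated by ", " and that each field has the form "name: value", as in the message you posted.

```r
# Hypothetical helper (not part of segue): split an AWS error line
# into a named list, assuming "key: value" fields separated by ", ".
parse_aws_error <- function(msg) {
  parts <- strsplit(msg, ", ", fixed = TRUE)[[1]]
  keys <- sub(":.*$", "", parts)            # text before the first colon
  values <- sub("^[^:]*:\\s*", "", parts)   # text after the first colon
  names(values) <- keys
  as.list(values)
}

# The message from the question, without the extended request ID.
msg <- paste0(
  "Status Code: 404, AWS Service: Amazon S3, ",
  "AWS Request ID: F39B3FDE8682AF39, AWS Error Code: NoSuchBucket, ",
  "AWS Error Message: The specified bucket does not exist"
)
err <- parse_aws_error(msg)
print(err[["AWS Error Code"]])
print(err[["AWS Error Message"]])
```

The "AWS Error Code" field is usually more specific than the HTTP status, so that is the one worth searching for.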
I see these lines in Jeffrey Breen's example, but not in your code:
> library(segue)
Loading required package: rJava
Loading required package: caTools
Loading required package: bitops
Segue did not find your AWS credentials. Please run the setCredentials() function.
> setCredentials('YOUR_ACCESS_KEY_ID', 'YOUR_SECRET_ACCESS_KEY')
> myCluster <- createCluster(numInstances=5)
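While you keep retrying, every failed run leaves a billed cluster behind until you stop it by hand. A rough sketch of one way to guard against that uses `tryCatch` with a `finally` clause. Here `run_then_stop` is a hypothetical wrapper I made up for illustration, not a segue function, and the demo below uses stand-in functions rather than a real cluster so that it runs without AWS.

```r
# Hypothetical wrapper (not part of segue): run a job against a cluster
# and always call stop_fn afterwards, even if the job raises an error.
run_then_stop <- function(cluster, job, stop_fn) {
  tryCatch(
    job(cluster),
    error = function(e) {
      message("Job failed: ", conditionMessage(e))
      NULL  # return NULL instead of propagating the error
    },
    finally = stop_fn(cluster)
  )
}

# Demo with stand-ins: a job that fails the way emrlapply did above,
# and a stop function that only records that it was called.
stopped <- FALSE
result <- run_then_stop(
  cluster = "fake-cluster",
  job = function(cl) stop("Status Code: 404, AWS Error Code: NoSuchBucket"),
  stop_fn = function(cl) stopped <<- TRUE
)

# With segue this would presumably look like (untested assumption):
# run_then_stop(myCluster,
#               function(cl) emrlapply(cl, myList, mean, na.rm = TRUE),
#               stopCluster)
```

This does not fix the 404 itself; it only makes sure a failed attempt does not leave the cluster running.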
http://pcsupport.about.com/od/findbyerrormessage/a/404error.htm
http://cran.r-project.org/web/views/HighPerformanceComputing.html
http://jeffreybreen.wordpress.com/2011/01/10/segue-r-to-amazon-elastic-mapreduce-hadoop/