I have two tables in Hive: one is test and the other is train. I wrote some R code to fetch the tables from Hive.
Here is the R code:
#loading library
library(RHive)
library(rhdfs)
library(rmr2)
library(gplots)
library(gtable)
library(gtools)
library(caTools)
library(RMySQL)
library(devtools)
rhive.connect(host="xxx.xxx.x.xxx",port=xxxxx, hiveServer2=FALSE, defaultFS=NULL,
updateJar=FALSE, user=NULL, password=NULL)
train<-rhive.query("select * from train")
test<-rhive.query("select * from test")
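# Fit the model on the training table.
# Note: glm() defaults to family=gaussian; if leadconverted is a 0/1 outcome,
# family=binomial may be what is wanted here.
trainglm <- glm(leadconverted~.,data=train)
# Score the test table and attach the predictions as a new column
p1<-predict(trainglm,newdata=test,type="response")
lcp<-as.matrix(p1)
colnames(lcp)<-"LeadConverted"
test1<-cbind(test,lcp)
# Write without a header row so that LOAD DATA does not import column names as data
write.table(test1,file="/home/dsri/Downloads/test1",sep=",",quote=FALSE,row.names=FALSE,col.names=FALSE)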
rhive.query("drop table test")
rhive.query("CREATE TABLE test(Std_DistanctToVendor float,Std_Income float,Std_ZipPopulationDensity float,FirstLastPropCase INT,NameEmailCheck INT,SingleWeekday STRING,lead_TimeFrameCont STRING,
Vehicle_FinanceMethod STRING,AddressProvided INT,Hybrid INT,LeadConverted float)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE")
rhive.query("LOAD DATA local INPATH '/home/dsri/Downloads/test1' OVERWRITE INTO TABLE test")
I saved this R code as /home/dsri/Downloads/myscript.r
I need to run this code from Hive.
I don't understand how to get started or how to proceed.
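
To make the modelling and export step easier to reproduce without a Hive connection, here is a minimal sketch on made-up data. The column names income and hybrid, the toy data frames, and the temporary output file are all hypothetical stand-ins, not my real tables, and family = binomial is an assumption that leadconverted is a 0/1 outcome.

```r
# Toy stand-ins for the Hive tables (made-up data, not the real schema)
set.seed(1)
train <- data.frame(income = rnorm(50), hybrid = rbinom(50, 1, 0.3))
train$leadconverted <- rbinom(50, 1, plogis(0.8 * train$income))
test <- data.frame(income = rnorm(10), hybrid = rbinom(10, 1, 0.3))

# Logistic regression; family = binomial is an assumption about the outcome
fit <- glm(leadconverted ~ ., data = train, family = binomial)

# Predicted conversion probabilities for the test rows
test$LeadConverted <- predict(fit, newdata = test, type = "response")

# Comma-separated file with no header and no row names,
# matching FIELDS TERMINATED BY ',' in the table definition
out <- tempfile(fileext = ".csv")
write.table(test, file = out, sep = ",", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Inspect the first rows of the exported file
readLines(out, n = 2)
```

The file written this way has one line per test row and three comma-separated fields, which is the shape that LOAD DATA would need for a table with the same three columns.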