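Hive and Google Cloud Storage issue

Date: 2018-05-24 18:42:30

Tags: hive google-cloud-storage

Please advise:

I installed a Hadoop 2.6.5 cluster on GCP using VM instances. I am using the GCS connector, with HDFS pointed at a gs bucket. I added the following two entries to core-site.xml: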

google.cloud.auth.service.account.json.keyfile=<Path-to-the-JSON-file> 
fs.gs.working.dir=/

Running hadoop gs -ls / works fine, but when I create a Hive table

CREATE EXTERNAL TABLE test1256(name string,id  int)   LOCATION   'gs://bucket/';

I get the following error:

  

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.security.AccessControlException: Permission denied: user=hdpuser1, path="gs://bucket/":hive:hive:drwx------) (state=08S01,code=1)

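2 answers:

Answer 0: (score: 0)

As the error message suggests, you are running into a permissions issue. First, check your Google Cloud Console to make sure you have the Cloud Storage IAM permissions needed to perform the operation. Next, make sure the "hdpuser1" user has the correct permissions in HDFS: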

  • Log in as the hdfs user: su hdfs
  • You can also change the permissions.

Then run the following:

hdfs dfs -chown -R <username_of_new_owner> /user

Then try creating the Hive table again. Hope this helps.

A similar answer can be found here, along with a complete guide on performing a test with Hive on GitHub.
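
To illustrate why the request is rejected: the error reports the path as owned by hive:hive with mode drwx------, so only the owner has any access. Below is a minimal Python sketch of that owner/group/other check. It assumes a simplified POSIX-style model (no superuser, ACLs, or sticky bit) and assumes hdpuser1 is not a member of the hive group, which the question does not state. can_access is a hypothetical helper, not part of Hadoop or Hive.

```python
def can_access(mode, owner, group, user, user_groups=()):
    """Decode a 10-character mode string such as 'drwx------' for one user.

    Simplified model (an assumption for illustration): pick the owner,
    group, or other triplet and read its r/w/x flags.
    """
    perms = mode[1:]  # drop the leading file-type character ('d', '-', ...)
    if user == owner:
        triplet = perms[0:3]
    elif group in user_groups:
        triplet = perms[3:6]
    else:
        triplet = perms[6:9]
    return {
        "read": triplet[0] == "r",
        "write": triplet[1] == "w",
        "execute": triplet[2] == "x",
    }


if __name__ == "__main__":
    # Values taken from the error message: hive:hive:drwx------
    print("hive    :", can_access("drwx------", "hive", "hive", "hive"))
    print("hdpuser1:", can_access("drwx------", "hive", "hive", "hdpuser1"))
```

Under these assumptions every flag is False for hdpuser1, which matches the "Permission denied" in the MetaException.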

Answer 1: (score: 0)

I ran into this error today and was able to resolve it by adding the following two properties:

fs.gs.reported.permissions=777
fs.gs.path.encoding=uri-path

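Add both to core-site.xml and hive-site.xml (in Ambari, go to the advanced configuration of the HDFS and Hive services).

If you configure them only in core-site.xml, creating the Hive external table will still fail.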
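
If you edit the files by hand instead of through Ambari, a quick way to confirm that both files carry the two properties is a small check like the sketch below. It uses only the Python standard library. The XML strings are illustrative stand-ins for the real file contents (the hive-site.xml content in particular is hypothetical), and missing_properties is a helper name made up for this example.

```python
import xml.etree.ElementTree as ET

# The two properties from this answer, with the values given above.
REQUIRED = {
    "fs.gs.reported.permissions": "777",
    "fs.gs.path.encoding": "uri-path",
}


def read_properties(xml_text):
    """Return {name: value} from Hadoop-style configuration XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_properties(xml_text, required=REQUIRED):
    """List required properties that are absent or set to a different value."""
    found = read_properties(xml_text)
    return sorted(k for k, v in required.items() if found.get(k) != v)


# Illustrative core-site.xml content with both properties in XML form.
CORE_SITE = """<configuration>
  <property>
    <name>fs.gs.reported.permissions</name>
    <value>777</value>
  </property>
  <property>
    <name>fs.gs.path.encoding</name>
    <value>uri-path</value>
  </property>
</configuration>"""

# Hypothetical hive-site.xml that does not have them yet.
HIVE_SITE = "<configuration></configuration>"

if __name__ == "__main__":
    print("core-site.xml missing:", missing_properties(CORE_SITE))
    print("hive-site.xml missing:", missing_properties(HIVE_SITE))
```

To check a real cluster, read each file's text and pass it to missing_properties; both lists should come back empty.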