"java.io.IOException: Can't get Master Kerberos principal for use as renewer" when reading a CSV from the cluster

Date: 2018-09-30 02:49:45

Tags: apache-spark hadoop2

I am using Scala 2.11.4 with Spark 2.1.1 and Hadoop 2.6.0-cdh5.7.1. Below is the snippet I use to read a CSV file, which fails with "java.io.IOException: Can't get Master Kerberos principal for use as renewer". I run the program from IntelliJ on my local machine, connected to the Cloudera cluster with Kerberos authentication. The program runs fine up to the line diamonds.printSchema, then fails when it tries to read the CSV file through the spark-csv reader, even though sqlContext.read.textFile("/tmp/xyz/data.csv") can read the same file. I have googled it but had no luck so far. The issue https://issues.apache.org/jira/browse/SPARK-20328 looks related but did not resolve the problem. Has anyone run into this? Any help is appreciated.
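For context, the Kerberos login on the IntelliJ side is done roughly like the sketch below, before the SparkSession is built. This is a minimal, assumed setup; the principal and keytab path (user@EXAMPLE.COM, /path/to/user.keytab) are hypothetical placeholders, not values from this post.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

// Assumed Kerberos login; runs before building the SparkSession.
val hadoopConf = new Configuration()
hadoopConf.set("hadoop.security.authentication", "kerberos")
UserGroupInformation.setConfiguration(hadoopConf)
// Placeholder principal and keytab path.
UserGroupInformation.loginUserFromKeytab("user@EXAMPLE.COM", "/path/to/user.keytab")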

val spark: SparkSession = {
  SparkSession
    .builder()
    .master(master)
    .appName(appName)
    .enableHiveSupport()
    .getOrCreate()
}
val sqlContext = spark.sqlContext
val parquet = sqlContext.read.parquet("../data.parquet/part.snappy.parquet")
parquet.printSchema()
//parquet.write.mode(SaveMode.Append).save("/tmp/xyz/data.parquet")
val diamonds = sqlContext.read.textFile("/tmp/xyz/data.csv")
diamonds.printSchema
val readcsv = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("/tmp/xyz/data.csv")
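For what it is worth, diamonds.printSchema on the textFile dataset is metadata-only and launches no Hadoop job, while the spark-csv load with header=true has to read the first line of the file to infer the schema (CSVFileFormat.inferSchema -> RDD.first in the trace below), and it is that job-planning step that requests the delegation tokens. The sketch below supplies an explicit schema to skip the inference job; the column names are placeholders, and note this only defers the token fetch to the first real action rather than fixing the Kerberos problem.

import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical schema; supplying it skips the header-inference job.
val csvSchema = StructType(Seq(
  StructField("col1", StringType),
  StructField("col2", StringType)
))
val csvWithSchema = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .schema(csvSchema)
  .load("/tmp/xyz/data.csv")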

I hit the following issue when trying to access the CSV file on the Cloudera cluster.

Exception in thread "main" java.io.IOException: Can't get Master Kerberos principal for use as renewer
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:133)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:206)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1333)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.take(RDD.scala:1327)
    at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1368)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.first(RDD.scala:1367)
    at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.findFirstLine(CSVFileFormat.scala:206)
    at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.inferSchema(CSVFileFormat.scala:60)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
    at scala.Option.orElse(Option.scala:288)
    at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$getOrInferFileFormatSchema(DataSource.scala:183)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:387)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
    at LoadCSV$.main(LoadCSV.scala:44)
    at LoadCSV.main(LoadCSV.scala)
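The trace bottoms out in TokenCache.obtainTokensForNamenodes, which looks up the YARN ResourceManager's Kerberos principal in the Hadoop configuration before it can request HDFS delegation tokens; this exception is what you get when that property is missing, typically because the cluster's yarn-site.xml/hdfs-site.xml are not on the local classpath. Below is a minimal sketch of passing the principal explicitly via spark.hadoop.*, under the assumption that this is the cause; yarn/_HOST@EXAMPLE.COM is a placeholder, not the cluster's real principal.

import org.apache.spark.sql.SparkSession

// Sketch: supply the RM principal by hand when the cluster's *-site.xml
// files are not on the driver's classpath (assumed cause, placeholder value).
val sparkWithPrincipal = SparkSession
  .builder()
  .master(master)
  .appName(appName)
  .config("spark.hadoop.yarn.resourcemanager.principal", "yarn/_HOST@EXAMPLE.COM")
  .enableHiveSupport()
  .getOrCreate()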

0 Answers:

There are no answers yet.