I am reading data from HBase and building an RDD of SQL Rows with the following code.
val getFormattedRDD = hbaseRawRDD.map(data => {
  // set column family
  val colFamily = Bytes.toBytes("0")
  var dataBuffer = ArrayBuffer.empty[Any]
  dataBuffer += new String(data._1.get)
  tableSchema.map { column =>
    {
      var dataBytes = data._2.getValue(colFamily, Bytes.toBytes(column.columnName))
      if (dataBytes != null) {
        if (column.dataType == "INT") {
          dataBuffer += (Bytes.toInt(dataBytes) ^ -2147483648)
        }
        if (column.dataType == "FLOAT") {
          dataBuffer += (Bytes.toFloat(dataBytes))
        }
        else {
          dataBuffer += new String(dataBytes)
        }
      } else {
        dataBuffer += null
      }
    }
  }
  Row.fromSeq(dataBuffer)
})
In the RDD the float value looks fine (22.22), but the problem appears when I create a DataFrame from it with the following commands:
val dfResults = sqlContext.createDataFrame(getFormattedRDD, dataframeSchema)
dfResults.show
This throws the following error:
scala.MatchError: 22.22 (of class java.lang.Float)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:295)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:294)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:260)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:250)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
    at org.apache.spark.sql.SQLContext$$anonfun$6.apply(SQLContext.scala:492)
    at org.apache.spark.sql.SQLContext$$anonfun$6.apply(SQLContext.scala:492)
The schema of the DataFrame is:
StructField(KEY,StringType,true)
StructField(PRODUCT_TYPE,StringType,true)
StructField(REGION_KEY,StringType,true)
StructField(BUSINESS_UNIT,StringType,true)
StructField(AMOUNT,IntegerType,true)
StructField(PERCENTAGE,FloatType,true)
StructField(CUSTOMER_ID,StringType,true)
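
A possible explanation, offered as a sketch rather than a confirmed diagnosis: in the mapping code the INT check is a standalone `if`, and the `else` is attached only to the FLOAT check. If that reading is right, every INT column appends two values (the int and then a string), so all later values shift one position to the right and the Float lands in a column the schema declares as StringType, which would match the `StringConverter` frame in the stack trace. The snippet below reproduces the branching in plain Scala with no Spark or HBase dependency. `Column`, `convertUnchained`, `convertChained`, and the sample values are hypothetical names and data introduced only for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for an entry of tableSchema in the question.
final case class Column(columnName: String, dataType: String)

// Same branching shape as the question: the INT test is a separate `if`,
// so the `else` below belongs only to the FLOAT test and also runs for INT.
def convertUnchained(cols: Seq[Column], raw: Map[String, Any]): Seq[Any] = {
  val buf = ArrayBuffer.empty[Any]
  cols.foreach { c =>
    raw.get(c.columnName) match {
      case Some(v) =>
        if (c.dataType == "INT") {
          buf += v
        }
        if (c.dataType == "FLOAT") {
          buf += v
        } else {
          buf += v.toString
        }
      case None =>
        buf += null
    }
  }
  buf.toSeq
}

// Chained variant: each column contributes exactly one value.
def convertChained(cols: Seq[Column], raw: Map[String, Any]): Seq[Any] =
  cols.map { c =>
    raw.get(c.columnName) match {
      case Some(v) if c.dataType == "INT" || c.dataType == "FLOAT" => v
      case Some(v) => v.toString
      case None => null
    }
  }

// Made-up sample data shaped like the last three schema fields.
val sampleCols = Seq(
  Column("AMOUNT", "INT"),
  Column("PERCENTAGE", "FLOAT"),
  Column("CUSTOMER_ID", "STRING")
)
val sampleRaw = Map[String, Any](
  "AMOUNT" -> 10,
  "PERCENTAGE" -> 22.22f,
  "CUSTOMER_ID" -> "c1"
)

println(convertUnchained(sampleCols, sampleRaw).length) // 4: the INT column contributed two values
println(convertChained(sampleCols, sampleRaw).length)   // 3: one value per column
```

If this is indeed the cause, changing the second `if` in the original code to `else if` should keep the row aligned with the schema. The top-level definitions above assume a Scala script, worksheet, or REPL session.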