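Is there an option to keep integer values as integers when writing a DataFrame to HBase through PySpark? By default, writing a DataFrame to HBase converts the integer values to a byte type in the HBase table.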
Below is the code:
import json

# Catalog mapping DataFrame columns to HBase column families and qualifiers.
# Every column is declared with type "string".
catalog2 = {
    "table": {"namespace": "default", "name": "trip_test1"},
    "rowkey": "key1",
    "columns": {
        "serial_no": {"cf": "rowkey", "col": "key1", "type": "string"},
        "payment_type": {"cf": "sales", "col": "payment_type", "type": "string"},
        "fare_amount": {"cf": "sales", "col": "fare_amount", "type": "string"},
        "surcharge": {"cf": "sales", "col": "surcharge", "type": "string"},
        "mta_tax": {"cf": "sales", "col": "mta_tax", "type": "string"},
        "tip_amount": {"cf": "sales", "col": "tip_amount", "type": "string"},
        "tolls_amount": {"cf": "sales", "col": "tolls_amount", "type": "string"},
        "total_amount": {"cf": "sales", "col": "total_amount", "type": "string"}
    }
}
cat2 = json.dumps(catalog2)

# Write the DataFrame to HBase through the connector.
df.write.option("catalog", cat2).option("newtable", "5").format("org.apache.spark.sql.execution.datasources.hbase").save()
Output:
\x00\x00\x03\xE7 column=sales:payment_type, timestamp=1529495930994, value=CSH
\x00\x00\x03\xE7 column=sales:surcharge, timestamp=1529495930994, value=\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x03\xE7 column=sales:tip_amount, timestamp=1529495930994, value=\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x03\xE7 column=sales:tolls_amount, timestamp=1529495930994, value=\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x03\xE7 column=sales:total_amount, timestamp=1529495930994, value=@!\x00\x00\x00\x00\x00\x00
\x00\x00\x03\xE8 column=sales:fare_amount, timestamp=1529495930994, value=@\x18\x00\x00\x00\x00\x00\x00
\x00\x00\x03\xE8 column=sales:mta_tax, timestamp=1529495930994, value=?\xE0\x00\x00\x00\x00\x00\x00
Expected output:
999 column=sales:fare_amount, timestamp=1529392479358, value=8.0
999 column=sales:mta_tax, timestamp=1529392479358, value=0.5
999 column=sales:payment_type, timestamp=1529392479358, value=CSH
999 column=sales:surcharge, timestamp=1529392479358, value=0.0
999 column=sales:tip_amount, timestamp=1529392479358, value=0.0
999 column=sales:tolls_amount, timestamp=1529392479358, value=0.0
999 column=sales:total_amount, timestamp=1529392479358, value=8.5
Answer 0: (score: 0)
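Numeric values are converted to their binary byte representation before being stored in HBase. To get the exact values back when reading from HBase, you must use the same library (in this case "org.apache.spark.sql.execution.datasources.hbase"), which decodes those bytes again.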
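To see why the shell shows values such as \x00\x00\x03\xE7, here is a minimal sketch using only Python's struct module. It assumes the connector writes integers as 4 big-endian bytes and doubles as 8-byte big-endian IEEE 754 values; that assumption is consistent with the output in the question, but it is not taken from the library's source. The helper names are made up for illustration.

```python
import struct

# Assumption: numbers are stored big-endian, which matches the bytes that
# the HBase shell printed in the question.

def encode_int(value):
    """Pack a 32-bit signed integer as 4 big-endian bytes."""
    return struct.pack(">i", value)

def encode_double(value):
    """Pack a float as an 8-byte big-endian IEEE 754 double."""
    return struct.pack(">d", value)

def decode_int(raw):
    """Turn 4 big-endian bytes back into an integer."""
    return struct.unpack(">i", raw)[0]

def decode_double(raw):
    """Turn 8 big-endian bytes back into a float."""
    return struct.unpack(">d", raw)[0]

row_key = encode_int(999)          # shell shows \x00\x00\x03\xE7
total_amount = encode_double(8.5)  # shell shows @!\x00\x00\x00\x00\x00\x00

print(row_key, decode_int(row_key))
print(total_amount, decode_double(total_amount))
```

The printable characters in the shell output ("@", "!", "?") are simply bytes that happen to fall in the ASCII range, so the stored values are intact and only look garbled when viewed as text.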
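If you want the values to appear as readable numbers in HBase, cast the columns to string type before writing them. The library "org.apache.spark.sql.execution.datasources.hbase" stores string columns as plain text and does not encode them as binary numbers.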
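One way to do the cast is sketched below. It builds Spark SQL CAST expressions and passes them to DataFrame.selectExpr, so every column becomes a string before the write. The helper names string_cast_exprs and with_string_columns are hypothetical, and df is assumed to be the DataFrame from the question.

```python
def string_cast_exprs(columns):
    """Build one Spark SQL expression per column that casts it to STRING."""
    return ["CAST(`{0}` AS STRING) AS `{0}`".format(name) for name in columns]

def with_string_columns(df):
    """Return a DataFrame whose columns all hold text.

    Assumes df is a PySpark DataFrame, which provides .columns and .selectExpr.
    """
    return df.selectExpr(*string_cast_exprs(df.columns))

# Show the expressions generated for two of the columns in the question.
print(string_cast_exprs(["serial_no", "fare_amount"]))
```

The write call from the question would then start from with_string_columns(df) instead of df, with the catalog left as it is, since it already declares every column as "string".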
For best results, make sure the data type of each DataFrame column matches the type declared for it in the catalog.
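A quick check along these lines can be done before writing. The sketch below compares a list of (column, type) pairs, in the shape that df.dtypes returns, with the types declared in the catalog. The function find_type_mismatches and the sample dtypes are hypothetical, and the comparison assumes that simple type names such as "int", "double", and "string" are spelled the same on both sides.

```python
def find_type_mismatches(dtypes, catalog):
    """Return (column, dataframe_type, catalog_type) for each disagreement."""
    declared = {name: spec["type"] for name, spec in catalog["columns"].items()}
    return [
        (name, actual, declared[name])
        for name, actual in dtypes
        if name in declared and declared[name] != actual
    ]

# Hypothetical dtypes, in the same shape as the list returned by df.dtypes.
sample_dtypes = [
    ("serial_no", "int"),
    ("payment_type", "string"),
    ("fare_amount", "double"),
]

# A shortened copy of the catalog from the question.
sample_catalog = {
    "columns": {
        "serial_no": {"cf": "rowkey", "col": "key1", "type": "string"},
        "payment_type": {"cf": "sales", "col": "payment_type", "type": "string"},
        "fare_amount": {"cf": "sales", "col": "fare_amount", "type": "string"},
    }
}

mismatches = find_type_mismatches(sample_dtypes, sample_catalog)
for column, actual, expected in mismatches:
    print(column, "is", actual, "in the DataFrame but", expected, "in the catalog")
```

With the real DataFrame, find_type_mismatches(df.dtypes, catalog2) would list the columns that need either a cast or a different catalog type.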