pyspark: writing a DataFrame to HBase loads integer values as bytes

Time: 2018-06-21 10:05:25

Tags: pyspark hbase

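When writing a DataFrame to HBase through pyspark, is there an option to keep integer values as integers? By default, integer values are converted to byte type in the HBase table when the DataFrame is written.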

Below is the code:
    import json

    # Connector catalog: maps DataFrame columns to the HBase row key and column family.
    catalog2 = {
        "table": {"namespace": "default", "name": "trip_test1"},
        "rowkey": "key1",
        "columns": {
            "serial_no": {"cf": "rowkey", "col": "key1", "type": "string"},
            "payment_type": {"cf": "sales", "col": "payment_type", "type": "string"},
            "fare_amount": {"cf": "sales", "col": "fare_amount", "type": "string"},
            "surcharge": {"cf": "sales", "col": "surcharge", "type": "string"},
            "mta_tax": {"cf": "sales", "col": "mta_tax", "type": "string"},
            "tip_amount": {"cf": "sales", "col": "tip_amount", "type": "string"},
            "tolls_amount": {"cf": "sales", "col": "tolls_amount", "type": "string"},
            "total_amount": {"cf": "sales", "col": "total_amount", "type": "string"}
        }
    }

    # The catalog is passed to the connector as a JSON string.
    cat2 = json.dumps(catalog2)

    # df is the DataFrame to be written.
    (df.write
        .option("catalog", cat2)
        .option("newtable", "5")
        .format("org.apache.spark.sql.execution.datasources.hbase")
        .save())

Output:

\x00\x00\x03\xE7 column=sales:payment_type, timestamp=1529495930994, value=CSH
\x00\x00\x03\xE7 column=sales:surcharge, timestamp=1529495930994, value=\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x03\xE7 column=sales:tip_amount, timestamp=1529495930994, value=\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x03\xE7 column=sales:tolls_amount, timestamp=1529495930994, value=\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x03\xE7 column=sales:total_amount, timestamp=1529495930994, value=@!\x00\x00\x00\x00\x00\x00
\x00\x00\x03\xE8 column=sales:fare_amount, timestamp=1529495930994, value=@\x18\x00\x00\x00\x00\x00\x00
\x00\x00\x03\xE8 column=sales:mta_tax, timestamp=1529495930994, value=?\xE0\x00\x00\x00\x00\x00\x00

Expected output:

999 column=sales:fare_amount, timestamp=1529392479358, value=8.0
999 column=sales:mta_tax, timestamp=1529392479358, value=0.5
999 column=sales:payment_type, timestamp=1529392479358, value=CSH
999 column=sales:surcharge, timestamp=1529392479358, value=0.0
999 column=sales:tip_amount, timestamp=1529392479358, value=0.0
999 column=sales:tolls_amount, timestamp=1529392479358, value=0.0
999 column=sales:total_amount, timestamp=1529392479358, value=8.5

1 answer:

Answer 0 (score: 0)

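Numeric values are converted to bytes and then stored in HBase. When reading the data back from HBase, you must use the same library (in this case "org.apache.spark.sql.execution.datasources.hbase") to get the exact values.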
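To see that the bytes in the question's output are the original numbers, they can be decoded by hand. The sketch below assumes the connector wrote 4-byte big-endian integers and 8-byte big-endian IEEE 754 doubles, which is consistent with the shell output shown; the helper names `decode_int` and `decode_double` are made up for illustration.

```python
import struct

# Raw values copied from the HBase shell output in the question.
row_key = b"\x00\x00\x03\xe7"                    # row key of the first row
total_amount = b"@!\x00\x00\x00\x00\x00\x00"     # sales:total_amount
fare_amount = b"@\x18\x00\x00\x00\x00\x00\x00"   # sales:fare_amount (row \x00\x00\x03\xE8)
mta_tax = b"?\xe0\x00\x00\x00\x00\x00\x00"       # sales:mta_tax


def decode_int(raw):
    """Decode a 4-byte big-endian signed integer (assumed encoding)."""
    return struct.unpack(">i", raw)[0]


def decode_double(raw):
    """Decode an 8-byte big-endian IEEE 754 double (assumed encoding)."""
    return struct.unpack(">d", raw)[0]


print(decode_int(row_key))          # 999
print(decode_double(total_amount))  # 8.5
print(decode_double(fare_amount))   # 6.0
print(decode_double(mta_tax))       # 0.5
```

So the data is intact; the HBase shell simply prints the binary encoding instead of the number.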

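If you want the values to appear in HBase as readable numbers, cast the columns to string type before writing. The library "org.apache.spark.sql.execution.datasources.hbase" stores a string column as its text and does not apply a binary numeric encoding to it.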
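A minimal sketch of that cast, assuming `df` and `cat2` are the DataFrame and catalog string from the question. `cast_columns_to_string` is a hypothetical helper name; it relies only on PySpark's `DataFrame.columns`, `DataFrame.withColumn`, and `Column.cast`.

```python
def cast_columns_to_string(df, columns=None):
    """Return a DataFrame with the given columns (default: all) cast to string.

    `df` is expected to be a pyspark.sql.DataFrame.
    """
    names = list(columns) if columns is not None else list(df.columns)
    for name in names:
        # Replace each column with its string representation, keeping the name.
        df = df.withColumn(name, df[name].cast("string"))
    return df


# Usage with the names from the question (requires a running Spark session
# and the HBase connector on the classpath, so it is left commented out):
#
# df_str = cast_columns_to_string(df)
# (df_str.write
#     .option("catalog", cat2)
#     .option("newtable", "5")
#     .format("org.apache.spark.sql.execution.datasources.hbase")
#     .save())
```

With every column cast to string, the DataFrame's types also agree with the catalog in the question, which declares every column as "string".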

For best results, make sure each column's data type in the DataFrame matches the type declared for it in the catalog.
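One way to check this before writing is to compare the catalog against the `(name, type)` pairs that PySpark's `df.dtypes` returns. The sketch below is plain Python; `find_type_mismatches` is a hypothetical helper, the sample `dtypes` list is an assumption (the question does not show the DataFrame schema), and it also assumes the catalog uses the same short type names as `df.dtypes`.

```python
def find_type_mismatches(df_dtypes, catalog):
    """Return {column: (catalog_type, dataframe_type)} for every disagreement.

    df_dtypes: list of (column_name, type_name) pairs, the shape of df.dtypes.
    catalog:   the catalog dict before json.dumps is applied.
    """
    actual = dict(df_dtypes)
    mismatches = {}
    for name, spec in catalog["columns"].items():
        declared = spec["type"]
        found = actual.get(name)  # None if the DataFrame lacks the column
        if found != declared:
            mismatches[name] = (declared, found)
    return mismatches


# Two columns from the question's catalog, both declared as "string".
catalog = {
    "columns": {
        "serial_no": {"cf": "rowkey", "col": "key1", "type": "string"},
        "fare_amount": {"cf": "sales", "col": "fare_amount", "type": "string"},
    }
}

# Assumed DataFrame types, for illustration only.
dtypes = [("serial_no", "int"), ("fare_amount", "double")]

print(find_type_mismatches(dtypes, catalog))
```

An empty result means the DataFrame and the catalog agree on every column type.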