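I am using Spark to read a CSV file. One of the field values in the CSV is 91520122094491671D.
After reading, the value comes back as 9.152012209449166...
I found that this happens whenever a string starts with digits and ends with D / F.
But I need the data to be read as a string.
What should I do?
This is the CSV file data.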
tax_file_code| cus_name| tax_identification_number
T19915201| 息烽家吉装饰材料店| 91520122094491671D
The Scala code is as follows:
sparkSession.read.format("com.databricks.spark.csv")
.option("header", "true")
.option("inferSchema", true.toString)
.load(getHadoopUri(uri))
.createOrReplaceTempView("t_datacent_cus_temp_guizhou_ds_tmp")
sparkSession.sql(
s"""
| select cast(tax_file_code as String) as tax_file_code,
| cus_name,
| cast(tax_identification_number as String) as tax_identification_number
| from t_datacent_cus_temp_guizhou_ds_tmp
""".stripMargin).createOrReplaceTempView("t_datacent_cus_temp_guizhou_ds")
sparkSession.sql("select * from t_datacent_cus_temp_guizhou_ds").show
The result looks like this.
+-----------------+-----------------+-------------------------+
|tax_file_code | cus_name |tax_identification_number|
+-----------------+-----------------+-------------------------+
| T19915201 |息烽家吉装饰材料店 | 9.152012209449166...|
+-----------------+-----------------+-------------------------+
Answer 0 (score: 0)
You can try:
sparkSession.sql("select * from t_datacent_cus_temp_guizhou_ds").show(20, false)
This sets truncate to false. If truncate is true, strings longer than 20 characters are truncated and all cells are right-aligned.
Edit:
val x = sparkSession.read
.option("header", "true")
.option("inferSchema", "true")
.csv("....src/main/resources/data.csv")
x.printSchema()
x.createOrReplaceTempView("t_datacent_cus_temp_guizhou_ds_tmp")
sparkSession.sql(
s"""
| select cast(tax_file_code as String) as tax_file_code,
| cus_name,
| cast(tax_identification_number as String) as tax_identification_number
| from t_datacent_cus_temp_guizhou_ds_tmp
""".stripMargin).createOrReplaceTempView("t_datacent_cus_temp_guizhou_ds")
sparkSession.sql("select * from t_datacent_cus_temp_guizhou_ds").show(truncate = false)
This outputs:
+-------------+----------+-------------------------+
|tax_file_code|cus_name |tax_identification_number|
+-------------+----------+-------------------------+
|T19915201 | 息烽家吉装饰材料店|9.1520122094491664E16 |
+-------------+----------+-------------------------+
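Answer 1 (score: 0)
It sounds like the trailing D / F makes schema inference treat the value as a double or float, and the displayed column is truncated, which is why you see the value in exponent notation.
If you want all columns to be read as strings, remove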
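To illustrate the suffix behaviour, here is a minimal sketch in plain Scala with no Spark involved. It assumes that schema inference ends up relying on the JVM's own double parsing, which accepts a trailing D or F as a type suffix; that is an assumption about Spark's internals, not something confirmed in this thread. The object and method names are hypothetical.

```scala
import scala.util.Try

object SuffixParseDemo {
  // Returns Some(value) when the JVM accepts the text as a double, otherwise None.
  def parseAsDouble(text: String): Option[Double] =
    Try(text.toDouble).toOption

  def main(args: Array[String]): Unit = {
    // The trailing D is accepted as a double suffix, so this value looks numeric.
    println(parseAsDouble("91520122094491671D"))
    // A value that starts with a letter is rejected and would stay a string.
    println(parseAsDouble("T19915201"))
  }
}
```

The parsed double cannot hold all 17 digits exactly, which matches the 9.1520122094491664E16 shown in the output of the other answer.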
option("inferSchema", true.toString)
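A minimal sketch of that suggestion, assuming Spark 2.x or later with the built-in CSV reader. The temporary file, the comma delimiter, and the names ReadCsvAsStrings and readAllStrings are assumptions made for illustration; the sample row is the one from the question.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadCsvAsStrings {
  // Reads a CSV with a header row and without inferSchema,
  // so every column keeps the default StringType.
  def readAllStrings(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .csv(path)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used only to try the idea out.
    val spark = SparkSession.builder()
      .appName("read-csv-as-strings")
      .master("local[*]")
      .getOrCreate()

    // Assumption: a comma-delimited copy of the sample row from the question.
    val sample = Files.createTempFile("tax_sample", ".csv")
    val content =
      "tax_file_code,cus_name,tax_identification_number\n" +
      "T19915201,息烽家吉装饰材料店,91520122094491671D\n"
    Files.write(sample, content.getBytes(StandardCharsets.UTF_8))

    val df = readAllStrings(spark, sample.toString)
    df.printSchema()          // all three columns should be reported as string
    df.show(truncate = false) // print full cell values without truncation

    spark.stop()
  }
}
```

If only some columns need to stay as strings, passing an explicit schema through `.schema(...)` on the reader is another option; casting to String after inference comes too late, because the value has already been parsed as a double.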