Spark DataFrame decimal precision

Asked: 2017-08-15 03:30:02

Tags: apache-spark apache-spark-sql spark-dataframe

I have a DataFrame:

val groupby = df.groupBy($"column1", $"Date")
  .agg(sum("amount").as("amount"))
  .orderBy($"column1", desc("Date"))

When I apply a window function to add a new difference column:

val windowspec = Window.partitionBy("column1").orderBy(desc("Date"))

groupby.withColumn("difference", lead($"amount", 1, 0).over(windowspec)).show()


+--------+------------+-----------+--------------------------+
| Column | Date       | Amount    | Difference               |
+--------+------------+-----------+--------------------------+
| A      | 3/31/2017  | 12345.45  | 3456.540000000000000000  |
| A      | 2/28/2017  | 3456.54   | 34289.430000000000000000 |
| A      | 1/31/2017  | 34289.43  | 45673.987000000000000000 |
| A      | 12/31/2016 | 45673.987 | 0.00E+00                 |
+--------+------------+-----------+--------------------------+

I am getting trailing zeros in the decimal values. When I call printSchema() on the DataFrame above, the data type of the difference column is decimal(38,18). Can someone tell me how to change the data type to a decimal with a smaller scale, or how to remove the trailing zeros?

3 Answers:

Answer 0 (score: 2):

You can cast the data to a specific decimal precision, as shown below:

lead($"amount", 1,0).over(windowspec).cast(DataTypes.createDecimalType(32,2))

Answer 1 (score: 0):

In plain SQL, you can use the well-known technique:

SELECT ceil(100 * column_name_double)/100 AS cost ...
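
As a rough sketch, the same arithmetic trick could be applied to the aggregated frame from the question via selectExpr (column names are taken from the question; note that ceil rounds the value up to two decimal places):

// Scale up, take the ceiling, and scale back down to keep two decimal places
val twoDecimals = groupby.selectExpr(
  "column1",
  "Date",
  "ceil(100 * amount) / 100 AS amount"
)
twoDecimals.show()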

Answer 2 (score: 0):

from pyspark.sql.types import DecimalType

# column_name is a placeholder for the column to cast down to decimal(10,2)
df = df.withColumn(column_name, df[column_name].cast(DecimalType(10, 2)))