How to limit a DataFrame's float-type column to no more than 1 decimal place in PySpark?

Time: 2018-03-16 12:48:18

Tags: python pyspark decimal spark-dataframe

I am working with a DataFrame whose column 'Col' is of type float. The values in this column have too many decimal places (e.g. 1.00000000000111). How can I limit the column to hold values with only 1 decimal place (e.g. 1.0)?

3 Answers:

Answer 0 (score: 1)

You can use the round function:
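For reference, here is a hypothetical setup that would produce the sample data shown below; the SparkSession variable and the literal values are assumptions, not part of the original answer:

from pyspark.sql import SparkSession

# assumed: an existing or freshly created Spark session
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1.00000000000111,), (1.000000011,)], ['Col'])
df.show()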

+----------------+
|             Col|
+----------------+
|1.00000000000111|
|     1.000000011|
+----------------+
>>> from pyspark.sql import functions as F
>>> df = df.withColumn('Col',F.round('Col',1))
>>> df.show()
+---+
|Col|
+---+
|1.0|
|1.0|
+---+
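As an alternative sketch (not from the original answer): since the question is tagged decimal, casting the column to a DecimalType with scale 1 also limits the stored value to one decimal place. The precision of 10 here is an arbitrary illustrative choice:

from pyspark.sql import functions as F
from pyspark.sql.types import DecimalType

# cast the double column to a decimal with 1 digit after the point
df = df.withColumn('Col', F.col('Col').cast(DecimalType(10, 1)))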

Answer 1 (score: 0)

You can use the ceil or floor functions from pyspark.sql.functions, depending on how you want to limit the digits.

For example:

import pyspark.sql.functions as F

# assuming df is your dataframe and float_column_name is the name of the
# column with type FloatType, replace the column that has floats with
# the column that has rounded floats:
df = df.withColumn('float_column_name', F.round('float_column_name', 2))
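This answer mentions ceil and floor but demonstrates round. A minimal sketch of truncating (rather than rounding) to one decimal with floor, by scaling by 10 and back (the Col column name is taken from the question):

from pyspark.sql import functions as F

# multiply by 10, drop everything after the (shifted) decimal point,
# then scale back; use F.ceil instead of F.floor to round up
df = df.withColumn('Col', F.floor(F.col('Col') * 10) / 10)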

Answer 2 (score: -1)

Check it out:

import pandas as pd

# note: this rounds with pandas, not PySpark
df = pd.DataFrame([4.5678, 5, 1.00000000000111], columns=['Col'])
s = df['Col'].round(1)
print(s)

0       4.6
1       5.0
2       1.0
Name: Col, dtype: float64
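Note that this answer rounds with pandas rather than Spark, which is likely why it was downvoted. If the data lives in a Spark DataFrame, one sketch (spark_df is a hypothetical Spark DataFrame, and collecting to the driver only suits small data) is to convert first:

# toPandas() collects the whole Spark DataFrame to the driver
pdf = spark_df.toPandas()
pdf['Col'] = pdf['Col'].round(1)
print(pdf)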