How to limit a DataFrame's float-type column to no more than 1 decimal place in PySpark?

Time: 2018-03-16 12:48:18

Tags: python pyspark decimal spark-dataframe

I am working with a DataFrame whose column 'Col' is of type float. The values in this column have too many decimal places (e.g. 1.00000000000111). How can I limit the column to hold values with only 1 decimal place (e.g. 1.0)?

3 Answers:

Answer 0 (score: 1)

You can use the round function:
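For reference, here is a hypothetical setup that would produce the sample data shown below; the SparkSession variable and the literal values are assumptions, not part of the original answer:

from pyspark.sql import SparkSession

# assumed: an existing or freshly created Spark session
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1.00000000000111,), (1.000000011,)], ['Col'])
df.show()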

+----------------+
|             Col|
+----------------+
|1.00000000000111|
|     1.000000011|
+----------------+
>>> from pyspark.sql import functions as F
>>> df = df.withColumn('Col',F.round('Col',1))
>>> df.show()
+---+
|Col|
+---+
|1.0|
|1.0|
+---+
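As an alternative sketch (not from the original answer): since the question is tagged decimal, casting the column to a DecimalType with scale 1 also limits the stored value to one decimal place. The precision of 10 here is an arbitrary illustrative choice:

from pyspark.sql import functions as F
from pyspark.sql.types import DecimalType

# cast the double column to a decimal with 1 digit after the point
df = df.withColumn('Col', F.col('Col').cast(DecimalType(10, 1)))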

Answer 1 (score: 0)

You can use the ceil or floor functions from pyspark.sql.functions, depending on how you want to limit the digits.

For example:

import pyspark.sql.functions as F

# assuming df is your dataframe and float_column_name is the name of the
# column with type FloatType, replace the column that has floats with
# the column that has rounded floats:
df = df.withColumn('float_column_name', F.round('float_column_name', 2))
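This answer mentions ceil and floor but demonstrates round. A minimal sketch of truncating (rather than rounding) to one decimal with floor, by scaling by 10 and back (the Col column name is taken from the question):

from pyspark.sql import functions as F

# multiply by 10, drop everything after the (shifted) decimal point,
# then scale back; use F.ceil instead of F.floor to round up
df = df.withColumn('Col', F.floor(F.col('Col') * 10) / 10)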

Answer 2 (score: -1)

Check it out:

import pandas as pd

# note: this rounds with pandas, not PySpark
df = pd.DataFrame([4.5678, 5, 1.00000000000111], columns=['Col'])
s = df['Col'].round(1)
print(s)

0       4.6
1       5.0
2       1.0
Name: Col, dtype: float64
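Note that this answer rounds with pandas rather than Spark, which is likely why it was downvoted. If the data lives in a Spark DataFrame, one sketch (spark_df is a hypothetical Spark DataFrame, and collecting to the driver only suits small data) is to convert first:

# toPandas() collects the whole Spark DataFrame to the driver
pdf = spark_df.toPandas()
pdf['Col'] = pdf['Col'].round(1)
print(pdf)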