Question

I'm trying to use the round() function in Databricks to round some float values to 2 decimal places. However, Python in Databricks doesn't behave like normal Python.

Please help me understand the cause and find a solution.

lis = [-12.1334, 12.23433, 1.2343, -104.444]
lis2 = [round(val,2)  for val in lis]
print(lis2)


TypeError: Invalid argument, not a string or column: -12.1334 of type <type 'float'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.


Answer 1

Good morning. The problem is most likely a namespace conflict. I ran something similar to

from pyspark.sql.functions import *

which includes a round function. You can easily check which round is in use by running help on it:

help(round)

A simple fix is to import the pyspark functions into their own namespace:

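import pyspark.sql.functions as F  # instead of: from pyspark.sql.functions import *
lis = [-12.1334, 12.23433, 1.2343, -104.444]
# The bare name round is now Python's builtin; Spark's version is F.round.
# If the star import already ran in this session, delete the name (del round) or restart first.
lis2 = [round(val, 2) for val in lis]
print(lis2)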

[-12.13, 12.23, 1.23, -104.44]
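The namespace conflict can be reproduced without a cluster. The sketch below is an illustration under assumptions: `fake_functions` is a hypothetical in-memory module standing in for pyspark.sql.functions, and its `round` only mimics the Spark version's rejection of plain floats. Each import runs in its own namespace dict so the two cases don't interfere.

```python
import builtins
import sys
import types

# Hypothetical stand-in for pyspark.sql.functions, built in memory.
fake = types.ModuleType("fake_functions")

def _column_round(col, scale=0):
    # Mimics the Spark round() rejecting a plain float.
    raise TypeError(f"Invalid argument, not a string or column: {col!r}")

fake.round = _column_round
sys.modules["fake_functions"] = fake

code = "lis2 = [round(val, 2) for val in [-12.1334, 12.23433]]"

# 1) Star import: the name 'round' in this namespace now refers to the module's function.
star_ns = {}
exec("from fake_functions import *", star_ns)
print("round is builtin after star import:", star_ns["round"] is builtins.round)
try:
    exec(code, star_ns)
except TypeError as exc:
    print("star import:", exc)

# 2) Aliased import: only 'F' is bound, so 'round' still resolves to the builtin.
alias_ns = {}
exec("import fake_functions as F", alias_ns)
exec(code, alias_ns)
print("aliased import:", alias_ns["lis2"])
```

The star import rebinds `round` in that namespace, so the list comprehension calls the stand-in; the aliased import binds only `F`, so `round` still resolves to the builtin while the module's function stays reachable as `F.round`.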

Answer 2

Try this:

lis = [-12.1334, 12.23433, 1.2343, -104.444]
list_em = []
for row in lis:
    list_em.append(round(row,2))
print(list_em)

[-12.13, 12.23, 1.23, -104.44]

Answer 3

I believe this is the source code of the function you are actually calling:

def round(col, scale=0):
    """
    Round the given value to `scale` decimal places using HALF_UP rounding mode if `scale` >= 0
    or at integral part when `scale` < 0.

    >>> spark.createDataFrame([(2.5,)], ['a']).select(round('a', 0).alias('r')).collect()
    [Row(r=3.0)]
    """
    sc = SparkContext._active_spark_context
    return Column(sc._jvm.functions.round(_to_java_column(col), scale))

It clearly expects a column, not a decimal number. Did you use an import *? That would shadow the built-in function.
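A second difference is worth knowing even once the names are sorted out. The docstring above says Spark rounds HALF_UP, whereas Python's builtin round() rounds exact ties toward the even neighbour and works on the binary float value, so round(2.5) is 2 and round(2.675, 2) is 2.67. If plain-Python results should follow the HALF_UP rule, a minimal sketch using the standard decimal module could look like this; `round_half_up` is a hypothetical helper, and it has not been checked against Spark's output for every input.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value: float, scale: int = 0) -> float:
    """Round `value` to `scale` decimals, sending ties away from zero (HALF_UP).

    Hypothetical helper, not part of PySpark. Decimal(repr(value)) uses the
    float's shortest decimal form, so 2.675 is treated as "2.675" rather than
    as its slightly smaller binary value.
    """
    quantum = Decimal(1).scaleb(-scale)  # scale=2 -> Decimal('0.01')
    return float(Decimal(repr(value)).quantize(quantum, rounding=ROUND_HALF_UP))

print(round(2.5), round_half_up(2.5))            # builtin ties to even; HALF_UP rounds up
print(round(2.675, 2), round_half_up(2.675, 2))  # builtin sees the binary value
lis = [-12.1334, 12.23433, 1.2343, -104.444]
print([round_half_up(v, 2) for v in lis])
```

For the values in the question both approaches agree, since none of them sits on a tie.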

Answer 4

This is only reproducible when Spark's round function has been imported from the pyspark.sql.functions module.

Spark's round function expects a string or a Column, which explains the error.

You can alias the import instead, e.g. import pyspark.sql.functions as F rather than from pyspark.sql.functions import *

Alternatively, you can get the original round function back this way:

import builtins
round = getattr(builtins, "round")

Then you can run:

lis = [-12.1334, 12.23433, 1.2343, -104.444]
lis2 = [round(val,2)  for val in lis]
print(lis2)
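Two alternatives to rebinding the name, sketched under the assumption that the shadowing round lives in the notebook's global namespace (as it does after a star import in a cell). The stand-in round below is hypothetical and only mimics the Spark function's rejection of floats.

```python
import builtins

# Hypothetical stand-in for the round() that `from pyspark.sql.functions import *`
# binds into the notebook's global namespace.
def round(col, scale=0):
    raise TypeError("Invalid argument, not a string or column")

lis = [-12.1334, 12.23433, 1.2343, -104.444]

# Option 1: call the builtin explicitly, leaving the shadowing name in place.
lis2 = [builtins.round(val, 2) for val in lis]
print(lis2)

# Option 2: delete the global binding; name lookup then falls back to builtins.
del round
lis3 = [round(val, 2) for val in lis]
print(lis3)
```

Option 2 only lasts until the star-import cell runs again; importing under an alias avoids the clash altogether.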

round() function not working in Databricks - Python
