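PySpark: assigning a name to column values with 'withColumn'

Asked: 2019-12-19 22:48:19

Tags: python pyspark pyspark-sql pyspark-dataframes

I am new to PySpark, but I have some experience with R.

Problem: I want to assign a name to each height (a number) listed in a column, depending on the range it falls in. I started writing the code as follows: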

w = Window.partitionBy("student_id")
df_enc_hw = df_enc_hw.withColumn("stuname", \
                       when(lower(col("height")) <= 4, "under_ht") 
                      .when(lower(col("height")) > 4 < 5, "ok_ht")  
                      .when(lower(col("height")) >=5 < 6, "normal_ht")  
                      .when(lower(col("height")) >=6, "abnor_ht")) 

But I get the following error:

    633 
    634     def __nonzero__(self):
--> 635         raise ValueError("Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
    636                          "'~' for 'not' when building DataFrame boolean expressions.")
    637     __bool__ = __nonzero__

ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.

Thanks for your help, K

1 answer:

Answer 0 (score: 0)

You should split each range condition into separate expressions and combine them with '&', wrapping every comparison in its own parentheses. For example, write (col("height") > 4) & (col("height") < 5) instead of col("height") > 4 < 5. Python evaluates the chained form as (col("height") > 4) and (4 < 5), and the 'and' tries to convert the Column to a bool, which raises the ValueError shown in the question.