I have a DataFrame[item: string, true_recoms: map<string,int>]
with the schema:
StructType(List(StructField(item,StringType,true),StructField(true_recoms,MapType(StringType,IntegerType,true),true)))
I want to drop the rows where the length of true_recoms == 40000.
Answer 0 (score: 0)
Not very elegant, but:
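from pyspark.sql.types import IntegerType
# UDF that returns the number of entries in the map column
sqlContext.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
# Filter with WHERE and the registered UDF (assumes "train" is already registered as a table)
train = sqlContext.sql("SELECT * FROM train WHERE stringLengthInt(true_recoms) < 40000")
sqlContext.registerDataFrameAsTable(train, "train")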
To check:
sqlContext.sql("SELECT item , stringLengthInt(true_recoms) AS l FROM train ORDER BY -l ").collect()