PySpark won't let me create buckets

Date: 2018-03-21 05:18:20

Tags: pyspark

PySpark won't let me create buckets.


AttributeError                            Traceback (most recent call last)
----> 1 df.write.bucketBy(2, "Source").saveAsTable("table")

AttributeError: 'DataFrameWriter' object has no attribute 'bucketBy'

1 Answer:

Answer 0 (score: 2)

It looks like bucketBy is only supported from Spark 2.3.0 onward: https://spark.apache.org/docs/latest/api/python/_modules/pyspark/sql/readwriter.html#DataFrameWriter.bucketBy
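To see what bucketBy does conceptually, here is a minimal pure-Python sketch (no Spark required): rows are assigned to a fixed number of buckets by hashing the bucket column, so equal values always land in the same bucket. The column name "Source" comes from the question; the hash function and row layout are illustrative assumptions, not Spark's actual partitioner.

```python
def bucket_for(value, num_buckets):
    """Assign a value to one of num_buckets by hashing it
    (a stand-in for Spark's bucketing hash)."""
    return hash(value) % num_buckets

rows = [{"Source": "web"}, {"Source": "mobile"}, {"Source": "web"}]

# Group rows into buckets, as bucketBy(2, "Source") would conceptually do.
buckets = {}
for row in rows:
    b = bucket_for(row["Source"], 2)
    buckets.setdefault(b, []).append(row)

# Rows with the same "Source" value always end up in the same bucket.
assert bucket_for("web", 2) == bucket_for("web", 2)
assert all(b in (0, 1) for b in buckets)
```

This per-value determinism is what lets Spark skip shuffles when joining two tables bucketed the same way.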

If you cannot upgrade, you can try creating a new bucket column yourself instead of using bucketBy:

    from pyspark.ml.feature import Bucketizer

    bucketizer = Bucketizer(splits=[0, float('Inf')],
                            inputCol="destination", outputCol="buckets")
    df_with_buckets = bucketizer.setHandleInvalid("keep").transform(df)

and then write the table with partitionBy(*cols) on the new column.
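For intuition, Bucketizer maps a numeric value to the index of the half-open interval [splits[i], splits[i+1]) that contains it. The sketch below mimics that rule in pure Python with the standard-library bisect module; the splits values are illustrative, and the error handling only loosely approximates Bucketizer's handleInvalid behavior.

```python
import bisect

def bucketize(value, splits):
    """Return the index i such that splits[i] <= value < splits[i+1],
    mimicking pyspark.ml.feature.Bucketizer's interval rule."""
    if value < splits[0] or value > splits[-1]:
        # Bucketizer would raise or keep/skip these, per handleInvalid.
        raise ValueError("value outside the splits range")
    # bisect_right gives the insertion point; subtract 1 for the interval
    # index, clamping so the top boundary falls into the last bucket.
    return min(bisect.bisect_right(splits, value) - 1, len(splits) - 2)

splits = [0, 10, 100, float("inf")]
assert bucketize(5, splits) == 0     # falls in [0, 10)
assert bucketize(10, splits) == 1    # boundary goes to the right bucket
assert bucketize(250, splits) == 2   # falls in [100, inf)
```

With the answer's splits=[0, float('Inf')] there is only one interval, so every non-negative value gets bucket 0; for partitioning to be useful you would choose several split points.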