PySpark won't let me create buckets.
AttributeError Traceback (most recent call last)
in ()
----> 1 df.write.bucketBy(2, "Source").saveAsTable("table")

AttributeError: 'DataFrameWriter' object has no attribute 'bucketBy'
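A quick way to confirm the cause is to check which Spark version is actually running; a minimal diagnostic sketch, assuming the same df and a SparkSession bound to the name spark:

# bucketBy was only added to the Python DataFrameWriter in Spark 2.3.0,
# so on older versions the attribute simply does not exist.
print(spark.version)
print(hasattr(df.write, 'bucketBy'))  # False on Spark < 2.3.0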
Answer (score: 2)
It looks like bucketBy is only supported in Spark 2.3.0 and later:
https://spark.apache.org/docs/latest/api/python/_modules/pyspark/sql/readwriter.html#DataFrameWriter.bucketBy
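On Spark 2.3.0 or later the original call works essentially as written; a minimal sketch, assuming a DataFrame df with a Source column (the table name bucketed_table is illustrative):

# Bucket df into 2 buckets keyed on 'Source'; bucketed output must be
# written with saveAsTable, since plain save() does not support bucketBy.
df.write.bucketBy(2, "Source").sortBy("Source").saveAsTable("bucketed_table")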
You can try creating a new bucket column instead:
from pyspark.ml.feature import Bucketizer

# Put every value of 'destination' into a single bucket spanning [0, Inf)
bucketizer = Bucketizer(splits=[0, float('Inf')], inputCol="destination", outputCol="buckets")
df_with_buckets = bucketizer.setHandleInvalid("keep").transform(df)
and then use partitionBy(*cols) when writing the result.
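Putting the workaround together, the generated buckets column can drive an ordinary partitioned write that works on any Spark version; a minimal sketch, assuming the df_with_buckets from above and an illustrative output path /tmp/bucketed_output:

# One output subdirectory per value of the generated 'buckets' column
df_with_buckets.write.partitionBy("buckets").parquet("/tmp/bucketed_output")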