PySpark won't let me create buckets.
AttributeError Traceback (most recent call last)
in ()
----> 1 df.write.bucketBy(2, "Source").saveAsTable("table")

AttributeError: 'DataFrameWriter' object has no attribute 'bucketBy'
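A quick way to confirm the cause is to check which Spark version is actually running; a minimal diagnostic sketch, assuming the same df and a SparkSession bound to the name spark:

# bucketBy was only added to the Python DataFrameWriter in Spark 2.3.0,
# so on older versions the attribute simply does not exist.
print(spark.version)
print(hasattr(df.write, 'bucketBy'))  # False on Spark < 2.3.0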
Answer (score: 2)
It looks like bucketBy is only supported in Spark 2.3.0 and later:
https://spark.apache.org/docs/latest/api/python/_modules/pyspark/sql/readwriter.html#DataFrameWriter.bucketBy
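On Spark 2.3.0 or later the original call works essentially as written; a minimal sketch, assuming a DataFrame df with a Source column (the table name bucketed_table is illustrative):

# Bucket df into 2 buckets keyed on 'Source'; bucketed output must be
# written with saveAsTable, since plain save() does not support bucketBy.
df.write.bucketBy(2, "Source").sortBy("Source").saveAsTable("bucketed_table")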
You can try creating a new bucket column instead:
from pyspark.ml.feature import Bucketizer

# Put every value of 'destination' into a single bucket spanning [0, Inf)
bucketizer = Bucketizer(splits=[0, float('Inf')], inputCol="destination", outputCol="buckets")
df_with_buckets = bucketizer.setHandleInvalid("keep").transform(df)
and then use partitionBy(*cols) when writing the result.
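Putting the workaround together, the generated buckets column can drive an ordinary partitioned write that works on any Spark version; a minimal sketch, assuming the df_with_buckets from above and an illustrative output path /tmp/bucketed_output:

# One output subdirectory per value of the generated 'buckets' column
df_with_buckets.write.partitionBy("buckets").parquet("/tmp/bucketed_output")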