I have sampled an RDD by key, where each row has more than 2 columns:
```
dfrdd = dfrdd.sampleByKey("_c0", fractions={1: 0.3, 0: 0.3})
```
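For reference, PySpark's `RDD.sampleByKey` is declared as `sampleByKey(withReplacement, fractions, seed=None)` and operates on an RDD of `(key, value)` pairs. In the call above, the string `"_c0"` is therefore bound to `withReplacement` and does not name a column. The sketch below is a simplified pure-Python emulation of the without-replacement case. It is not Spark code: `sample_by_key` and the sample rows are hypothetical, and it assumes the key is the first column. Each row is reshaped into a `(key, row)` pair and kept with probability `fractions[key]`, and the keys are dropped again afterwards.

```python
import random


def sample_by_key(pairs, fractions, seed=None):
    """Hypothetical emulation of per-key Bernoulli sampling.

    Each element must be a (key, value) pair; it is kept with
    probability fractions[key].
    """
    rng = random.Random(seed)
    for key, val in pairs:  # same 2-way unpacking a pair RDD sampler needs
        if rng.random() < fractions[key]:
            yield key, val


# Hypothetical 3-column rows: (_c0, _c1, _c2)
rows = [(1, "a", 10), (0, "b", 20), (1, "c", 30), (0, "d", 40)]

# Reshape each row into a (key, row) pair, keyed on the first column.
pairs = [(row[0], row) for row in rows]
sampled = list(sample_by_key(pairs, {1: 0.3, 0: 0.3}, seed=42))

# Drop the keys to get plain rows back (analogous to rdd.values() in Spark).
sampled_rows = [row for _, row in sampled]
print(sampled_rows)
```

Because the sampled elements are still `(key, row)` pairs, the keys have to be stripped before building a DataFrame from the original columns.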
But now I want to convert it from an RDD to a DataFrame:
```
df = dfrdd.toDF()
```
But I get the following error:
```
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 139, 172.30.48.187, executor 1): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/ibm/spark/python/lib/pyspark.zip/pyspark/worker.py", line 377, in main
    process()
  File "/opt/ibm/spark/python/lib/pyspark.zip/pyspark/worker.py", line 372, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/opt/ibm/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 393, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/opt/ibm/spark/python/pyspark/rdd.py", line 1354, in takeUpToNumLeft
    yield next(iterator)
  File "/opt/ibm/spark/python/pyspark/rddsampler.py", line 109, in func
    for key, val in iterator:
ValueError: too many values to unpack (expected 2)
```
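The last frame points at the cause: `rddsampler.py` unpacks every element with `for key, val in iterator`, so each element must be exactly a 2-tuple. A minimal pure-Python reproduction shows that 3-column rows raise the same `ValueError`, while `(key, row)` pairs do not. The `iterate_pairs` helper and the rows are hypothetical stand-ins for that loop and for the original data.

```python
def iterate_pairs(iterator):
    """Hypothetical stand-in for the loop in pyspark/rddsampler.py."""
    for key, val in iterator:  # requires exactly 2 items per element
        yield key, val


# Hypothetical 3-column rows, like the rows of the original RDD
rows = [(1, "a", 10), (0, "b", 20)]

error_message = None
try:
    list(iterate_pairs(rows))
except ValueError as exc:
    error_message = str(exc)  # starts with "too many values to unpack"
print(error_message)

# Reshaping each row into a (key, row) pair makes the unpacking succeed.
pairs = list(iterate_pairs((row[0], row) for row in rows))
print(pairs)
```

The error most likely surfaces only at `toDF()` because RDD sampling is lazy. Calling `toDF()` without a schema triggers schema inference, which runs a `take` on the RDD. That is the `takeUpToNumLeft` frame in the traceback.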