pyspark - Error when trying to get the maximum value from a DenseVector object

Time: 2018-10-09 13:30:32

Tags: python dataframe pyspark vectorization naivebayes

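I am trying to get some predictions using Naive Bayes, and I think I managed it. However, the result comes back as a pyspark.ml.linalg.DenseVector, so I have to use a UDF to extract the maximum predicted probability from it. When I try to do that, I get an error message that I cannot fully make sense of. I suspect it is something in the data, but it is hard to say. So, if possible, I need some clarification here.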

After prediction, the final DataFrame:

final.select(['probability']).show(1,False)

prints (many unnecessary characters removed):

  

[0.003757082491927337,0.006619699990962011,4.3419224521402993E-4,5.372777916422481E-4,0.008699903117263496,0.00921345642395094,0.009308914882494697,0.009532505240756398,0.009574215735530826,0.009942975085100142,0.003746443859484387,0.0017932735490852503,0.19945386664486442,0.005124182867179237,...,0.0010214842657546466,0.0018743257203195504,0.0010214842657546466,0.0010214842657546466,0.0010214842657546466]

Using a UDF to get the maximum value:

def max_binarizer(vector):
  max_val = float(max(vector))
  return max_val
max_bin_udf = F.udf(max_binarizer, FloatType())

Later...

final = final.withColumn("PROB", max_bin_udf(final['probability']))
final.show(1)

Error message:

Py4JJavaError: An error occurred while calling o956.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 0 in stage 105.0 failed 1 times, most recent failure: Lost task 0.0 
in stage 105.0 (TID 6266, localhost, executor driver): 
org.apache.spark.api.python.PythonException: Traceback (most recent call 
last):
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/worker.py", line 
177, in main
process()
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/worker.py", line 
172, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/worker.py", line 
104, in <lambda>
func = lambda _, it: map(mapper, it)
File "<string>", line 1, in <lambda>
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/worker.py", line 
71, in <lambda>
return lambda *a: f(*a)
File "<ipython-input-73-794830053905>", line 3, in max_binarizer
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/functions.py", 
line 40, in _
jc = getattr(sc._jvm.functions, name)(col._jc if isinstance(col, Column) 
else col)
AttributeError: 'NoneType' object has no attribute '_jvm'

I checked the type of the 'vector' argument inside the UDF. It is pyspark.ml.linalg.DenseVector.

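I tried to solve this by looking at similar questions. One of them suggested a conflict between Python built-in functions and PySpark functions. Unfortunately, the result was the same! Please help me!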
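For reference, the traceback ends inside pyspark/sql/functions.py, which suggests that the name max inside max_binarizer resolved to pyspark.sql.functions.max rather than Python's built-in max. One way this can happen is a wildcard import such as `from pyspark.sql.functions import *`; that import is not shown in the question, so it is an assumption. The sketch below shows one possible way to rule the conflict out by calling the built-in explicitly. The names max_probability and make_max_udf are hypothetical, and DoubleType is an illustrative choice in place of the FloatType used in the question.

```python
import builtins


def max_probability(vector):
    """Return the largest entry of a vector as a plain Python float.

    builtins.max is referenced explicitly so that a wildcard import such as
    `from pyspark.sql.functions import *` cannot shadow it.
    """
    return float(builtins.max(vector))


def make_max_udf():
    """Wrap max_probability as a Spark UDF (pyspark is imported lazily)."""
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType

    return F.udf(max_probability, DoubleType())


# Hypothetical usage, assuming `final` is the DataFrame from the question:
# final = final.withColumn("PROB", make_max_udf()(final["probability"]))
```

If the error persists with the explicit built-in, the name conflict is probably not the cause. Note also that the UDF has to be redefined in the session after the change, since a UDF created earlier keeps the function it was built from.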

0 answers:

No answers yet