PySpark error: Py4JJavaError: An error occurred while calling o469.count

Date: 2016-02-28 20:29:05

Tags: pyspark apache-spark-sql spark-dataframe

I have a Python Spark program that behaves inconsistently and errors out in some cases. I routinely run it on a small EMR cluster with a c3.2xlarge master and two m1.large workers, and there it runs fine and completes successfully.
However, when I run the exact same program on a larger cluster - I tried a c3.2xlarge master with four m1.large workers - it ends with errors. I'll paste the errors below, but even they are not consistent: neither the stack trace itself nor the stage at which the failure occurs is the same between runs.
For example, in one case it happened after about 26 minutes, inside a .count() call; in another run it actually got past .count() successfully but failed about an hour later, at a different stage, while calling .write.jdbc(). So I suspect some kind of race condition, but I can't even tell whether it's caused by my misuse of Spark or by a bug in Spark itself.
Most of the functionality I use here comes from spark.sql.

Environment: Spark 1.5.2 on EMR (Elastic MapReduce on AWS)

The stack traces are long, so I can't paste them here in full, but hopefully there's enough to provide context.
As for the code itself - well, there's a lot of it, and I haven't managed to reduce it to a simple repro test case that I could easily post here... (race conditions, you know...)

As mentioned, this is only part of the stack trace, which gets very long (example one):
Note that in the two cases the error occurs at different places in the code.

Any help or pointers on how to resolve this?

Cheers

Traceback (most recent call last):
  File "/home/hadoop/rantav.spark_normalize_data.py.134231/spark_normalize_data.py", line 102, in <module>
    run_spark(args)
  File "/home/hadoop/rantav.spark_normalize_data.py.134231/spark_normalize_data.py", line 62, in run_spark
    company_aliases_broadcast, experiences, args)
  File "/home/hadoop/rantav.spark_normalize_data.py.134231/companies.py", line 50, in get_companies
    out(sc, args, 'companies', sql_create_table, companies)
  File "/home/hadoop/rantav.spark_normalize_data.py.134231/output.py", line 48, in out
    mode='append')
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 455, in jdbc
  File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 36, in deco
  File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o464.jdbc.
: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
TungstenExchange hashpartitioning(ref_company_size_id#96)
 TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,churn_rate#200,churn_rate_percentile#202,retention_rate_2y#210]
  SortMergeOuterJoin [id#85], [company_id#180], LeftOuter, None
   TungstenSort [id#85 ASC], false, 0
    TungstenExchange hashpartitioning(id#85)
     TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,churn_rate#200,(_we0#203 * 100.0) AS churn_rate_percentile#202]
      Window [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,churn_rate#200], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentRank(churn_rate#200) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _we0#203], [ref_company_size_id#96], [churn_rate#200 ASC]
       TungstenSort [ref_company_size_id#96 ASC,churn_rate#200 ASC], false, 0
        TungstenExchange hashpartitioning(ref_company_size_id#96)
         TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,(100.0 * cast(pythonUDF#201 as double)) AS churn_rate#200]
          !BatchPythonEvaluation PythonUDF#divide(count#199L,emp_count#94L), [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,count#199L,pythonUDF#201]
           ConvertToSafe
            TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,count#199L]
             SortMergeOuterJoin [id#85], [company_id#180], LeftOuter, None
              TungstenSort [id#85 ASC], false, 0
               TungstenExchange hashpartitioning(id#85)
                TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,(_we0#198 * 100.0) AS avg_tenure_percentile#197]
                 Window [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentRank(avg_tenure#196) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _we0#198], [ref_company_size_id#96], [avg_tenure#196 ASC]
                  TungstenSort [ref_company_size_id#96 ASC,avg_tenure#196 ASC], false, 0
                   TungstenExchange hashpartitioning(ref_company_size_id#96)
                    TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg(duration_days)#195 AS avg_tenure#196]
                     SortMergeOuterJoin [id#85], [company_id#180], LeftOuter, None
                      TungstenSort [id#85 ASC], false, 0
                       TungstenExchange hashpartitioning(id#85)
                        TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,(_we0#194 * 100.0) AS growth_rate_percentile#193]
                         Window [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentRank(growth_rate#191) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _we0#194], [ref_company_size_id#96], [growth_rate#191 ASC]
                          TungstenSort [ref_company_size_id#96 ASC,growth_rate#191 ASC], false, 0
                           TungstenExchange hashpartitioning(ref_company_size_id#96)
                            TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,(100.0 * cast(pythonUDF#192 as double)) AS growth_rate#191]
                             !BatchPythonEvaluation PythonUDF#divide(count#190L,emp_count#94L), [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,count#190L,pythonUDF#192]
                              ConvertToSafe
                               TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,count#190L]
                                SortMergeOuterJoin [id#85], [company_id#180], LeftOuter, None
                                 TungstenSort [id#85 ASC], false, 0
                                  TungstenExchange hashpartitioning(id#85)
                                   TungstenProject [id#85,href#86,name#87,emp_count#94L,(_we0#176 * 100.0) AS emp_count_percentile#175,ref_company_size_id#96]
                                    Window [id#85,href#86,name#87,emp_count#94L,ref_company_size_id#96], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentRank(emp_count#94L) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _we0#176], [ref_company_size_id#96], [emp_count#94L ASC]
                                     TungstenSort [ref_company_size_id#96 ASC,emp_count#94L ASC], false, 0

And here is an example of a different stack trace (same program, same number of workers):

  /home/hadoop/rantav.spark_normalize_data.py.093920/pymysql/cursors.py:146: Warning: Can't create database 'v2'; database exists
    result = self._query(query)
  /home/hadoop/rantav.spark_normalize_data.py.093920/pymysql/cursors.py:146: Warning: Table 'oplog' already exists
    result = self._query(query)
  Traceback (most recent call last):
    File "/home/hadoop/rantav.spark_normalize_data.py.093920/spark_normalize_data.py", line 102, in <module>
      run_spark(args)
    File "/home/hadoop/rantav.spark_normalize_data.py.093920/spark_normalize_data.py", line 62, in run_spark
      company_aliases_broadcast, experiences, args)
    File "/home/hadoop/rantav.spark_normalize_data.py.093920/companies.py", line 50, in get_companies
      out(sc, args, 'companies', sql_create_table, companies)
    File "/home/hadoop/rantav.spark_normalize_data.py.093920/output.py", line 35, in out
      if data.count() > 0:
    File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 268, in count
    File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
    File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 36, in deco
    File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
  py4j.protocol.Py4JJavaError: An error occurred while calling o469.count.
  : org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
  TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[count#216L])
   TungstenExchange SinglePartition
    TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[currentCount#219L])
     TungstenProject
      Window [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,churn_rate#200,churn_rate_percentile#202,retention_rate_2y#210], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentRank(retention_rate_2y#210) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _we0#213], [ref_company_size_id#96], [retention_rate_2y#210 ASC]
       TungstenSort [ref_company_size_id#96 ASC,retention_rate_2y#210 ASC], false, 0
        TungstenExchange hashpartitioning(ref_company_size_id#96)
         TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,churn_rate#200,churn_rate_percentile#202,retention_rate_2y#210]
          SortMergeOuterJoin [id#85], [company_id#180], LeftOuter, None
           TungstenSort [id#85 ASC], false, 0
            TungstenExchange hashpartitioning(id#85)
             TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,churn_rate#200,(_we0#203 * 100.0) AS churn_rate_percentile#202]
              Window [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,churn_rate#200], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentRank(churn_rate#200) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _we0#203], [ref_company_size_id#96], [churn_rate#200 ASC]
               TungstenSort [ref_company_size_id#96 ASC,churn_rate#200 ASC], false, 0
                TungstenExchange hashpartitioning(ref_company_size_id#96)
                 TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,(100.0 * cast(pythonUDF#201 as double)) AS churn_rate#200]
                  !BatchPythonEvaluation PythonUDF#divide(count#199L,emp_count#94L), [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,count#199L,pythonUDF#201]
                   ConvertToSafe
                    TungstenProject [id#85,href#86,name#87,emp_count#94L,emp_count_percentile#175,ref_company_size_id#96,growth_rate#191,growth_rate_percentile#193,avg_tenure#196,avg_tenure_percentile#197,count#199L]
                     SortMergeOuterJoin [id#85], [company_id#180], LeftOuter, None
                      TungstenSort [id#85 ASC], false, 0
                       TungstenExchange hashpartitioning(id#85)

0 Answers