py4j.protocol.Py4JError: An error occurred while calling o51.__getnewargs__. Trace: py4j.Py4JException: Method __getnewargs__([]) does not exist

Time: 2018-05-02 14:14:42

Tags: python-2.7 pyspark

I am creating a dataframe that I will use later in the code to insert records into a Hive table. Here is the code.

I am getting the following error message with this code:

/**
 * Parses the given line into an Item.
 *
 * @param nextLine the next line
 * @return the item
 * @throws ParseException the parse exception
 */
public Item parseIt(String[] nextLine) throws ParseException {
    Item newItem = new Item();
    String id = nextLine[0];

    DateFormat df = new SimpleDateFormat("yyyy-MM-dd");

    // Specify your time zone
    df.setTimeZone(TimeZone.getTimeZone("GMT+8:00"));

    Date parsedDate = df.parse(nextLine[1]);

    // Convert ms to seconds
    Long dateTime = parsedDate.getTime() / 1000;

    newItem.withPrimaryKey("Id", id, "Date", dateTime);
    return newItem;
}

===========

py4j.protocol.Py4JError: An error occurred while calling o51.__getnewargs__. Trace:
py4j.Py4JException: Method __getnewargs__([]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
        at py4j.Gateway.invoke(Gateway.java:272)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:748)
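
For context: this Py4JError is typically raised when PySpark tries to pickle a Py4J gateway object such as a DataFrame, SparkSession, or SparkContext, most often because one of them is referenced inside an RDD transformation or UDF that has to be shipped to the executors. A minimal sketch of the anti-pattern and its fix (the variable names here are illustrative, not from the original question):

df = spark.createDataFrame([(1,), (2,)], ["id"])
rdd = sc.parallelize([1, 2, 3])

# Anti-pattern: the lambda closes over `df`, whose underlying JavaObject
# cannot be pickled. Pickle forwards __getnewargs__ to the JVM, which has
# no such method, producing exactly this Py4JError.
bad = rdd.map(lambda x: df.count() + x)
# bad.collect()  # fails when the closure is serialized

# Fix: extract plain Python values on the driver first, and only ship those.
n = df.count()
good = rdd.map(lambda x: n + x)
good.collect()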

1 Answer:

Answer 0 (score: 0)

Here is the updated code:

def fn_create_df_load_status(custid):
    # Assumes `sc` (SparkContext) and `spark` (SparkSession) already exist,
    # as they do in a PySpark shell.
    from time import gmtime, strftime
    from pyspark.sql.types import StructType, StructField, StringType

    load_start_ts = strftime("%Y-%m-%d %H:%M:%S", gmtime())
    load_end_ts = strftime("%Y-%m-%d %H:%M:%S", gmtime())

    print("load_start_ts : ", load_start_ts)
    print("load_end_ts : ", load_end_ts)

    data = sc.parallelize([
        (('cust_id', custid),
         ('sys_rec', 'source system'),
         ('load_start_ts', load_start_ts),
         ('load_end_ts', load_end_ts),
         ('status', 'STARTED'))
    ])

    # Keep only the value from each (name, value) pair
    data_converted = data.map(lambda x: (x[0][1], x[1][1], x[2][1], x[3][1], x[4][1]))

    # Define the schema
    schema = StructType([
        StructField("cust_id", StringType(), True),
        StructField("sys_rec", StringType(), True),
        StructField("load_start_ts", StringType(), True),
        StructField("load_end_ts", StringType(), True),
        StructField("status", StringType(), True)
    ])

    # Create the dataframe on the driver
    df = spark.createDataFrame(data_converted, schema)

    # Output, and return the dataframe for later use
    df.show()
    return df
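
Since the question mentions inserting the records into a Hive table, a hypothetical follow-up might look like this (the customer id value and the table name load_status are assumptions for illustration, not from the original post):

status_df = fn_create_df_load_status("CUST-001")
status_df.write.mode("append").saveAsTable("load_status")

Note that building each row as (name, value) pairs and then stripping the names is redundant once a schema is supplied; a plain tuple per row would work just as well.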