I am creating a DataFrame that I will use later in the code to insert records into a Hive table. Here is the code:
/**
 * Parses the it.
 *
 * @param nextLine the next line
 * @return the item
 * @throws ParseException the parse exception
 */
public Item parseIt(String[] nextLine) throws ParseException {
    Item newItem = new Item();
    String Id = nextLine[0];
    DateFormat df = new SimpleDateFormat("yyyy-MM-dd");
    // Specify your time zone
    df.setTimeZone(TimeZone.getTimeZone("GMT+8:00"));
    Date parsedDate = df.parse(nextLine[1]);
    // Convert ms to seconds
    Long dateTime = parsedDate.getTime() / 1000;
    newItem.withPrimaryKey("Id", Id, "Date", dateTime);
    return newItem;
}
}
===========
py4j.protocol.Py4JError: An error occurred while calling o51.__getnewargs__. Trace:
py4j.Py4JException: Method __getnewargs__([]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:272)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Answer 0 (score: 0)
Here is the updated code:
from time import gmtime, strftime

from pyspark.sql.types import StringType, StructField, StructType


def fn_create_df_load_status():
    # Assumes `sc` (SparkContext), `spark` (SparkSession) and `custid`
    # are already defined earlier in the session.
    load_start_ts = strftime("%Y-%m-%d %H:%M:%S", gmtime())
    load_end_ts = strftime("%Y-%m-%d %H:%M:%S", gmtime())
    print("load_start_ts : ", load_start_ts)
    print("load_end_ts : ", load_end_ts)
    # One record, stored as a tuple of (column name, value) pairs
    data = sc.parallelize([
        (('cust_id', custid),
         ('sys_rec', 'source system'),
         ('load_start_ts', load_start_ts),
         ('load_end_ts', load_end_ts),
         ('status', 'STARTED'))
    ])
    # Convert to tuple: keep only the value from each (name, value) pair
    data_converted = data.map(lambda x: (x[0][1], x[1][1], x[2][1], x[3][1], x[4][1]))
    # Define schema
    schema = StructType([
        StructField("cust_id", StringType(), True),
        StructField("sys_rec", StringType(), True),
        StructField("load_start_ts", StringType(), True),
        StructField("load_end_ts", StringType(), True),
        StructField("status", StringType(), True)
    ])
    # Create dataframe
    DF = spark.createDataFrame(data_converted, schema)
    # Output
    DF.show()
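
The conversion step passed to data.map(...) can be sanity-checked without a Spark session. The sketch below applies the same lambda to a single record in plain Python. The customer id 'C001' and the fixed timestamps are made-up placeholders for illustration, not values from the original post.

```python
# One record shaped like the element passed to sc.parallelize in the answer.
# 'C001' and both timestamps are hypothetical placeholder values.
record = (
    ('cust_id', 'C001'),
    ('sys_rec', 'source system'),
    ('load_start_ts', '2020-01-01 00:00:00'),
    ('load_end_ts', '2020-01-01 00:00:05'),
    ('status', 'STARTED'),
)

# Same lambda as in data.map(...): keep only the value of each (name, value) pair.
to_row = lambda x: (x[0][1], x[1][1], x[2][1], x[3][1], x[4][1])

row = to_row(record)
print(row)
```

The resulting tuple has five string values in the same order as the five StructField entries in the schema, which is what createDataFrame needs to match each value to its column.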