Question

I'm writing the contents of a pandas DataFrame t to a Hive table in PySpark.

t has a column Request_time_local whose values are pandas.tslib.Timestamp objects:

In: print t.loc[0,'Request_time_local']
Out: 2016-12-09 13:01:27

The Hive table has a column request_time_local of type timestamp:

col_name              | data_type
request_time_local    | timestamp

I convert t to a PySpark DataFrame so that I can write it to Hive:

t_rdd = spark.createDataFrame(t)
t_rdd.registerTempTable("temp_result")

In my table, the request_time_local column is not populated, but all the other columns are.

After the conversion to a PySpark DataFrame, request_time_local is a bigint Unix timestamp (in nanoseconds):

spark.createDataFrame(t)
DataFrame[request_time_local: bigint, ...]

I confirmed this by converting the PySpark DataFrame back to pandas:

t_check = t_rdd.toPandas()
In: print t_check.loc[0,'Request_time_local']
Out: 1481288487000000000
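The bigint is the same instant expressed as nanoseconds since the Unix epoch, with the naive pandas timestamp interpreted as UTC. A quick standard-library check (the helper name epoch_ns_to_utc is hypothetical, used only for illustration):

```python
from datetime import datetime, timezone

def epoch_ns_to_utc(ns: int) -> datetime:
    """Convert nanoseconds since the Unix epoch to a timezone-aware UTC datetime."""
    seconds, remainder_ns = divmod(ns, 10**9)
    # datetime only has microsecond resolution, so sub-microsecond digits are dropped
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=remainder_ns // 1000
    )

print(epoch_ns_to_utc(1481288487000000000))  # 2016-12-09 13:01:27+00:00
```

The decoded value matches the original 2016-12-09 13:01:27, so no information was lost; only the type changed.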

My questions:

1) Is request_time_local not being populated because I'm writing a bigint from the PySpark DataFrame into a timestamp column in the Hive table?

2) Is there a way to keep the column as a timestamp type in the PySpark DataFrame, so that it is compatible with the Hive column type?

(I realize one workaround would be to change the Hive column to int and write the Unix timestamp.)
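One possible approach, sketched under assumptions: let createDataFrame produce the bigint column, then divide it by 10^9 and cast it to Spark's timestamp type before registering the temp table. This assumes PySpark with default (non-ANSI) casting, where casting a numeric value to timestamp treats it as seconds since the Unix epoch. The helper name ns_column_to_timestamp is hypothetical. Because the division yields a double, sub-microsecond precision may be lost, and the displayed value depends on the Spark session time zone.

```python
NS_PER_SECOND = 10**9

def ns_column_to_timestamp(sdf, col_name):
    """Replace a bigint nanoseconds-since-epoch column with a Spark timestamp column.

    Assumes PySpark is installed and default (non-ANSI) casting, where
    CAST(<numeric> AS TIMESTAMP) interprets the value as seconds since the epoch.
    """
    # Imported lazily so this module can be loaded without Spark on the path
    from pyspark.sql import functions as F

    # bigint ns -> double seconds -> timestamp, overwriting the original column
    return sdf.withColumn(col_name, (F.col(col_name) / NS_PER_SECOND).cast("timestamp"))
```

Hypothetical usage, passing the column name exactly as it appears in the DataFrame schema: t_rdd = ns_column_to_timestamp(spark.createDataFrame(t), "request_time_local"), followed by t_rdd.registerTempTable("temp_result") as before. Doing the conversion on the Spark side keeps the pandas code unchanged.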

Answer 1

You can try:
