How do I keep column data types after createOrReplaceTempView in PySpark?

Posted: 2017-05-23 02:37:56

Tags: pyspark spark-dataframe pyspark-sql

I have a DataFrame with the following data types:

orders.printSchema()
root
 |-- order_id: long (nullable = true)
 |-- user_id: long (nullable = true)
 |-- eval_set: string (nullable = true)
 |-- order_number: short (nullable = true)
 |-- order_dow: short (nullable = true)
 |-- order_hour_of_day: short (nullable = true)
 |-- days_since_prior_order: short (nullable = true)

But when I register it as a table, every column's data type becomes string:

orders.createOrReplaceTempView("orders")
spark.sql("describe orders").show()
+--------------------+---------+-------+
|            col_name|data_type|comment|
+--------------------+---------+-------+
|            order_id|   string|       |
|             user_id|   string|       |
|            eval_set|   string|       |
|        order_number|   string|       |
|           order_dow|   string|       |
|   order_hour_of_day|   string|       |
|days_since_prior_...|   string|       |
+--------------------+---------+-------+

How can I carry the DataFrame's original types over to the table in PySpark?

1 answer:

Answer 0 (score: 0)

createOrReplaceTempView does not change the schema. I tested it in Spark Scala and the schema was preserved. This may be a PySpark issue.