I am trying to use this code to convert a column from string to timestamp:
from pyspark.sql import SparkSession
from pyspark.sql.functions import unix_timestamp

if __name__ == "__main__":
    spark = SparkSession \
        .builder \
        .config("spark.some.config.option", "some-value") \
        .getOrCreate()

    # read the raw CSV and name the columns
    df = spark.read.csv(r"in\SETP11.csv")
    df = df.selectExpr('_c0 as USER_ID', '_c1 as timestamp', '_c2 as location')
    df.show()

    # convert the string column to a unix timestamp
    df = df.withColumn('timestamp', unix_timestamp('timestamp', 'yyyy-mm-dd hh:mm:ss'))
    df.show()
But I get null values (the function only returns a value for some rows):
+-------+---------+--------+
|USER_ID|timestamp|location|
+-------+---------+--------+
| 1| null| 321|
| 1|1359262.2| 386|
| 1|1359271.2| 386|
| 1|1359273.6| 386|
| 1|1359273.6| 386|
| 1|1359274.8| 386|
| 1|1359281.4| 386|
| 1|1359282.0| 386|
| 1| null| 386|
| 1| null| 386|
| 1| null| 386|
| 1| null| 386|
+-------+---------+--------+
How can I fix this? Where is the problem?
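For reference, here is a minimal sketch of the change I suspect is needed, assuming my input strings are 24-hour times. In Java date patterns, 'mm' means minutes and 'hh' means the 1-12 clock hour, while 'MM' is the month and 'HH' is the 0-23 hour of day, which might also explain why only some rows parse:

from pyspark.sql.functions import unix_timestamp

# df as built above; 'yyyy-MM-dd HH:mm:ss' is my guess at the correct pattern
df = df.withColumn('timestamp', unix_timestamp('timestamp', 'yyyy-MM-dd HH:mm:ss'))
df.show()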
Keywords: python apache-spark pyspark unix-timestamp