我想添加一列来计算两个两个时间戳值之间的时间差。为此,我首先添加一列当前日期时间,在此处将其定义为current_datetime
:
import datetime
#define current datetime
now = datetime.datetime.now()
#Getting Current date and time
current_datetime=now.strftime("%Y-%m-%d %H:%M:%S")
print(now)
然后我想将current_datetime
作为列值添加到df并计算差异
import pyspark.sql.functions as F
productsDF = productsDF\
.withColumn('current_time', when(col('Quantity')>1, current_datetime))\
.withColumn('time_diff',\
(F.unix_timestamp(F.to_timestamp(F.col('current_time')))) -
(F.unix_timestamp(F.to_timestamp(F.col('Created_datetime'))))/F.lit(3600)
)
但是输出仅为空值。
productsDF.select('current_time','Created_datetime','time_diff').show()
+------------+-------------------+---------+
|current_time| Created_datetime|time_diff|
+------------+-------------------+---------+
| null|2019-10-12 17:09:18| null|
| null|2019-12-03 07:02:07| null|
| null|2020-01-16 23:10:08| null|
| null|2020-01-21 15:38:39| null|
| null|2020-01-21 15:14:55| null|
使用string和double类型创建新列:
|-- current_time: string (nullable = true)
|-- diff: double (nullable = true)
|-- time_diff: double (nullable = true)
我尝试创建具有字符串和文字值的列以进行测试,但是输出始终为null
。我想念什么?
答案 0 :(得分:1)
要用<p id="para">Lorem Ipsum Lorem Ipsum Lorem Ipsum Lorem IpsumLorem Ipsum Lorem IpsumLorem Ipsum Lorem IpsumLorem Ipsum Lorem IpsumLorem Ipsum Lorem IpsumLorem Ipsum Lorem IpsumLorem Ipsum Lorem Ipsum Lorem Ipsum Lorem Ipsum
</p>
填充一列,您缺少了current_datetime
函数:
lit()
要计算两个current_datetime = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
productsDF = productsDF.withColumn("current_time", lit(current_datetime))
列之间的时差,可以执行以下操作:
timestamp
编辑:
对于小时,天,月和年的时差,您可以执行以下操作:
productsDF.withColumn('time_diff',(F.unix_timestamp('current_time') -
F.unix_timestamp('Created_datetime'))/3600).show()
如果您希望精确的时差,那么:
df.withColumn('time_diff_hours',(F.unix_timestamp('current_time') - F.unix_timestamp('Created_datetime'))/3600)\
.withColumn("time_diff_days", datediff(col("current_time"),col("Created_datetime")))\
.withColumn("time_diff_months", months_between(col("current_time"),col("Created_datetime")))\
.withColumn("time_diff_years", year(col("current_time")) - year(col("Created_datetime"))).show()
+-------------------+-------------------+------------------+--------------+----------------+---------------+
| Created_datetime| current_time| time_diff_hours|time_diff_days|time_diff_months|time_diff_years|
+-------------------+-------------------+------------------+--------------+----------------+---------------+
|2019-10-12 17:09:18|2020-10-15 02:45:49| 8841.60861111111| 369| 12.07743093| 1|
|2019-12-03 07:02:07|2020-10-15 02:45:49|7602.7283333333335| 317| 10.38135529| 1|
|2020-01-16 23:10:08|2020-10-15 02:45:49| 6530.594722222222| 273| 8.94031549| 0|
+-------------------+-------------------+------------------+--------------+----------------+---------------+