Question

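In PySpark I have a timestamp column called test_time. The column has records like the following:

2017-03-12 03:19:51.0
2017-03-12 03:29:51.0

Now I want to add 1 hour to the records in the test_time column.

Expected result:

2017-03-12 04:19:51.0
2017-03-12 04:29:51.0

How can I achieve this result?

I tried the following:

df['test_time'] = df['test_time'].apply(lambda x: x - pd.DateOffset(hours=1))

and got an error.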

Answer 1

This should be fairly easy once you convert the values to UTC timestamps. Here is one way to do it:

from datetime import timedelta
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_utc_timestamp, from_utc_timestamp

spark = SparkSession.builder.getOrCreate()

## Create a dummy dataframe
df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])

## Add a column that interprets t as PST and converts it to a UTC timestamp
df2 = df.withColumn('utc_timestamp', to_utc_timestamp(df.t, "PST"))

## Add one hour with timedelta (DataFrames have no .map in Spark 2+, so go through .rdd)
df3 = df2.rdd.map(lambda x: (x.t, x.utc_timestamp + timedelta(hours=1))).toDF(['t', 'new_utc_timestamp'])

## Convert back to the original time zone
df4 = df3.withColumn('new_t', from_utc_timestamp(df3.new_utc_timestamp, "PST"))

The "new_t" column in df4 is the desired column, converted back to the appropriate time zone for your system.
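To make the idea behind the three Spark steps above concrete, here is a plain-Python sketch of the same round trip (interpret as local time, convert to UTC, add a duration, convert back) applied to the dummy value. It is only an illustration: the helper name add_hours_via_utc is hypothetical, and a fixed UTC-8 offset stands in for "PST", whereas Spark resolves "PST" to a daylight-saving-aware zone. The fixed offset is exact for the February date used here.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-8 offset stands in for "PST". Spark resolves "PST"
# to a DST-aware zone; a fixed offset is only exact outside daylight saving time.
PST = timezone(timedelta(hours=-8), "PST")


def add_hours_via_utc(local_str, hours=1):
    """Hypothetical helper mirroring the three Spark steps in plain Python."""
    # Interpret the naive string as PST wall-clock time
    local = datetime.strptime(local_str, "%Y-%m-%d %H:%M:%S").replace(tzinfo=PST)
    # Step 1: convert to UTC (the role of to_utc_timestamp)
    utc = local.astimezone(timezone.utc)
    # Step 2: add the offset as an absolute duration (the timedelta step)
    new_utc = utc + timedelta(hours=hours)
    # Step 3: convert back to PST and drop tzinfo (the role of from_utc_timestamp)
    return new_utc.astimezone(PST).replace(tzinfo=None)


print(add_hours_via_utc("1997-02-28 10:30:00"))  # 1997-02-28 11:30:00
```

Doing the addition in UTC means the hour is added as elapsed time, so a zone with daylight saving rules would still yield a valid local time after converting back.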

Answer 2

The right way to do this in PySpark is:

from pyspark.sql.functions import expr
df = df.withColumn("test_time_plus_hour", df['test_time'] + expr('INTERVAL 1 HOURS'))

Add 1 hour to timestamp columns in Pyspark data frame