How to get rolling-month data

Time: 2019-05-14 14:07:48

Tags: pyspark

PySpark code to write an n-3 rolling window of data into a Hive table

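I wrote a PySpark script that fetches the last 3 months of data and inserts it into another Hive table. The code works fine. The table is partitioned on the year and month columns, and I use those two columns because I don't have any date column. In the filter I therefore take year = the current year, and month between the current month minus 3 and the current month. But if I run this in January or February, the window reaches back into the previous year, so how should I define the year?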

from pyspark.sql import SparkSession

# Hive support is needed to read from and write to Hive tables.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Load the source query from a file.
with open('/path/.sql') as f:
    query = f.read()

df = spark.sql(query)
df.createOrReplaceTempView("temptable")

# Allow dynamic partitioning on the target table.
spark.sql("set hive.exec.dynamic.partition=true")
spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")

# Keep the current month and the two before it. This filter only works
# within a single year, which is the problem described in the question.
spark.sql("insert overwrite table tablename partition (year, month) select * from temptable where year == year(from_unixtime(unix_timestamp())) and month between month(from_unixtime(unix_timestamp()))-2 and month(from_unixtime(unix_timestamp()))")
print("The table is updated")
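
One possible way to handle the year boundary, offered only as a sketch and not as a confirmed answer: count months on a single axis (year * 12 + month) so that January rolls back into December of the previous year, then build the filter from explicit (year, month) pairs. The helper names `rolling_months` and `partition_predicate` are hypothetical, and the sketch assumes the `year` and `month` partition columns hold values comparable to integers.

```python
from datetime import date


def rolling_months(today, n=3):
    """Return the last n (year, month) pairs ending at today's month, oldest first."""
    # Put every month on one axis so subtracting never produces month 0 or -1.
    index = today.year * 12 + (today.month - 1)
    pairs = []
    for offset in range(n - 1, -1, -1):
        year, month0 = divmod(index - offset, 12)
        pairs.append((year, month0 + 1))
    return pairs


def partition_predicate(pairs):
    """Build a WHERE-clause fragment that matches the given (year, month) pairs."""
    # Assumption: year and month are numeric (or integer-comparable) columns.
    return " or ".join(
        "(year = {} and month = {})".format(y, m) for y, m in pairs
    )


# Hypothetical usage with the temp view name from the question.
predicate = partition_predicate(rolling_months(date.today()))
filter_query = "select * from temptable where " + predicate
```

The resulting `filter_query` string could replace the `select ... where ...` part of the insert statement. Listing the partitions explicitly, instead of filtering on a computed expression such as `year * 12 + month`, is intended to keep partition pruning straightforward.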

0 answers:

No answers