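I want to lag time series data by business months. For example, if the calendar day is 12/7/2018, a 1-month lag should give 10/26/2018, and likewise for lags of 2 months, 3 months, and so on.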
Expected result
import pandas as pd
df = pd.date_range('2018-12-07','2018-12-10',freq = 'D')
df.shift(-1 , freq = 'BM')
Answer 0 (score: 1)
The pandas time series module has a useful business-day offset called BDay that can help you here.
import pandas as pd
from pandas.tseries.offsets import BDay

# pd.datetime was removed in pandas 1.0+, so build a Timestamp directly
example_day = pd.Timestamp(2018, 12, 7)  # 12/7/2018
shifted_day = example_day - BDay(31)  # Shifted back by 31 business days
shifted_day  # This is a pandas Timestamp object
# Timestamp('2018-10-25 00:00:00')

# You can also apply this shift to a pandas date range
df = pd.date_range('2018-12-07', '2018-12-10', freq='D')
result = df - BDay(31)  # Keep in mind that 12/8 -> 12/10 will share the same shifted day due to the weekend between them
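
To extend the same idea to lags of 2, 3, or more months, one option is to treat a "business month" as a fixed number of business days. The sketch below is only an illustration: the helper name `lag_business_months` and the default of 30 business days per month are assumptions, not part of pandas or of the original answer. The value 30 is chosen because it reproduces the 10/26/2018 result the question expects, whereas the answer above uses 31.

```python
import pandas as pd
from pandas.tseries.offsets import BDay


def lag_business_months(dates, months, bdays_per_month=30):
    """Shift `dates` back by `months` blocks of business days.

    `bdays_per_month` is an assumed convention (hypothetical parameter),
    not something pandas defines.
    """
    return dates - BDay(months * bdays_per_month)


start = pd.Timestamp("2018-12-07")
one_month = lag_business_months(start, 1)
# Timestamp('2018-10-26 00:00:00')

# The same helper works on a whole date range
dates = pd.date_range("2018-12-07", "2018-12-10", freq="D")
lags = {m: lag_business_months(dates, m) for m in (1, 2, 3)}
```

Weekends are skipped, but public holidays are not. If holidays matter, `pandas.tseries.offsets.CustomBusinessDay` accepts a `holidays` argument and could be swapped in for `BDay`.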