Question

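I want to lag time series data by business months. For example, if the calendar day is 2018-12-07 (12/7/2018), a 1-month lag should give 2018-10-26 (10/26/2018), and likewise for lags of 2, 3, ... months.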

Expected result

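import pandas as pd
# Daily dates from 2018-12-07 to 2018-12-10 (a DatetimeIndex, despite the name df)
df = pd.date_range('2018-12-07', '2018-12-10', freq='D')
# Shift back by one business month end ('BM' is spelled 'BME' in pandas 2.2+)
df.shift(-1, freq='BM')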

Answer 1

The pandas time series module has a useful business-day offset called BDay that can help here.

import pandas as pd
from pandas.tseries.offsets import BDay

# 12/7/2018; pd.Timestamp is used because pd.datetime is no longer available in recent pandas
example_day = pd.Timestamp(2018, 12, 7)
shifted_day = example_day - BDay(31)  # Shifted back by 31 business days

shifted_day  # This is a pandas Timestamp object
# Timestamp('2018-10-25 00:00:00')

# You can also apply this shift to a pandas date range
df = pd.date_range('2018-12-07', '2018-12-10', freq='D')
# Keep in mind that 12/8 through 12/10 all map to the same shifted day because of the weekend between them
result = df - BDay(31)
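
The question also asks about lags of 2, 3, ... months. A minimal sketch of how the same idea might be extended, assuming a "business month" is treated as a fixed number of business days (the 31 comes from the answer above and is not a pandas convention). The helper name `lag_business_days` and its parameters are hypothetical, introduced only for illustration:

```python
import pandas as pd
from pandas.tseries.offsets import BDay


def lag_business_days(dates, months, bdays_per_month=31):
    """Shift dates back by months * bdays_per_month business days.

    Hypothetical helper: treating one business month as a fixed count of
    business days is an assumption carried over from the answer above.
    """
    return pd.DatetimeIndex(dates) - BDay(months * bdays_per_month)


dates = pd.date_range('2018-12-07', '2018-12-10', freq='D')
for months in (1, 2, 3):
    lagged = lag_business_days(dates, months)
    print(months, list(lagged.strftime('%Y-%m-%d')))
```

With the default of 31 business days, 2018-12-07 maps to 2018-10-25. Passing `bdays_per_month=30` maps it to 2018-10-26 instead, which is the value given in the question.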

How to lag calendar days by business months in pandas
