Forecasting with a weighted moving average in PySpark

Time: 2018-06-11 09:41:31

Tags: python apache-spark pyspark

Can I forecast using a weighted moving average in PySpark?

Suppose I have a DataFrame like this:

+--------+---------------+------------+--------+-----------+---------------+-----------+
|     MKT|   MANUFACTURER|   FRANCHISE|   BRAND|   SUBBRAND|          Sales|date_format|
+--------+---------------+------------+--------+-----------+---------------+-----------+
|Market A|   Competitor A|       FR AA|    BR K|     SBR AR|            0.0| 2015-09-19|
|Market A|   Competitor A|       FR AA|    BR K|     SBR AS|            0.0| 2016-06-25|
|Market A|   Competitor A|       FR AA|    BR K|     SBR AT|            0.0| 2015-10-24|
|Market A|   Competitor A|       FR AA|    BR K|     SBR AT|            0.0| 2015-11-28|
|Market A|   Competitor A|        FR Y|    BR H|     SBR AD|            0.0| 2015-02-21|
|Market A|   Competitor A|        FR Y|    BR H|     SBR AE|            0.0| 2016-08-13|
|Market A|   Competitor A|        FR Y|    BR H|     SBR AG|            0.0| 2015-02-07|
|Market A|   Competitor A|        FR Y|    BR H|      SBR Y|            0.0| 2016-06-04|
|Market A|   Competitor A|        FR Y|    BR I|     SBR AI|            0.0| 2016-07-16|
|Market A|   Competitor A|        FR Y|    BR I|     SBR AP|            0.0| 2016-04-30|
|Market A|   Competitor B|       FR AB|    BR L|     SBR AU|        20677.0| 2016-03-05|
|Market A|   Competitor B|       FR AB|    BR N|     SBR AY|            0.0| 2015-11-07|
|Market A|   Competitor E|       FR AF|    BR R|     SBR BC|         5834.0| 2016-07-09|
|Market A|   Competitor E|       FR AF|    BR S|     SBR BD|         1664.0| 2015-06-20|
|Market A|     My Company|        FR W|    BR D|      SBR H|            0.0| 2016-02-27|
|Market A|     My Company|        FR W|    BR E|      SBR K|        10355.0| 2015-09-19|
|Market A|     My Company|        FR W|    BR E|      SBR O|            0.0| 2015-12-26|
|Market A|     My Company|        FR W|    BR E|      SBR S|            0.0| 2016-01-23|
|Market A|     My Company|        FR W|    BR E|      SBR T|            0.0| 2015-09-19|
|Market A|     My Company|        FR X|    BR G|      SBR V|            0.0| 2015-02-07|
+--------+---------------+------------+--------+-----------+---------------+-----------+

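If my last date is 2016-08-13 and I want to forecast the next four data points, how should I go about it? I have not been able to add more rows with the new dates.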
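As a starting point for the missing rows, the future dates themselves can be computed in plain Python. This is only a sketch: it assumes the series is weekly (7 days apart, as the 2015-01-10, 2015-01-17, ... dates in the output further down suggest), and `next_weekly_dates` is a hypothetical helper name, not part of PySpark.

```python
from datetime import date, timedelta
from typing import List


def next_weekly_dates(last_date: date, periods: int, step_days: int = 7) -> List[date]:
    """Return `periods` dates after `last_date`, spaced `step_days` apart.

    Assumption: the data is weekly, so the default step is 7 days.
    """
    return [last_date + timedelta(days=step_days * k) for k in range(1, periods + 1)]


future_dates = next_weekly_dates(date(2016, 8, 13), 4)
print([d.isoformat() for d in future_dates])
# ['2016-08-20', '2016-08-27', '2016-09-03', '2016-09-10']
```

One possible next step would be to build a small DataFrame from these dates for each MKT/MANUFACTURER/FRANCHISE/BRAND/SUBBRAND combination, with Sales left null, and append it to the original data with a union before computing any forecast.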

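I want to predict the next 4 points, so at the moment I take a rolling average of Sales into another column and then extrapolate from that new column, which is the wrong approach.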
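For comparison, a multi-step forecast with a weighted moving average is usually done recursively: each predicted point is appended to the series and then used as an input for the next prediction. The plain-Python sketch below shows that idea for a single series. The function name `forecast_wma`, the sample history, and the weights are all made up for illustration and are not taken from my data.

```python
from typing import List, Sequence


def forecast_wma(history: Sequence[float], weights: Sequence[float], steps: int) -> List[float]:
    """Recursive weighted-moving-average forecast (illustrative sketch).

    weights[0] applies to the most recent value, weights[1] to the one
    before it, and so on. Each forecast is fed back in as if it were an
    observation before the next step is computed.
    """
    if len(history) < len(weights):
        raise ValueError("need at least len(weights) historical values")

    series = list(history)
    predictions = []
    for _ in range(steps):
        # Most recent value first, so it lines up with weights[0].
        recent = series[-len(weights):][::-1]
        prediction = sum(w * v for w, v in zip(weights, recent))
        predictions.append(prediction)
        series.append(prediction)
    return predictions


# Hypothetical history and weights, chosen only to keep the arithmetic easy.
preds = forecast_wma([10.0, 20.0, 30.0], weights=[0.5, 0.3, 0.2], steps=2)
print([round(p, 6) for p in preds])
# [23.0, 24.5]
```

The second value uses the first forecast (23.0) together with the last two real observations, which is different from averaging a column that has already been averaged.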

My weighted moving average function (source):

from functools import reduce
from operator import add

from pyspark.sql import functions as F


def weighted_average(c, window, offsets, weights):
    # One weight per offset; negative offsets look back, positive look ahead.
    assert len(weights) == len(offsets)

    def value(i):
        if i < 0:
            return F.lag(c, -i).over(window)
        if i > 0:
            return F.lead(c, i).over(window)
        return c

    # Create a list of Columns
    # - `value_i * weight_i` if `value_i IS NOT NULL`
    # - literal 0 otherwise
    values = [F.coalesce(value(i) * w, F.lit(0)) for i, w in zip(offsets, weights)]

    # or sum(values, lit(0))
    return reduce(add, values, F.lit(0))

How I use it:

og_df = original_data.withColumn('week_1_pred', weighted_average(F.col('Sales'), w, offsets, delays))
og_df = og_df.withColumn('week_2_pred', weighted_average(F.col('week_1_pred'), w, offsets, delays))
og_df = og_df.withColumn('week_3_pred', weighted_average(F.col('week_2_pred'), w, offsets, delays))
og_df = og_df.withColumn('week_4_pred', weighted_average(F.col('week_3_pred'), w, offsets, delays))

The resulting DataFrame is:

+--------+---------------+------------+--------+-----------+---------------+-----------+------------------+------------------+------------------+------------------+
|     MKT|   MANUFACTURER|   FRANCHISE|   BRAND|   SUBBRAND|          Sales|date_format|       week_1_pred|       week_2_pred|       week_3_pred|       week_4_pred|
+--------+---------------+------------+--------+-----------+---------------+-----------+------------------+------------------+------------------+------------------+
|Market C|   Competitor E|       FR AF|    BR R|     SBR BC|        28431.0| 2015-01-10|               0.0|               0.0|               0.0|               0.0|
|Market C|   Competitor E|       FR AF|    BR R|     SBR BC|        28988.0| 2015-01-17|            8529.3|               0.0|               0.0|               0.0|
|Market C|   Competitor E|       FR AF|    BR R|     SBR BC|        34777.0| 2015-01-24|17225.699999999997|2558.7899999999995|               0.0|               0.0|
|Market C|   Competitor E|       FR AF|    BR R|     SBR BC|        36580.0| 2015-01-31|           24815.7| 7726.499999999998| 767.6369999999998|               0.0|
|Market C|   Competitor E|       FR AF|    BR R|     SBR BC|        42142.0| 2015-02-07|30047.799999999996|14318.279999999999| 3085.586999999999|230.29109999999994|
|Market C|   Competitor E|       FR AF|    BR R|     SBR BC|        44354.0| 2015-02-14|34892.350000000006|          20757.12| 7125.191999999999|1155.9671999999996|
|Market C|   Competitor E|       FR AF|    BR R|     SBR BC|        46517.0| 2015-02-21|39613.450000000004|26594.219999999998|12323.798999999999|3216.7610999999993|
|Market C|   Competitor E|       FR AF|    BR R|     SBR BC|        45123.0| 2015-02-28|          42535.95|32130.620000000003|        17969.6475|         6528.5784|
|Market C|   Competitor E|       FR AF|    BR R|     SBR BC|        45031.0| 2015-03-07|          44144.85|          36730.14|        23714.9685|10860.012899999998|
|Market C|   Competitor E|       FR AF|    BR R|     SBR BC|        42457.0| 2015-03-14|           44721.1|40159.340000000004|         29155.023|         15875.325|
|Market C|   Competitor E|       FR AF|    BR R|     SBR BC|        43195.0| 2015-03-21| 44247.49999999999|        42375.3275|33906.159999999996|21197.845800000003|
|Market C|   Competitor E|       FR AF|    BR R|     SBR BC|        41085.0| 2015-03-28|          43757.65|         43498.435|       37687.05725|26430.762899999998|
|Market C|   Competitor E|       FR AF|    BR R|     SBR BC|        35191.0| 2015-04-04|           42860.5|          43867.72| 40403.25275000001|31195.138949999997|
|Market C|   Competitor E|       FR AF|    BR R|     SBR BC|        37804.0| 2015-04-11|40275.200000000004|         43641.095|         42143.884|        35208.0581|
|Market C|   Competitor E|       FR AF|    BR R|     SBR BC|        34953.0| 2015-04-18|           38809.4|        42560.2875| 43034.33824999999| 38335.66804999999|
|Market C|   Competitor E|       FR AF|    BR R|     SBR BC|        32382.0| 2015-04-25|           37256.4|         41121.675|      43110.535625| 40555.88209999999|
|Market C|   Competitor E|       FR AF|    BR R|     SBR BC|        34295.0| 2015-05-02|           35494.4|        39561.0875|      42513.267875| 41892.22509999999|
|Market C|   Competitor E|       FR AF|    BR R|     SBR BC|        31506.0| 2015-05-09|34587.899999999994|        37945.5475|        41449.3035|42412.912599999996|
|Market C|   Competitor E|       FR AF|    BR R|     SBR BC|        29105.0| 2015-05-16|          33361.75|         36513.695|         40107.795|        42241.6692|
|Market C|   Competitor E|       FR AF|    BR R|     SBR BC|        29092.0| 2015-05-23|31918.350000000002|         35163.645| 38672.22687500001|        41539.7478|
+--------+---------------+------------+--------+-----------+---------------+-----------+------------------+------------------+------------------+------------------+
only showing top 20 rows
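The ramp-up from 0.0 in the first rows of week_1_pred is what one would expect when missing lags are coalesced to 0, as the function does. The plain-Python sketch below mimics that behavior for the first four Sales values in the table. The offsets and weights are not shown in the question, so the values used here are an assumption: they were picked because they happen to reproduce the first rows of week_1_pred. `lagged_weighted_sum` is a hypothetical name.

```python
from typing import List, Sequence


def lagged_weighted_sum(values: Sequence[float],
                        offsets: Sequence[int],
                        weights: Sequence[float]) -> List[float]:
    """Per-row weighted sum over lagged/leading values.

    Positions that fall outside the series contribute 0, mirroring the
    coalesce(..., 0) step in the Spark function.
    """
    assert len(offsets) == len(weights)
    result = []
    for t in range(len(values)):
        total = 0.0
        for offset, weight in zip(offsets, weights):
            j = t + offset
            if 0 <= j < len(values):
                total += values[j] * weight
        result.append(total)
    return result


# First four Sales values from the table above.
sales = [28431.0, 28988.0, 34777.0, 36580.0]
# Assumed (inferred) configuration: six lags with weights summing to 1.
offsets = [-1, -2, -3, -4, -5, -6]
weights = [0.3, 0.3, 0.2, 0.1, 0.05, 0.05]

print([round(v, 1) for v in lagged_weighted_sum(sales, offsets, weights)])
# [0.0, 8529.3, 17225.7, 24815.7]
```

Because absent lags count as 0 while the weights still sum to 1, the early rows are biased toward zero, and every chained week_N_pred column repeats that bias on top of the previous one.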

Please help me.

0 answers:

No answers yet.