Question

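I want to predict sales for a given customer based on 1) past history, 2) some sub-category information, and 3) arbitrary month information. The month may matter for some customers but not for others. The data looks roughly like this: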

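So although I could use some kind of encoding, I'm not sure which features matter and which don't. It "feels" like some kind of unsupervised clustering around the values, but I'm not sure what the best approach is.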

Answer 1

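This is definitely supervised learning (a regression problem, since your target variable is continuous). Moreover, because you have each customer's prior history, you are really facing a time series forecasting problem.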

There are many (many...) different ways to tackle this, but a simple (and very effective) one is to use autoregression:

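Choose a window_size (the number of months to look back, e.g. 5).
For each (account_id, sub_account) pair and each run of window_size consecutive months, generate sales_5m_ago, sales_4m_ago, ..., sales_1m_ago ==> sales this month as one training instance.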
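The two steps above could be sketched with pandas roughly as follows. This is only an illustration: the helper name make_lag_frame, the month and sales column names, and the toy numbers are assumptions, and the code assumes every (account_id, sub_account) pair has one row per month with no gaps.

```python
import pandas as pd

def make_lag_frame(df, window_size=5):
    """Turn a long table of monthly sales into autoregressive training rows.

    Assumes one row per (account_id, sub_account, month) and no missing months.
    """
    df = df.sort_values(["account_id", "sub_account", "month"])
    grouped = df.groupby(["account_id", "sub_account"])["sales"]
    out = df[["account_id", "sub_account", "month", "sales"]].copy()
    # Oldest lag first, so the columns read sales_5m_ago ... sales_1m_ago.
    for lag in range(window_size, 0, -1):
        out[f"sales_{lag}m_ago"] = grouped.shift(lag)
    # Rows without a full window of history contain NaN and are dropped.
    return out.dropna().reset_index(drop=True)

# Made-up data: one account with two sub-accounts, seven months each.
toy = pd.DataFrame({
    "account_id": [1] * 14,
    "sub_account": ["a"] * 7 + ["b"] * 7,
    "month": list(range(1, 8)) * 2,
    "sales": [10, 11, 12, 13, 14, 15, 16, 20, 22, 24, 26, 28, 30, 32],
})
lagged = make_lag_frame(toy, window_size=5)
```

Shifting within each group keeps one customer's history from leaking into another's window.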

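This way you end up with a dataset containing many slices of sales history that can be used to predict what happens in the following month. You can then fit any regression model on it (for example, RandomForestRegressor) and predict sales for a test customer: you just supply the previous window_size sales values, and the model gives you a prediction for the next month.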
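As a rough sketch of that fit-and-predict step, here is a minimal example with scikit-learn. The sales series is synthetic (a made-up seasonal pattern plus noise), and the hyperparameters and the 12-month hold-out are arbitrary assumptions, not recommendations.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

window_size = 5
rng = np.random.default_rng(0)

# Synthetic monthly sales for one customer: yearly seasonality plus noise.
months = np.arange(60)
sales = 100 + 10 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 1.0, size=60)

# Each row holds window_size consecutive months; the target is the month after.
X = np.array([sales[i:i + window_size] for i in range(len(sales) - window_size)])
y = sales[window_size:]

# Keep the split chronological so the model never trains on future months.
X_train, y_train = X[:-12], y[:-12]
X_test, y_test = X[-12:], y[-12:]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
test_predictions = model.predict(X_test)

# Forecast the next month from the most recent window_size observations.
latest_window = sales[-window_size:].reshape(1, -1)
next_month = model.predict(latest_window)[0]
```

One thing to keep in mind with this model choice: a random forest averages training targets, so its forecasts stay inside the range of sales it saw during training. If sales trend strongly up or down, predicting month-over-month differences instead of raw values may work better.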

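Finally, if you want to use other features from your original data, you can combine them with the monthly sales values when building the training set: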

sales_5m_ago, sales_4m_ago, ... ==> sales_5m_ago, temperature_5m_ago, rain_days_5m_ago, sales_4m_ago, temperature_4m_ago, rain_days_4m_ago, ...
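A minimal sketch of that widened layout, for a single customer whose rows are already in chronological order. The helper name add_lagged_features and the numbers are hypothetical; temperature and rain_days are simply the example features named above.

```python
import pandas as pd

def add_lagged_features(df, columns, window_size):
    """Build <column>_<k>m_ago for every column and lag k, oldest lag first."""
    lagged = {}
    for lag in range(window_size, 0, -1):
        for col in columns:
            lagged[f"{col}_{lag}m_ago"] = df[col].shift(lag)
    features = pd.DataFrame(lagged, index=df.index)
    # The target is the current month's sales.
    features["sales"] = df["sales"]
    # Drop the first window_size rows, which lack a full history.
    return features.dropna()

# Made-up monthly data for one customer.
monthly = pd.DataFrame({
    "sales": [10, 12, 11, 15, 14, 18],
    "temperature": [3.0, 5.0, 9.0, 14.0, 18.0, 21.0],
    "rain_days": [12, 10, 9, 7, 6, 4],
})
features = add_lagged_features(
    monthly, ["sales", "temperature", "rain_days"], window_size=2
)
```

Every column except the final sales column is an input to the model. For several customers, the same shifts would have to be done per (account_id, sub_account) group.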
