Question

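Goal: predict whether user D will make a purchase.

Question: how do I correctly structure this data for training and for making user-level predictions?

Here is some sample data: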

+--------+----------------------+----------+-----------+
| UserID |         Text         | Features | Purchased |
+--------+----------------------+----------+-----------+
| A      | Yes I agree…         | …        | 1         |
| A      | No one ever wants….  | …        | 1         |
| A      | Have you consider…   | …        | 1         |
| B      | Patriots aren't the… | …        | 0         |
| C      | How many times…      | …        | 1         |
| C      | Last year, I reme…   | …        | 1         |
| D      | Some Text            | …        | -         |
| D      | Some Text            | …        | -         |
| D      | Some Text            | …        | -         |
| D      | Some Text            | …        | -         |
| …      | …                    | …        | …         |
+--------+----------------------+----------+-----------+

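I have considered averaging feature vectors per user: I would take user A's posts, extract text features from each one (word counts, tokenization, etc.), and average them across all of that user's posts. But it seems like I would lose a lot of information that way.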
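To make the idea concrete, here is a minimal sketch of one way to collapse post-level rows into one row per user before fitting a model. Everything in it is an assumption for illustration: the toy rows (including the extra user E), the helper name `build_user_table`, the choice to concatenate each user's posts into a single TF-IDF document, and the two aggregate columns (post count and mean words per post). It is not meant as the definitive way to do this, only as one layout that scikit-learn can consume.

```python
from collections import defaultdict

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Made-up toy rows: (user_id, text, purchased). None marks an unknown label.
# User E is hypothetical, added so both classes have more than one user.
rows = [
    ("A", "great price on this item", 1),
    ("A", "would order again", 1),
    ("A", "shipping was fast", 1),
    ("B", "the game last night was dull", 0),
    ("C", "added two more to my cart", 1),
    ("C", "is there a discount code", 1),
    ("E", "who won the match yesterday", 0),
    ("D", "looking for a discount", None),
    ("D", "the price seems fair", None),
]


def build_user_table(rows):
    """Collapse post-level rows into one record per user (hypothetical helper)."""
    texts, labels = defaultdict(list), {}
    for user, text, label in rows:
        texts[user].append(text)
        labels[user] = label  # assumes the label is constant per user
    users = sorted(texts)
    # One concatenated document per user, so no post is averaged away.
    docs = [" ".join(texts[u]) for u in users]
    # Per-user aggregates: number of posts, mean words per post.
    stats = np.array(
        [[len(texts[u]), np.mean([len(t.split()) for t in texts[u]])] for u in users]
    )
    y = [labels[u] for u in users]
    return users, docs, stats, y


users, docs, stats, y = build_user_table(rows)
train = [i for i, label in enumerate(y) if label is not None]
test = [i for i, label in enumerate(y) if label is None]

# Fit the vocabulary on labelled users only, then reuse it for user D.
vectorizer = TfidfVectorizer()
X_text_train = vectorizer.fit_transform([docs[i] for i in train]).toarray()
X_text_test = vectorizer.transform([docs[i] for i in test]).toarray()

X_train = np.hstack([X_text_train, stats[train]])
X_test = np.hstack([X_text_test, stats[test]])
y_train = [y[i] for i in train]

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]  # estimated P(purchase) per unlabelled user
```

If the model is trained on individual posts instead, one option is to split by user (for example with scikit-learn's `GroupKFold`, passing the user IDs as groups) so that posts from the same user never land in both the training and validation folds.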

Structuring data for user-based prediction in scikit-learn

0 answers: