Question

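I have a dataset of user details, and I want to generate a score for each user.

The desired output is a range such as low, medium, and high. I am using logistic regression.

Is this the right approach for this kind of problem?

Any suggestions?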

Answer 1

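To answer your question: it is fine, and in most cases recommended, to start with a single model.

The more important question, and I think the one you should be asking here, is what kind of user data you have and how it behaves with the model you choose: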

  - your data has a large number of features: you probably want to run PCA, XGBoost feature importances, or another feature evaluation method to separate useful features from noisy ones
  - you have a large amount of text data, e.g. logs: you might want to add naive Bayes, TF-IDF, or another technique that performs well on text-based data
  - does model X tend to overfit your data? Maybe you want to do some data engineering or try a different model
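As a rough illustration of the feature screening in the first bullet, here is a minimal sketch that ranks features by their absolute Pearson correlation with a binary label. This is a simpler stand-in, not PCA or XGBoost; the data is synthetic, and the feature names (`activity`, `noise_a`, `noise_b`) and the helper `rank_features` are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)

def rank_features(rows, labels, names):
    """Return (name, |correlation with label|) pairs, strongest first."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((name, abs(pearson(column, labels))))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Synthetic, hypothetical data: the label depends on "activity" only.
rng = random.Random(0)
names = ["activity", "noise_a", "noise_b"]
rows, labels = [], []
for _ in range(500):
    activity = rng.gauss(0, 1)
    rows.append([activity, rng.gauss(0, 1), rng.gauss(0, 1)])
    labels.append(1 if activity + rng.gauss(0, 0.5) > 0 else 0)

ranking = rank_features(rows, labels, names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A univariate ranking like this misses interactions between features, so treat it as a first pass before trying the heavier methods named above.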

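My advice is to build the LR model first, see how it performs on your train/test/prediction datasets, and evaluate whether that performance meets your needs before you consider or discuss different models or approaches.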
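The build-then-evaluate loop might look roughly like the sketch below. It assumes a binary outcome label is available and that the low/medium/high score comes from binning the predicted probability; both are assumptions, since the question does not say how the bands are defined. The data is synthetic, the cut-offs of 1/3 and 2/3 are arbitrary, and the model is a plain gradient-descent implementation using only the standard library.

```python
import math
import random

def sigmoid(z):
    # Clamp z so math.exp cannot overflow on extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, row):
    """Predicted probability of the positive class for one user."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))

def fit_logistic(rows, labels, lr=0.5, epochs=200):
    """Fit weights by batch gradient descent on the log loss."""
    n, d = len(rows), len(rows[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, y in zip(rows, labels):
            err = predict_proba(weights, bias, row) - y
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

def accuracy(weights, bias, rows, labels):
    """Share of users whose thresholded prediction matches the label."""
    hits = sum((predict_proba(weights, bias, r) >= 0.5) == (y == 1)
               for r, y in zip(rows, labels))
    return hits / len(rows)

def to_band(p, low=1 / 3, high=2 / 3):
    """Map a probability to a band; the cut-offs are arbitrary assumptions."""
    if p < low:
        return "low"
    if p < high:
        return "medium"
    return "high"

# Synthetic, hypothetical users: two numeric features and a binary outcome.
rng = random.Random(42)
rows, labels = [], []
for _ in range(600):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append([x1, x2])
    labels.append(1 if 2 * x1 - x2 + rng.gauss(0, 0.5) > 0 else 0)

# Hold out the last 150 users so train and test performance can be compared.
train_rows, test_rows = rows[:450], rows[450:]
train_labels, test_labels = labels[:450], labels[450:]

weights, bias = fit_logistic(train_rows, train_labels)
train_acc = accuracy(weights, bias, train_rows, train_labels)
test_acc = accuracy(weights, bias, test_rows, test_labels)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")

# Turn each held-out user's probability into a low/medium/high score.
bands = [to_band(predict_proba(weights, bias, r)) for r in test_rows]
print({band: bands.count(band) for band in ("low", "medium", "high")})
```

A large gap between train and test accuracy would be a sign of overfitting. On real data you would more likely use a library implementation such as scikit-learn's `LogisticRegression` than hand-written gradient descent.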

Is logistic regression a good way to build a scoring model?
