I have the following problem and would like to know whether trying reinforcement learning could help:
I have a set of objects, each of which must be assigned to one of two classes. The assignment should be chosen to maximize a known reward function that depends on certain input features.
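To make the setup concrete, here is a minimal sketch of the kind of problem I mean. The `reward` function, the two features per object, and the balance penalty are all made up for illustration and are not my real reward; the exhaustive search is only there to show the objective and is feasible only for a handful of objects.

```python
from itertools import product


def reward(features, labels):
    """Hypothetical reward (an assumption, not the real one).

    An object placed in class 1 earns its first feature, an object placed
    in class 0 earns its second feature, and unbalanced class sizes are
    penalized so that the objects' labels are not independent.
    """
    total = 0.0
    for (a, b), y in zip(features, labels):
        total += a if y == 1 else b
    ones = sum(labels)
    zeros = len(labels) - ones
    return total - 0.5 * abs(ones - zeros)


def best_assignment(features):
    """Try every 0/1 labeling and keep the one with the highest reward."""
    best_labels, best_value = None, float("-inf")
    for labels in product((0, 1), repeat=len(features)):
        value = reward(features, labels)
        if value > best_value:
            best_labels, best_value = labels, value
    return best_labels, best_value


# Four toy objects, each described by two illustrative features.
features = [(1.0, 0.2), (0.1, 0.9), (0.8, 0.7), (0.3, 0.4)]
labels, value = best_assignment(features)
print(labels, value)
```

With many objects the number of labelings grows as 2^n, so this brute-force search stops being practical, which is part of why I am asking about other approaches.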
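I tried modeling this as a multi-label problem, but that did not help, because what I care about is maximizing the reward function. My knowledge of reinforcement learning is limited, which is why I would like to know whether the problem can be modeled this way before I start studying the approach in depth.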