Question

最近我一直试图用REDDIT从特定用户的帖子数据集上预测使用Recurrent Neural Network的用户状态，但我无法理解我需要输入什么

例如，有一个用户想购买手机，我想预测用户的状态是否是状态：1（考虑购买新手机）状态：2（买了一部新手机））状态：3（购买后后悔）基于他以前的reddit帖子。据我了解，我认为它涉及时间序列分析。我不仅能够理解在训练RNN时需要输入什么样的数据结构以及在这种情况下如何实际实现RNN。如果有更好的方法或其他一些时间序列分类器，请加以说明。

到目前为止，我能够使用keras跟进一个教程（http://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/），但它教我如何通过对评论进行分类（正面/负面）来实现。此外，由于每个帖子彼此独立，但在我的情况下，每次购买都取决于其他购买。

我有JSON格式的数据，我手动定义了状态（键），我将其用作类标签，如下所示：

国家定义为： 状态：1（考虑购买新手机）州：2（买了一部新手机）州：3（购买后后悔）

{
    "_id" : ObjectId("596ffc2423900a52fb2fb077"),
    "redditor" : "boughtSamsungGalaxy8",
    "posts" : [ 
        {
            "state" : 1,
            "created_at" : 146532030908.0,
            "title" : "Did I do the right thing",
            "num_comments" : 7,
            "downs" : 0,
            "manual_annotation_done" : 1,
            "subreddit" : "mobilePhonePurchase",
            "score" : 7,
            "post_type" : 5,
            "post" : "Guys I am planning to buy a new phone, which one should I buy? Please give me valuable suggestions",
            "ups" : 7,
            "id" : "598Qc"
        }
    ],
    "manual_annotation_done" : 1
}

{
    "_id" : ObjectId("596ffc2423900a52fb2fb077"),
    "redditor" : "boughtSamsungGalaxy8",
    "posts" : [ 
        {
            "state" : 2,
            "created_at" : 146532030908.0,
            "title" : "I bought the phone will review it next week",
            "num_comments" : 7,
            "downs" : 0,
            "manual_annotation_done" : 1,
            "subreddit" : "mobilePhonePurchase",
            "score" : 7,
            "post_type" : 5,
            "post" : "Hello everyone I finalised on xyz phone and I am thinking to review it after a week or so. Let me get my hands dirty on them!",
            "ups" : 7,
            "id" : "598Qc"
        }
    ],
    "manual_annotation_done" : 1
}


{
    "_id" : ObjectId("596ffc2423900a52fb2fb077"),
    "redditor" : "boughtSamsungGalaxy8",
    "posts" : [ 
        {
            "state" : 3,
            "created_at" : 146532030908.0,
            "title" : "Did I do the right thing",
            "num_comments" : 7,
            "downs" : 0,
            "manual_annotation_done" : 1,
            "subreddit" : "mobilePhonePurchase",
            "score" : 7,
            "post_type" : 5,
            "post" : "As I write I am messed up the mobile phone heats, its a bomb in the hands",
            "ups" : 7,
            "id" : "598Qc"
        }
    ],
    "manual_annotation_done" : 1
}

现在我需要根据手动注释的状态编号来预测状态。

我很困惑如何去做。由于我对机器学习相当新，任何见解都会对我有所帮助。如果您知道有任何教程可以做到这一点吗？

此模型的输入将是：发布和评论（一揽子单词）。标签将是注释状态。

由于

使用RNN预测机器学习中的状态

0 个答案: