Predicting states with an RNN in machine learning

Time: 2017-08-07 06:20:14

Tags: python machine-learning time-series keras recurrent-neural-network

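Recently I have been trying to predict a user's state with a Recurrent Neural Network (RNN), using a dataset of posts by specific Reddit users, but I can't work out what I need to feed the network as input.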

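For example, suppose a user wants to buy a mobile phone. Based on his previous Reddit posts, I want to predict whether the user is in state 1 (considering buying a new phone), state 2 (bought a new phone) or state 3 (regretting the purchase). As far as I understand, this involves time-series analysis. What I can't figure out is what data structure I need to feed in when training the RNN, and how to actually implement the RNN in this case. If there is a better approach, or some other time-series classifier, please explain it.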
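For reference, the standard (Elman) RNN recurrence over one user's posts is shown below. The notation is introduced here, not taken from the question: x_t is a feature vector for the user's t-th post in chronological order, s_t in {1, 2, 3} is its annotated state, and W_x, W_h, W_y, b_h, b_y are learned parameters. The hidden state h_t is what carries information from earlier posts into the prediction for the current one; an LSTM, as used in the Keras tutorial, replaces the tanh update with a gated cell but keeps the same inputs and outputs.

```latex
\begin{aligned}
h_t &= \tanh\left(W_x x_t + W_h h_{t-1} + b_h\right), \qquad h_0 = \mathbf{0} \\
\hat{y}_t &= \operatorname{softmax}\left(W_y h_t + b_y\right) \in \mathbb{R}^{3} \\
\mathcal{L} &= -\sum_{t=1}^{T} \log \hat{y}_{t,\,s_t}
\end{aligned}
```

Here one training example is one user's whole post history of length T, and there is one label per post rather than one label per example.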

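So far I have been able to follow a Keras tutorial (http://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/), but it teaches how to classify reviews as positive or negative. Also, in that tutorial each review is independent of the others, whereas in my case each purchase depends on the others.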
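Because each state depends on the earlier ones, a cheap sanity check before building an RNN is a first-order Markov baseline that ignores the post text entirely and only counts state-to-state transitions. This is a swapped-in baseline, not the tutorial's method; the function names and the example sequences are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Count state -> next-state transitions in each user's time-ordered state list."""
    counts = defaultdict(Counter)
    for states in sequences:
        for prev, nxt in zip(states, states[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, current_state):
    """Return the most frequent successor of current_state, or None if never observed."""
    if not counts[current_state]:
        return None
    return counts[current_state].most_common(1)[0][0]


# Hypothetical annotated sequences: one list per redditor, oldest post first.
sequences = [[1, 2, 3], [1, 2], [1, 1, 2, 3], [1, 2, 2]]
counts = fit_transitions(sequences)
print(predict_next(counts, 1))  # 2
print(predict_next(counts, 2))  # 3
```

If an RNN that reads the text cannot beat this baseline on held-out users, the text features are probably not adding much yet.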

I have the data in JSON format. I manually defined the state (key), which I use as the class label, as shown below.

The states are defined as:
State 1: considering buying a new phone
State 2: bought a new phone
State 3: regretting the purchase
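Keras's sparse_categorical_crossentropy loss expects integer class labels starting at 0, and categorical_crossentropy expects one-hot vectors, so the 1-based annotated states need to be shifted before training. A minimal sketch; STATE_NAMES and the helper names are hypothetical.

```python
# Hypothetical label table mirroring the state definitions above.
STATE_NAMES = {
    1: "considering buying a new phone",
    2: "bought a new phone",
    3: "regretting the purchase",
}


def to_class_index(state):
    """Map annotated state 1..3 to class index 0..2."""
    if state not in STATE_NAMES:
        raise ValueError(f"unknown state: {state}")
    return state - 1


def one_hot(state, num_classes=len(STATE_NAMES)):
    """One-hot vector for a categorical-crossentropy style loss."""
    vec = [0] * num_classes
    vec[to_class_index(state)] = 1
    return vec


print(to_class_index(3))  # 2
print(one_hot(2))         # [0, 1, 0]
```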

{
    "_id" : ObjectId("596ffc2423900a52fb2fb077"),
    "redditor" : "boughtSamsungGalaxy8",
    "posts" : [ 
        {
            "state" : 1,
            "created_at" : 146532030908.0,
            "title" : "Did I do the right thing",
            "num_comments" : 7,
            "downs" : 0,
            "manual_annotation_done" : 1,
            "subreddit" : "mobilePhonePurchase",
            "score" : 7,
            "post_type" : 5,
            "post" : "Guys I am planning to buy a new phone, which one should I buy? Please give me valuable suggestions",
            "ups" : 7,
            "id" : "598Qc"
        }
    ],
    "manual_annotation_done" : 1
}

{
    "_id" : ObjectId("596ffc2423900a52fb2fb077"),
    "redditor" : "boughtSamsungGalaxy8",
    "posts" : [ 
        {
            "state" : 2,
            "created_at" : 146532030908.0,
            "title" : "I bought the phone will review it next week",
            "num_comments" : 7,
            "downs" : 0,
            "manual_annotation_done" : 1,
            "subreddit" : "mobilePhonePurchase",
            "score" : 7,
            "post_type" : 5,
            "post" : "Hello everyone I finalised on xyz phone and I am thinking to review it after a week or so. Let me get my hands dirty on them!",
            "ups" : 7,
            "id" : "598Qc"
        }
    ],
    "manual_annotation_done" : 1
}


{
    "_id" : ObjectId("596ffc2423900a52fb2fb077"),
    "redditor" : "boughtSamsungGalaxy8",
    "posts" : [ 
        {
            "state" : 3,
            "created_at" : 146532030908.0,
            "title" : "Did I do the right thing",
            "num_comments" : 7,
            "downs" : 0,
            "manual_annotation_done" : 1,
            "subreddit" : "mobilePhonePurchase",
            "score" : 7,
            "post_type" : 5,
            "post" : "As I write I am messed up the mobile phone heats, its a bomb in the hands",
            "ups" : 7,
            "id" : "598Qc"
        }
    ],
    "manual_annotation_done" : 1
}
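Since each record above holds posts by one redditor, the first step toward a sequence model is to collect every user's annotated posts and put them in chronological order. A minimal sketch, assuming the records are already loaded as Python dicts (the ObjectId(...) field is MongoDB shell syntax, not plain JSON) and that created_at orders posts in time; build_user_sequences and the example timestamps are hypothetical.

```python
from collections import defaultdict


def build_user_sequences(records):
    """Group annotated posts by redditor; return {user: (texts, states)} oldest first."""
    by_user = defaultdict(list)
    for record in records:
        for post in record["posts"]:
            if post.get("manual_annotation_done") != 1:
                continue  # skip posts that have no manual state label
            text = post.get("title", "") + " " + post.get("post", "")
            by_user[record["redditor"]].append((post["created_at"], text, post["state"]))
    sequences = {}
    for user, items in by_user.items():
        # Stable sort: posts with equal timestamps keep their input order.
        items.sort(key=lambda item: item[0])
        sequences[user] = ([text for _, text, _ in items], [state for _, _, state in items])
    return sequences


# Hypothetical records with made-up timestamps, given out of order on purpose.
records = [
    {"redditor": "boughtSamsungGalaxy8",
     "posts": [{"state": 2, "created_at": 20.0, "title": "I bought the phone",
                "post": "Will review it next week", "manual_annotation_done": 1}]},
    {"redditor": "boughtSamsungGalaxy8",
     "posts": [{"state": 1, "created_at": 10.0, "title": "Which phone?",
                "post": "Please give me suggestions", "manual_annotation_done": 1}]},
]
texts, states = build_user_sequences(records)["boughtSamsungGalaxy8"]
print(states)  # [1, 2]
```

Note that in the sample data all three posts share the same created_at value, so a real timestamp (or another ordering key) is needed for the order to mean anything.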

Now I need to predict the state, using the manually annotated state numbers as labels.

I'm confused about how to go about this. Since I am fairly new to machine learning, any insight would help. Do you know of any tutorial that does this?

The input to this model would be the posts and comments (as a bag of words). The label would be the annotated state.
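Concretely, a bag-of-words input for a sequence model is a 3-D array: one row per user, one time step per post in chronological order, and one bag-of-words vector per step, with one label per step. A minimal stdlib sketch under these assumptions: texts are whitespace-tokenized, histories are truncated or padded to a hypothetical max_posts, and padded steps get label 0 plus a mask entry of 0 so they can be excluded from the loss. All names here are hypothetical. Converted to NumPy arrays, X could feed a Keras LSTM with return_sequences=True followed by a per-step softmax, but that model is not shown.

```python
from collections import Counter


def build_vocab(texts, max_words=1000):
    """Map the most frequent lowercase tokens to indices 0..max_words-1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: i for i, (word, _) in enumerate(counts.most_common(max_words))}


def bag_of_words(text, vocab):
    """Count vector of length len(vocab); words outside the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1
    return vec


def to_padded_batch(user_sequences, vocab, max_posts):
    """Return X (users, max_posts, vocab_size), y (users, max_posts), mask (users, max_posts)."""
    X, y, mask = [], [], []
    for texts, states in user_sequences:
        texts, states = texts[-max_posts:], states[-max_posts:]  # keep the most recent posts
        pad = max_posts - len(texts)
        X.append([bag_of_words(t, vocab) for t in texts]
                 + [[0] * len(vocab) for _ in range(pad)])
        y.append([s - 1 for s in states] + [0] * pad)  # states 1..3 -> classes 0..2
        mask.append([1] * len(texts) + [0] * pad)      # 1 = real post, 0 = padding
    return X, y, mask


# Hypothetical (texts, states) pairs, one per user, oldest post first.
users = [
    (["which phone should I buy", "I bought the phone"], [1, 2]),
    (["which phone", "bought it", "the phone heats"], [1, 2, 3]),
]
vocab = build_vocab([t for texts, _ in users for t in texts])
X, y, mask = to_padded_batch(users, vocab, max_posts=3)
print(len(X), len(X[0]), len(X[0][0]) == len(vocab))  # 2 3 True
print(y)     # [[0, 1, 0], [0, 1, 2]]
print(mask)  # [[1, 1, 0], [1, 1, 1]]
```

Padding at the end and masking keeps each real post aligned with its own label, which is what a many-to-many (one prediction per post) setup needs.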

Thanks
