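Recently I have been trying to use a Recurrent Neural Network to predict a user's state from a dataset of posts by specific Reddit users, but I can't figure out what I need to feed in as input.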
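For example, suppose a user wants to buy a phone. Based on his previous Reddit posts, I want to predict whether the user is in State 1 (considering buying a new phone), State 2 (bought a new phone), or State 3 (regrets the purchase). As far as I understand, this involves time-series analysis. What I can't work out is what kind of data structure I need to feed in when training the RNN, or how to actually implement the RNN in this case. If there is a better approach or some other time-series classifier, please explain it.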
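One common way to frame the input: each user is one training sample, that sample is the user's posts in chronological order (the timesteps), and each post is a fixed-length feature vector. Recurrent layers in Keras expect a 3D array of shape (samples, timesteps, features), so users with different numbers of posts have to be padded to a common length. The sketch below assumes the posts are already vectorised; the vectors, `max_len` and `n_features` are made-up placeholders.

```python
from typing import List

Vector = List[float]


def pad_user_sequences(users: List[List[Vector]], max_len: int,
                       n_features: int) -> List[List[Vector]]:
    """Pad (with zero vectors at the front) or truncate each user's post
    sequence so that every user has exactly max_len timesteps."""
    batch = []
    for posts in users:
        # Keep only the most recent max_len posts (drop the oldest ones).
        recent = posts[max(0, len(posts) - max_len):]
        # Fill the missing early timesteps with all-zero vectors.
        padding = [[0.0] * n_features for _ in range(max_len - len(recent))]
        batch.append(padding + [list(p) for p in recent])
    return batch


# Two hypothetical users: one with 3 posts, one with 1 post, 4 features per post.
users = [
    [[1, 0, 2, 0], [0, 1, 0, 1], [0, 0, 1, 3]],
    [[2, 1, 0, 0]],
]
batch = pad_user_sequences(users, max_len=3, n_features=4)
print(len(batch), len(batch[0]), len(batch[0][0]))  # (samples, timesteps, features)
```

Pre-padding and dropping the oldest posts mirror the defaults of Keras' `pad_sequences`. The per-post labels would need the same padding, and padded steps should be masked or given zero sample weight so they do not count towards the loss.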
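So far I have been able to follow a Keras tutorial (http://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/), but it shows how to do this by classifying reviews as positive or negative. Also, in that tutorial each review is independent of the others, whereas in my case each purchase depends on the others.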
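The key difference from the review tutorial is that here one sample is a whole user history with one label per post (a many-to-many setup), rather than one label per independent document. The pure-Python sketch below uses randomly initialised, untrained weights and hypothetical sizes (4 features, 5 hidden units, no bias terms) purely to show the data flow: a plain tanh RNN carries a hidden state from one post to the next and emits a probability distribution over the three states at every step. In Keras, the rough analogue would be an `LSTM` layer with `return_sequences=True` followed by `TimeDistributed(Dense(3, activation="softmax"))`, trained on the real data.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def matvec(W: Matrix, x: List[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rnn_forward(xs, W_xh, W_hh, W_hy, hidden_size):
    """Elman RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1}); y_t = softmax(W_hy h_t)."""
    h = [0.0] * hidden_size
    outputs = []
    for x in xs:  # one post per timestep, in chronological order
        a = [u + v for u, v in zip(matvec(W_xh, x), matvec(W_hh, h))]
        h = [math.tanh(v) for v in a]  # hidden state carries earlier posts forward
        outputs.append(softmax(matvec(W_hy, h)))  # one prediction per post
    return outputs


def rand_matrix(rows: int, cols: int) -> Matrix:
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
n_features, hidden_size, n_states = 4, 5, 3
W_xh = rand_matrix(hidden_size, n_features)
W_hh = rand_matrix(hidden_size, hidden_size)
W_hy = rand_matrix(n_states, hidden_size)

# Hypothetical bag-of-words vectors for one user's three posts.
user_posts = [[1, 0, 2, 0], [0, 1, 0, 1], [0, 0, 1, 3]]
probs = rnn_forward(user_posts, W_xh, W_hh, W_hy, hidden_size)
for t, p in enumerate(probs):
    print(t, [round(v, 3) for v in p])
```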
I have the data in JSON format. I manually defined the state (key), which I use as the class label, as shown below:
The states are defined as: State 1 (considering buying a new phone), State 2 (bought a new phone), State 3 (regretted the purchase).
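Because the states are numbered from 1, they have to be shifted to 0-based class indices (or one-hot vectors) before being used as targets for a softmax output with categorical cross-entropy; Keras' `sparse_categorical_crossentropy` accepts the 0-based integer indices directly. A minimal sketch of that encoding (the `STATES` table and `one_hot` helper are hypothetical names):

```python
# Hypothetical label table for the three manually annotated states.
STATES = {
    1: "considering buying a new phone",
    2: "bought a new phone",
    3: "regretted the purchase",
}


def one_hot(state: int) -> list:
    """Map an annotated state (1, 2 or 3) to a one-hot vector of length 3."""
    if state not in STATES:
        raise ValueError(f"unknown state: {state}")
    vec = [0] * len(STATES)
    vec[state - 1] = 1  # shift 1-based states to 0-based indices
    return vec


print([one_hot(s) for s in (1, 2, 3)])
```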
{
"_id" : ObjectId("596ffc2423900a52fb2fb077"),
"redditor" : "boughtSamsungGalaxy8",
"posts" : [
{
"state" : 1,
"created_at" : 146532030908.0,
"title" : "Did I do the right thing",
"num_comments" : 7,
"downs" : 0,
"manual_annotation_done" : 1,
"subreddit" : "mobilePhonePurchase",
"score" : 7,
"post_type" : 5,
"post" : "Guys I am planning to buy a new phone, which one should I buy? Please give me valuable suggestions",
"ups" : 7,
"id" : "598Qc"
}
],
"manual_annotation_done" : 1
}
{
"_id" : ObjectId("596ffc2423900a52fb2fb077"),
"redditor" : "boughtSamsungGalaxy8",
"posts" : [
{
"state" : 2,
"created_at" : 146532030908.0,
"title" : "I bought the phone will review it next week",
"num_comments" : 7,
"downs" : 0,
"manual_annotation_done" : 1,
"subreddit" : "mobilePhonePurchase",
"score" : 7,
"post_type" : 5,
"post" : "Hello everyone I finalised on xyz phone and I am thinking to review it after a week or so. Let me get my hands dirty on them!",
"ups" : 7,
"id" : "598Qc"
}
],
"manual_annotation_done" : 1
}
{
"_id" : ObjectId("596ffc2423900a52fb2fb077"),
"redditor" : "boughtSamsungGalaxy8",
"posts" : [
{
"state" : 3,
"created_at" : 146532030908.0,
"title" : "Did I do the right thing",
"num_comments" : 7,
"downs" : 0,
"manual_annotation_done" : 1,
"subreddit" : "mobilePhonePurchase",
"score" : 7,
"post_type" : 5,
"post" : "As I write I am messed up the mobile phone heats, its a bomb in the hands",
"ups" : 7,
"id" : "598Qc"
}
],
"manual_annotation_done" : 1
}
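Before any model can be trained, these records have to be regrouped into one chronologically ordered sequence per user. Note that `ObjectId(...)` is MongoDB shell syntax rather than valid JSON, so the sketch below assumes the documents were exported as plain JSON (here with `_id` and most fields simply dropped). The texts and states are copied from the example above; since all three `created_at` values are identical, Python's stable sort keeps them in file order.

```python
import json
from collections import defaultdict

# Trimmed copy of the example records, assuming a plain-JSON export.
RAW = """[
 {"redditor": "boughtSamsungGalaxy8", "posts": [{"state": 1, "created_at": 146532030908.0,
  "title": "Did I do the right thing",
  "post": "Guys I am planning to buy a new phone, which one should I buy? Please give me valuable suggestions"}]},
 {"redditor": "boughtSamsungGalaxy8", "posts": [{"state": 2, "created_at": 146532030908.0,
  "title": "I bought the phone will review it next week",
  "post": "Hello everyone I finalised on xyz phone and I am thinking to review it after a week or so. Let me get my hands dirty on them!"}]},
 {"redditor": "boughtSamsungGalaxy8", "posts": [{"state": 3, "created_at": 146532030908.0,
  "title": "Did I do the right thing",
  "post": "As I write I am messed up the mobile phone heats, its a bomb in the hands"}]}
]"""


def user_sequences(records):
    """Return {redditor: (texts, states)} with posts sorted by created_at."""
    by_user = defaultdict(list)
    for record in records:
        for post in record["posts"]:
            by_user[record["redditor"]].append(post)
    sequences = {}
    for user, posts in by_user.items():
        posts.sort(key=lambda p: p["created_at"])  # stable: ties keep file order
        texts = [p["title"] + " " + p["post"] for p in posts]
        states = [p["state"] for p in posts]
        sequences[user] = (texts, states)
    return sequences


seqs = user_sequences(json.loads(RAW))
texts, states = seqs["boughtSamsungGalaxy8"]
print(states)  # [1, 2, 3]
```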
Now I need to predict the state, using the manually annotated state numbers as labels.
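Since the labels themselves form a sequence, a useful sanity check before building an RNN is a first-order transition baseline (a swapped-in technique, not something from the tutorial): count how often each state follows another in the annotated data and predict the most frequent successor. It ignores the post text entirely, so a text-based RNN should beat it. The training sequences and the fallback state below are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(state_sequences):
    """Count state -> next-state transitions across all users."""
    counts = defaultdict(Counter)
    for seq in state_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev_state, default=1):
    """Most frequent successor of prev_state; `default` is a hypothetical fallback."""
    successors = counts.get(prev_state)  # .get avoids inserting empty entries
    if not successors:
        return default
    return successors.most_common(1)[0][0]


# Hypothetical annotated state sequences for three users.
train = [[1, 2, 3], [1, 1, 2], [1, 2, 2]]
counts = fit_transitions(train)
print(predict_next(counts, 1))  # -> 2 (seen 3 times after state 1)
```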
I am confused about how to go about this. Since I am fairly new to machine learning, any insight would help. Do you know of any tutorial that does this?
The input to this model will be the posts and comments (bag of words). The labels will be the annotated states.
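A minimal standard-library sketch of the bag-of-words step, assuming a simple lowercase regex tokenizer (a hypothetical choice; any tokenizer would do) and a vocabulary built only from the training texts. Each post's count vector would then be one timestep of that user's sequence.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Hypothetical tokenizer: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(texts, max_words=None) -> dict:
    """Map the max_words most frequent tokens (all if None) to column indices."""
    freq = Counter(tok for t in texts for tok in tokenize(t))
    return {w: i for i, (w, _) in enumerate(freq.most_common(max_words))}


def bag_of_words(text: str, vocab: dict) -> list:
    """Count vector over the vocabulary; unseen words are ignored."""
    vec = [0] * len(vocab)
    for tok in tokenize(text):
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] += 1
    return vec


train_texts = ["I bought the phone", "the phone heats"]
vocab = build_vocab(train_texts)
print(bag_of_words("I bought the phone", vocab))
```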
Thanks