I want to be able to build a model from Java. With the CLI I can do the following:
./mahout trainlogistic --input Candy-Crush.twtr.csv \
--output ./model \
--target hd_click --categories 2 \
--predictors click_frequency country_code ctr device_price_range hd_conversion time_of_day num_clicks phone_type twitter is_weekend app_entertainment app_wallpaper app_widgets arcade books_and_reference brain business cards casual comics communication education entertainment finance game_wallpaper game_widgets health_and_fitness health_fitness libraries_and_demo libraries_demo lifestyle media_and_video media_video medical music_and_audio news_and_magazines news_magazines personalization photography productivity racing shopping social sports sports_apps sports_games tools transportation travel_and_local weather app_entertainment_percentage app_wallpaper_percentage app_widgets_percentage arcade_percentage books_and_reference_percentage brain_percentage business_percentage cards_percentage casual_percentage comics_percentage communication_percentage education_percentage entertainment_percentage finance_percentage game_wallpaper_percentage game_widgets_percentage health_and_fitness_percentage health_fitness_percentage libraries_and_demo_percentage libraries_demo_percentage lifestyle_percentage media_and_video_percentage media_video_percentage medical_percentage music_and_audio_percentage news_and_magazines_percentage news_magazines_percentage personalization_percentage photography_percentage productivity_percentage racing_percentage shopping_percentage social_percentage sports_apps_percentage sports_games_percentage sports_percentage tools_percentage transportation_percentage travel_and_local_percentage weather_percentage reads_magazine_sum reads_magazine_count interested_in_gardening_sum interested_in_gardening_count kids_birthday_coming_sum kids_birthday_coming_count job_seeker_sum job_seeker_count friends_sum friends_count married_sum married_count charity_donor_sum charity_donor_count student_sum student_count interested_in_real_estate_sum interested_in_real_estate_count sports_fan_sum sports_fan_count bascketball_sum bascketball_count interested_in_politics_sum 
interested_in_politics_count gamer_sum gamer_count activist_sum activist_count traveler_sum traveler_count likes_soccer_sum likes_soccer_count interested_in_celebs_sum interested_in_celebs_count auto_racing_sum auto_racing_count age_group_sum age_group_count healthy_lifestyle_sum healthy_lifestyle_count interested_in_finance_sum interested_in_finance_count sports_teams_usa_sum sports_teams_usa_count interested_in_deals_sum interested_in_deals_count business_oriented_sum business_oriented_count interested_in_cooking_sum interested_in_cooking_count music_lover_sum music_lover_count beauty_sum beauty_count follows_fashion_sum follows_fashion_count likes_wrestling_sum likes_wrestling_count name_sum name_count shopper_sum shopper_count golf_sum golf_count vegetarian_sum vegetarian_count dating_sum dating_count interested_in_fashion_sum interested_in_fashion_count interested_in_news_sum interested_in_news_count likes_tennis_sum likes_tennis_count male_sum male_count interested_in_cars_sum interested_in_cars_count follows_bloggers_sum follows_bloggers_count entertainment_sum entertainment_count interested_in_books_sum interested_in_books_count has_kids_sum has_kids_count interested_in_movies_sum interested_in_movies_count musicians_sum musicians_count tech_oriented_sum tech_oriented_count female_sum female_count has_pet_sum has_pet_count practicing_sports_sum practicing_sports_count \
--types numeric word numeric word word word numeric word word word numeric \
--features 100 --passes 1 --rate 50
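I could not make sense of the 20 newsgroups example because it is too hard to learn from. Can anyone give me Java code that does the same thing as the CLI command?
Clarification:
I need something like this: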
model.train(1,0,"monday",6,44,1,7,4,6,78,7,3,4,6,........,"good");
model.train(1,0,"sunday",6,44,5,7,9,2,4,6,78,7,3,4,6,........,"bad");
model.train(1,0,"monday",4,99,2,4,6,3,4,6,........,"good");
model.writeTofile("myModel.model");
If you are not familiar with the Mahout API and only want to tell me how to execute the CLI command from Java, then PLEASE do not answer.
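Answer 0 (score: 6)
I'm not 100% familiar with the Mahout API (and I agree the documentation is very sparse), so I can only give pointers, but I hope they help:
The Java source code for the trainlogistic example can actually be found in the mahout-examples library, which is available on Maven [0] (org.apache.mahout.classifier.sgd.TrainLogistic). You could reuse that source code as-is if you like, but it depends on several utility classes in the mahout-examples library (and it isn't very clean either).
The class that does the training in this example is org.apache.mahout.classifier.sgd.OnlineLogisticRegression [1], although given the large number of predictors you have, you may want to use AdaptiveLogisticRegression [2] (same package), which uses several OnlineLogisticRegression instances internally. You will have to see for yourself which one works best with your data.
The API is fairly simple: there is a train method that takes a Vector of input data, a classify method for testing your model, and learningRate and other methods for changing the model's parameters.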
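To make the train/classify shape described above more concrete, here is a minimal plain-Java sketch of online logistic regression over hashed features. It is a stand-in written against the JDK only, not Mahout's implementation or API: the class HashedLogisticSketch, its encode method, and the sample values are all hypothetical, and the numFeatures and learningRate arguments only loosely play the role of the CLI's --features and --rate options.

```java
import java.util.Map;

/** Illustrative stand-in for a train/classify API; NOT Mahout code. */
final class HashedLogisticSketch {
    private final double[] weights;     // one weight per hashed slot (loosely, --features)
    private final double learningRate;  // SGD step size (loosely, --rate)

    HashedLogisticSketch(int numFeatures, double learningRate) {
        this.weights = new double[numFeatures];
        this.learningRate = learningRate;
    }

    /** Encodes named fields into a fixed-size vector using the hashing trick. */
    double[] encode(Map<String, String> words, Map<String, Double> numerics) {
        double[] v = new double[weights.length];
        // "word" predictors: hash name=value and add 1 to that slot.
        for (Map.Entry<String, String> e : words.entrySet()) {
            v[slot(e.getKey() + "=" + e.getValue())] += 1.0;
        }
        // "numeric" predictors: hash the name and add the value to that slot.
        for (Map.Entry<String, Double> e : numerics.entrySet()) {
            v[slot(e.getKey())] += e.getValue();
        }
        return v;
    }

    private int slot(String key) {
        return Math.floorMod(key.hashCode(), weights.length);
    }

    /** Returns the estimated probability that the instance belongs to class 1. */
    double classify(double[] x) {
        double z = 0.0;
        for (int i = 0; i < x.length; i++) {
            z += weights[i] * x[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** One stochastic gradient step on a single labelled instance (actual is 0 or 1). */
    void train(int actual, double[] x) {
        double error = actual - classify(x);
        for (int i = 0; i < x.length; i++) {
            weights[i] += learningRate * error * x[i];
        }
    }

    public static void main(String[] args) {
        HashedLogisticSketch model = new HashedLogisticSketch(100, 0.5);
        // Two made-up rows that reuse predictor names from the question.
        double[] clicked = model.encode(Map.of("time_of_day", "monday"), Map.of("ctr", 0.9));
        double[] ignored = model.encode(Map.of("time_of_day", "sunday"), Map.of("ctr", 0.1));
        for (int pass = 0; pass < 50; pass++) {
            model.train(1, clicked);
            model.train(0, ignored);
        }
        System.out.println("P(click | monday row) = " + model.classify(clicked));
        System.out.println("P(click | sunday row) = " + model.classify(ignored));
    }
}
```

With the real library, the double[] would presumably be replaced by a Mahout Vector and the model by OnlineLogisticRegression; check the exact constructor and method signatures against the javadoc for your Mahout version.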
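To save the model to disk as the command-line tool does, use org.apache.mahout.classifier.sgd.ModelSerializer, which has a straightforward API for writing and reading your model. (The OLR class itself also has write and readFields methods, but frankly I'm not sure what they do or how they differ from ModelSerializer; they are not documented either.)
Finally, besides the source code in mahout-examples, there are two other examples that use the Mahout API directly and may be useful [3,4].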
Sources:
[0] http://repo1.maven.org/maven2/org/apache/mahout/mahout-examples/0.8/
[4] http://skife.org/mahout/2013/02/14/first_steps_with_mahout.html
Answer 1 (score: 1)
This blog has a good article on how to train and classify with the Mahout Java API: http://nigap.blogspot.com/2012/02/bayes-algorithm-with-apache-mahout.html
Answer 2 (score: -2)
You can execute the same command line from Java using Runtime.exec.
The simple way is:
Process p = Runtime.getRuntime().exec("/usr/bin/bash -ic \"<path_to_mahout>/mahout trainlogistic --input Candy-Crush.twtr.csv "
+ "--output ./model "
+ "--target hd_click --categories 2 "
+ "--predictors click_frequency country_code ctr device_price_range hd_conversion time_of_day num_clicks phone_type twitter is_weekend app_entertainment app_wallpaper app_widgets arcade books_and_reference brain business cards casual comics communication education entertainment finance game_wallpaper game_widgets health_and_fitness health_fitness libraries_and_demo libraries_demo lifestyle media_and_video media_video medical music_and_audio news_and_magazines news_magazines personalization photography productivity racing shopping social sports sports_apps sports_games tools transportation travel_and_local weather app_entertainment_percentage app_wallpaper_percentage app_widgets_percentage arcade_percentage books_and_reference_percentage brain_percentage business_percentage cards_percentage casual_percentage comics_percentage communication_percentage education_percentage entertainment_percentage finance_percentage game_wallpaper_percentage game_widgets_percentage health_and_fitness_percentage health_fitness_percentage libraries_and_demo_percentage libraries_demo_percentage lifestyle_percentage media_and_video_percentage media_video_percentage medical_percentage music_and_audio_percentage news_and_magazines_percentage news_magazines_percentage personalization_percentage photography_percentage productivity_percentage racing_percentage shopping_percentage social_percentage sports_apps_percentage sports_games_percentage sports_percentage tools_percentage transportation_percentage travel_and_local_percentage weather_percentage reads_magazine_sum reads_magazine_count interested_in_gardening_sum interested_in_gardening_count kids_birthday_coming_sum kids_birthday_coming_count job_seeker_sum job_seeker_count friends_sum friends_count married_sum married_count charity_donor_sum charity_donor_count student_sum student_count interested_in_real_estate_sum interested_in_real_estate_count sports_fan_sum sports_fan_count bascketball_sum bascketball_count interested_in_politics_sum 
interested_in_politics_count gamer_sum gamer_count activist_sum activist_count traveler_sum traveler_count likes_soccer_sum likes_soccer_count interested_in_celebs_sum interested_in_celebs_count auto_racing_sum auto_racing_count age_group_sum age_group_count healthy_lifestyle_sum healthy_lifestyle_count interested_in_finance_sum interested_in_finance_count sports_teams_usa_sum sports_teams_usa_count interested_in_deals_sum interested_in_deals_count business_oriented_sum business_oriented_count interested_in_cooking_sum interested_in_cooking_count music_lover_sum music_lover_count beauty_sum beauty_count follows_fashion_sum follows_fashion_count likes_wrestling_sum likes_wrestling_count name_sum name_count shopper_sum shopper_count golf_sum golf_count vegetarian_sum vegetarian_count dating_sum dating_count interested_in_fashion_sum interested_in_fashion_count interested_in_news_sum interested_in_news_count likes_tennis_sum likes_tennis_count male_sum male_count interested_in_cars_sum interested_in_cars_count follows_bloggers_sum follows_bloggers_count entertainment_sum entertainment_count interested_in_books_sum interested_in_books_count has_kids_sum has_kids_count interested_in_movies_sum interested_in_movies_count musicians_sum musicians_count tech_oriented_sum tech_oriented_count female_sum female_count has_pet_sum has_pet_count practicing_sports_sum practicing_sports_count "
+ "--types numeric word numeric word word word numeric word word word numeric "
+ "--features 100 --passes 1 --rate 50\"");
If you go this route, I recommend reading this first: When Runtime.exec() won't
This way, the application runs in a separate process.
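As a hedged variation on the same idea, the command can be passed as a list of arguments through ProcessBuilder, which sidesteps shell quoting and lets the child's output be forwarded so its buffers do not fill up. The class MahoutCliLauncher and its two helper methods are hypothetical names for illustration; the flags are copied from the command in the question, and only three of the predictors are shown.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative launcher for the trainlogistic command; names here are hypothetical. */
final class MahoutCliLauncher {

    /** Builds the argument list; every option and value is its own element, so no quoting is needed. */
    static List<String> buildCommand(String mahoutPath, String input, String output,
                                     String target, List<String> predictors, List<String> types) {
        List<String> cmd = new ArrayList<>();
        cmd.add(mahoutPath);
        cmd.add("trainlogistic");
        cmd.addAll(List.of("--input", input, "--output", output));
        cmd.addAll(List.of("--target", target, "--categories", "2"));
        cmd.add("--predictors");
        cmd.addAll(predictors);
        cmd.add("--types");
        cmd.addAll(types);
        cmd.addAll(List.of("--features", "100", "--passes", "1", "--rate", "50"));
        return cmd;
    }

    /** Starts the process, forwards its stdout/stderr to this JVM, and returns the exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("./mahout", "Candy-Crush.twtr.csv", "./model", "hd_click",
                List.of("click_frequency", "country_code", "ctr"),
                List.of("numeric", "word", "numeric"));
        // Print the command instead of running it, since the mahout script may not be installed here.
        System.out.println(String.join(" ", cmd));
        // To actually launch it (may throw IOException/InterruptedException): run(cmd);
    }
}
```

A non-zero return value from run signals that the tool failed, so the caller can check it before trying to read the model file.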
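Alternatively, you can follow the "Integration with your application" section of the following site: Recommender Documentation
This is also a good reference for writing a recommender: Introducing Apache Mahout
Hope this helps. Cheers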