Mahout: Training on generated vectors throws an exception

Time: 2017-07-25 10:55:23

Tags: java hadoop2 mahout amazon-emr mahout-recommender

I am running a Mahout 0.13.0 job on Hadoop 2.7.3 (on AWS). When I try to train on the generated vectors, it throws an exception:

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

17/07/25 10:21:39 INFO Job: Task Id : attempt_1500972617227_0091_m_000008_2, Status : FAILED
Error: java.lang.IllegalArgumentException: Wrong numLabels: 0. Must be > 0!
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
    at org.apache.mahout.classifier.naivebayes.training.WeightsMapper.setup(WeightsMapper.java:44)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:796)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 

My training input data has the following format and contains no blank lines:

event   308398661275111424  Book your hotel deals Hotel Mercure Rainbow Spend a couple nights at Hotel Mercure Rainbow in  #deals 
health  308215054011194849  Get $8 Off $150 @ Muscle & Strength #coupons #deals 
art 309512129285853184   Marvel Superhero Assorted Graphic Novel 15-Pack for $38 + $4 s&h: Graveyard Mall offers this ...  #Offer360 #Deals
apparel 308215054011197980  Febreze Lavender Vanilla and Comfort Fabric Refresher, 27.0-Ounce (Pack of 9): Febreze Fabric Ref...  #deal #deals
tech    309513762744979456   DataMan Next : Track Data Usage In Real-Time for iPhone on Sale ($1.99 -> $0.99)  #iphone #deal
home    308215054011203842  #discounts #deals Offer 10: Nespresso Pixie Espresso Maker, Red coffee espresso reviews  Best Buy Price
tech    308215054011206111  Lenovo DEALS - $9 Lenovo P830 Headset HOT #Lenovo #deals #coupons 
health  308381655717003265  Therapy Systems Retinol Cellular Treatment Cleanser / PM: Containing pharmaceutical grade microen... #deal #deals
tech    308215054011204704  Tena Serenity Absorbency Pads, Slender 30 ea #amazon #deals
camera  308328440174624768  #Canon Powershot A2200 14.1 MP #Digital #Camera with 4x Optical Zoom    #photo #deals
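
Since the exception reports zero labels, it may be worth confirming that every record really carries a label in its first column. The following is a minimal sketch, assuming the fields are separated by whitespace (tabs or spaces) in the order label, numeric ID, text, as the sample above suggests. `parse_line` and `count_labels` are hypothetical helper names, not part of Mahout.

```python
from collections import Counter


def parse_line(line):
    """Split one record into (label, doc_id, text).

    Assumes whitespace-separated fields: label, numeric ID, free text.
    """
    parts = line.strip().split(None, 2)  # split on any whitespace, at most 3 fields
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {line!r}")
    label, doc_id, text = parts
    if not doc_id.isdigit():
        raise ValueError(f"second field is not a numeric ID: {line!r}")
    return label, doc_id, text


def count_labels(lines):
    """Count how many records each label has, ignoring blank lines."""
    return Counter(parse_line(line)[0] for line in lines if line.strip())


# Shortened records taken from the sample data above.
sample = [
    "event   308398661275111424  Book your hotel deals",
    "tech    308215054011206111  Lenovo DEALS - $9 Lenovo P830 Headset HOT #Lenovo #deals #coupons",
    "tech    308215054011204704  Tena Serenity Absorbency Pads, Slender 30 ea #amazon #deals",
]
counts = count_labels(sample)
print(counts)
```

A check like this only validates the raw file; it does not show whether the labels survive the `seqdirectory` and `seq2sparse` steps.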

Commands used to run the job:

1. mahout seqdirectory -i /opt/function/input/functionData.csv -o /opt/function/output/
2. mahout seq2sparse -i /opt/function/output/ -o /opt/function/vector/
3. mahout split -i /opt/function/vector/tfidf-vectors --trainingOutput /opt/function/train-vectors --testOutput /opt/function/test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential
4. mahout trainnb -i /opt/function/vector/tfidf-vectors -o /opt/function/model -li /opt/function/labelindex -ow -c
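
One thing that may be worth checking in the commands above is that `seqdirectory` is given a single CSV file. Mahout's 20 Newsgroups example instead feeds it a directory tree with one subdirectory per category and one file per document, so that the category appears in each sequence-file key. The sketch below, which is an assumption about a possible preprocessing step and not a confirmed fix for this error, rewrites the input into that layout. The function name `write_category_dirs`, the `.txt` extension, and the output directory are hypothetical.

```python
import os
import tempfile


def write_category_dirs(lines, out_dir):
    """Write each record to out_dir/<label>/<doc_id>.txt and return the paths.

    Assumes whitespace-separated fields: label, numeric ID, free text.
    Malformed lines (fewer than 3 fields) are skipped.
    """
    written = []
    for line in lines:
        parts = line.strip().split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed records
        label, doc_id, text = parts
        cat_dir = os.path.join(out_dir, label)
        os.makedirs(cat_dir, exist_ok=True)  # one directory per category
        path = os.path.join(cat_dir, doc_id + ".txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text + "\n")
        written.append(path)
    return written


# Shortened records taken from the sample data in the question.
sample = [
    "event   308398661275111424  Book your hotel deals",
    "tech    308215054011206111  Lenovo DEALS - $9 Lenovo P830 Headset HOT",
    "tech    308215054011204704  Tena Serenity Absorbency Pads, Slender 30 ea",
]
demo_dir = tempfile.mkdtemp()  # temporary directory used only for this demo
paths = write_category_dirs(sample, demo_dir)
print(sorted(os.listdir(demo_dir)))
```

The resulting directory would then be copied to HDFS and passed to `seqdirectory -i` in place of the CSV file; whether that resolves the `numLabels` error is not established in this thread.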

0 answers:

No answers