Stemming a set of tokens in R

Time: 2018-01-13 07:27:50

Tags: r stemming snowball

I tried stemming with the SnowballC stemmer, but it produces different output for the same word depending on how the input is supplied.

wordStem("waiting",language = "porter")
## [1] "wait"

The single word above is stemmed correctly, but whenever I supply a set of tokens as input:

htcdtokensstop <- c("htc", "makes", "bad", "cheap", "phones", "dont", "buy", "cheap", 
"phones", "battery", "jock", "taiwanese", "buying", "htc", "desire", 
"phone", "htc", "mobile", "specifications", "battery", "pick", 
"low", "light", "camera", "experience", "htc", "e8", "desire", 
"10", "pro", "phone", "performance", "excellent", "camera", "nice", 
"model", "cam", "battery", "realy", "hand", "set", "phone", "coming", 
"phone", "price", "range", "average", "performer", "worst", "battery", 
"features", "goooood", "htc", "real", "hero", "amazing", "x9", 
"features", "battery", "poor", "camera", "battery", "life", "x9", 
"e9", "worry", "phone", "processor", "happy", "battery", "life", 
"drain", "faster", "buy", "product", "heating", "issue", "concern", 
"front", "facing", "camera", "awful", "htc", "phones", "heats", 
"quickly", "pity", "phone", "beautiful", "potential", "stylish", 
"htc", "fan", "wise", "ofcourse", "htc", "overpriced", "compared", 
"xiaomi", "redmi", "note", "3", "design", "fingerprint", "reader", 
"capacitive", "buttons", "screen", "iam", "100", "satisfied", 
"phone", "brought", "2014", "smoothly", "touch", "battery", "backup", 
"buying", "phone", "total", "waste", "money", "nice", "phone", 
"price", "range", "device", "front", "facing", "camera", "awful", 
"htc", "phones", "heats", "quickly", "pity", "phone", "beautiful", 
"potential", "stylish", "htc", "fan", "wise", "ofcourse", "htc", 
"overpriced", "compared", "xiaomi", "redmi", "note", "3", "design", 
"fingerprint", "reader", "capacitive", "buttons", "screen", "iam", 
"100", "satisfied", "phone", "brought", "2014", "htc", "desire", 
"eye", "happy", "phone", "phone", "appearance", "nice", "mobile", 
"plz", "update", "price", "phone", "meant", "performance", "camera", 
"htc", "nice", "mobiles", "waiting", "mobile", "htc", "10", "pro", 
"nice", "camera", "beautiful", "design", "fingerprint", "sensor", 
"overheating", "issue", "typical", "htc", "crappy", "specs", "poo")

the words come back unchanged:

wordStem(htcdtokensstop,language = "porter")
##   [1] "htc"            "makes"          "bad"            "cheap"          "phones"         "dont"           "buy"            "cheap"         
##   [9] "phones"         "battery"        "jock"           "taiwanese"      "buying"         "htc"            "desire"         "phone"         
##  [17] "htc"            "mobile"         "specifications" "battery"        "pick"           "low"            "light"          "camera"        
##  [25] "experience"     "htc"            "e8"             "desire"         "10"             "pro"            "phone"          "performance"   
##  [33] "excellent"      "camera"         "nice"           "model"          "cam"            "battery"        "realy"          "hand"          
##  [41] "set"            "phone"          "coming"         "phone"          "price"          "range"          "average"        "performer"     
##  [49] "worst"          "battery"        "features"       "goooood"        "htc"            "real"           "hero"           "amazing"       
##  [57] "x9"             "features"       "battery"        "poor"           "camera"         "battery"        "life"           "x9"            
##  [65] "e9"             "worry"          "phone"          "processor"      "happy"          "battery"        "life"           "drain"         
##  [73] "faster"         "buy"            "product"        "heating"        "issue"          "concern"        "front"          "facing"        
##  [81] "camera"         "awful"          "htc"            "phones"         "heats"          "quickly"        "pity"           "phone"         
##  [89] "beautiful"      "potential"      "stylish"        "htc"            "fan"            "wise"           "ofcourse"       "htc"           
##  [97] "overpriced"     "compared"       "xiaomi"         "redmi"          "note"           "3"              "design"         "fingerprint"   
## [105] "reader"         "capacitive"     "buttons"        "screen"         "iam"            "100"            "satisfied"      "phone"         
## [113] "brought"        "2014"           "smoothly"       "touch"          "battery"        "backup"         "buying"         "phone"         
## [121] "total"          "waste"          "money"          "nice"           "phone"          "price"          "range"          "device"        
## [129] "front"          "facing"         "camera"         "awful"          "htc"            "phones"         "heats"          "quickly"       
## [137] "pity"           "phone"          "beautiful"      "potential"      "stylish"        "htc"            "fan"            "wise"          
## [145] "ofcourse"       "htc"            "overpriced"     "compared"       "xiaomi"         "redmi"          "note"           "3"             
## [153] "design"         "fingerprint"    "reader"         "capacitive"     "buttons"        "screen"         "iam"            "100"           
## [161] "satisfied"      "phone"          "brought"        "2014"           "htc"            "desire"         "eye"            "happy"         
## [169] "phone"          "phone"          "appearance"     "nice"           "mobile"         "plz"            "update"         "price"         
## [177] "phone"          "meant"          "performance"    "camera"         "htc"            "nice"           "mobiles"        "waiting"       
## [185] "mobile"         "htc"            "10"             "pro"            "nice"           "camera"         "beautiful"      "design"        
## [193] "fingerprint"    "sensor"         "overheating"    "issue"          "typical"        "htc"            "crappy"         "specs"         
## [201] "poo"           

It would be very helpful if there were a way to process (stem) all the words in the token set.
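
One way to narrow this down, sketched under assumptions: the check below confirms that `wordStem` stems every element of a plain character vector, and then flattens the input before stemming in case it is not one. The sample vector `tokens` and the helper `stem_tokens` are hypothetical names introduced for illustration. Whether `htcdtokensstop` really is a plain character vector in the failing session cannot be told from the output above, so this is a diagnostic, not a confirmed fix.

```r
library(SnowballC)

# Hypothetical sample: a few tokens taken from the vector in the question.
tokens <- c("phones", "makes", "waiting", "htc")

# wordStem() takes a character vector and stems each element.
stems <- wordStem(tokens, language = "porter")
print(stems)

# Hypothetical helper: flatten lists or factors to a plain character
# vector first, so that every element is treated as a single word.
stem_tokens <- function(x, language = "porter") {
  x <- as.character(unlist(x, use.names = FALSE))
  wordStem(x, language = language)
}

# To inspect what is actually stored in the object that fails:
# str(htcdtokensstop)
# class(htcdtokensstop)
```

If `stem_tokens(htcdtokensstop)` still returns the words unchanged, the structure printed by `str()` and the encoding of the strings are probably the next things to look at.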

0 answers:

No answers yet.