I run the following command to import the labeled data:
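flattenlist <- function(x) {
  # Flag elements that are plain lists, never ggplot objects.
  morelists <- sapply(x, function(xprime) {
    'list' %in% class(xprime) && !('gg' %in% class(xprime))
  })
  # Keep the non-list elements and unwrap the nested lists by one level.
  out <- c(x[!morelists], unlist(x[morelists], recursive = FALSE))
  if (any(morelists)) {
    # Something was unwrapped, so recurse until no plain lists remain.
    Recall(out)
  } else {
    return(out)
  }
}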
plts <- flattenlist(summary_plots)
names(plts)
[1] "Demographics.Age"
[2] "Product_Usage.Purchase_Frequency"
[3] "Demographics.Socioeconomic.Income"
[4] "Demographics.Socioeconomic.Education"
lapply(plts, class)
$Demographics.Age
[1] "gg" "ggplot"
$Product_Usage.Purchase_Frequency
[1] "gg" "ggplot"
$Demographics.Socioeconomic.Income
[1] "gg" "ggplot"
$Demographics.Socioeconomic.Education
[1] "gg" "ggplot"
Then I use the following commands. The first is the import step, and the second generates the words associated with the given labels:
bin/mallet import-file --input training.in --output training.out --stoplist-file stop-words.txt --label-as-features --keep-sequence --line-regex '([^\t]+)\t([^\t]+)\t(.*)'
mallet/bin/mallet run cc.mallet.topics.LabeledLDA --input training.out --output-topic-keys topic-llda.keys
I have looked at using classification, as described at
http://mallet.cs.umass.edu/classification.php
Can I use this? If so, how? If not, can I edit the LabeledLDA.java file to do 10-fold cross-validation?
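To make the 10-fold part concrete, here is a minimal sketch of how the folds could be prepared outside MALLET, without touching LabeledLDA.java. It assumes training.in holds one instance per line. The function name make_folds and the foldN.train.in / foldN.test.in file names are made up for illustration and are not part of MALLET. The split is by line position, so the file would need to be shuffled beforehand if its order is not random, and how to score a held-out fold with LabeledLDA is not covered here.

```shell
#!/bin/sh
# Hypothetical helper: split a one-instance-per-line file into k train/test pairs.
# Usage: make_folds INPUT K OUTDIR
make_folds() {
    input=$1
    k=$2
    outdir=$3
    mkdir -p "$outdir"
    i=0
    while [ "$i" -lt "$k" ]; do
        # Line n belongs to the test set of fold (n - 1) mod k,
        # and to the training set of every other fold.
        awk -v k="$k" -v fold="$i" \
            -v test="$outdir/fold$i.test.in" \
            -v train="$outdir/fold$i.train.in" \
            '{ if ((NR - 1) % k == fold) print > test; else print > train }' "$input"
        i=$((i + 1))
    done
}
```

Calling `make_folds training.in 10 folds` would write folds/fold0.train.in, folds/fold0.test.in, and so on up to fold9. Each training file could then go through the same import-file and LabeledLDA commands as above.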