Error in `row.names<-.data.frame`(`*tmp*`, value = c(NA_real_, NA_real_)

Time: 2014-12-08 07:56:20

Tags: r machine-learning

I am trying to build a model from tweets and their polarity. Partway through, I get a strange error on this line:

analytics <- create_analytics(container, MAXENT_CLASSIFY)

I get:

Error in `row.names<-.data.frame`(`*tmp*`, value = c(NA_real_, NA_real_,  : 
  duplicate 'row.names' are not allowed
In addition: Warning messages:
1: In cbind(labels, BEST_LABEL = as.numeric(best_labels), BEST_PROB = best_probs,  :
  NAs introduced by coercion
2: In create_documentSummary(container, score_summary) :
  NAs introduced by coercion
3: In cbind(MANUAL_CODE = testing_codes, CONSENSUS_CODE = scores$BEST_LABEL,  :
  NAs introduced by coercion
4: In create_topicSummary(container, score_summary) :
  NAs introduced by coercion
5: In cbind(TOPIC_CODE = as.numeric(as.vector(topic_codes)), NUM_MANUALLY_CODED = manually_coded,  :
  NAs introduced by coercion
6: In cbind(labels, BEST_LABEL = as.numeric(best_labels), BEST_PROB = best_probs,  :
  NAs introduced by coercion
7: non-unique values when setting 'row.names':

My CSV file looks like this:

text, polarity
Hello I forget the password of my credit card need to know how I can make my statement, neutral
can provide the swift code thanks, neutral
thanks just one more doubt has this card commissions with these characteristics, neutral
Thanks, neutral
are arriving mail scam, negative
can you help me I need to pay an online purchase and ask me for a terminal my debit which is, neutral
if I do not win anything this time I change banks, negative
you can be the next winner of the million that circumvents account award date January, neutral
account and see my accounts so I can have the, negative
thanks i just send the greetings consultation, neutral
may someday enable office not sick people, negative
hello is running payments through the online banking no, negative
thanks hope they do, neutral
should pay attention to many happened to us that your system flushed insufficient balance or had no money in the accounts, negative
yesterday someone had the dignity to answer the telephone banking and verify that the system is crap, negative
and tried but apparently the problem is just to pay movistar services, neutral
good morning was trying to pay for services through the website but get error retry in minutes, negative
if no system agent is non clients or customers also, positive

The code I am using is:

library(RTextTools)

pg <- read.csv("cleened_tweets.csv", header=TRUE, row.names=NULL)

head(pg)

pgT <- as.factor(pg$text)

pgP <- as.factor(pg$polarity)

doc_matrix <- create_matrix(pgT, language="spanish", removeNumbers=TRUE, stemWords=TRUE, removeSparseTerms=.998)

dim(doc_matrix)

container <- create_container(doc_matrix, pgP, trainSize=1:275, testSize=276:375, virgin=FALSE)

MAXENT <- train_model(container,"MAXENT")

MAXENT_CLASSIFY <- classify_model(container, MAXENT)

analytics <- create_analytics(container, MAXENT_CLASSIFY)

summary(analytics)

2 answers:

Answer 0 (score: 1)

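I have run into this error with RTextTools too. The create_analytics function cannot handle factor or string labels, only numeric ones. I usually just merge my text labels back in after running this code.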
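A minimal base R sketch of that workflow, assuming the labels are the three polarity strings from the question. The `predicted_code` vector is a made-up stand-in for whatever numeric codes the classifier returns; it is not actual RTextTools output.

```r
# Sample text labels, as they might appear in the polarity column.
polarity <- c("neutral", "negative", "neutral", "positive")

# Coercing text labels straight to numbers yields NA, which is what the
# "NAs introduced by coercion" warnings are reporting.
bad_codes <- suppressWarnings(as.numeric(polarity))

# Build an explicit lookup table from text label to numeric code.
label_levels <- sort(unique(polarity))
lookup <- data.frame(code = seq_along(label_levels),
                     label = label_levels,
                     stringsAsFactors = FALSE)

# Numeric codes to hand to the container instead of the text labels.
codes <- match(polarity, lookup$label)

# Hypothetical predictions (stand-in for the classifier's numeric output).
predicted_code <- c(2, 1, 2, 2)

# Merge the text labels back onto the numeric predictions.
predicted_label <- lookup$label[match(predicted_code, lookup$code)]
print(data.frame(predicted_code, predicted_label))
```

Keeping the lookup table as its own data frame means the same mapping can be reused for both the training labels and the predictions.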

Answer 1 (score: 0)

Convert the pgP variable to numeric instead of leaving it as a factor. This should resolve the problem:

pgP <- as.numeric(as.factor(pg$polarity))
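To see which number each label ends up with, here is a small sketch using sample values rather than the real file. `as.factor` orders its levels alphabetically, and `as.numeric` returns each value's position in that ordering. The `trimws` call is an assumption about the CSV shown in the question: its rows have a space after the comma, which `read.csv` keeps by default.

```r
# Sample labels with the leading space that follows the comma in the CSV.
polarity <- c(" neutral", " negative", " positive", " neutral")

# Strip surrounding whitespace, then convert to a factor.
f <- as.factor(trimws(polarity))

# Levels are sorted alphabetically: "negative" "neutral" "positive".
print(levels(f))

# Numeric codes are positions within levels(f).
codes <- as.numeric(f)
print(codes)

# Recover the text label for any numeric code.
recovered <- levels(f)[codes]
print(recovered)
```

Keeping `levels(f)` around makes it possible to translate the numeric codes in the analytics output back into polarity names.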