When I try to melt a data frame with mixed data types, I get NAs. How can I best solve this?

Asked: 2013-04-12 15:55:24

Tags: r reshape

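My goal and background

I have a data frame in R that I want to melt using the reshape2 library. There are two reasons.

  1. I want to use ggplot to draw a bar chart of each user's score for each question.

  2. I want to get this data into Excel so that I can see, for each user, their sentiment, score, and mixed value for motivation, attitude, and so on. My plan is to melt the data and then cast it to wide format for easy import into Excel.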

My problem

When I try to run melt, I get a warning, and I end up with NAs in the resulting melted data frame.

    Warning messages:
    1: In `[<-.factor`(`*tmp*`, ri, value = c(0.148024, 0.244452, -0.00421,  :
    invalid factor level, NAs generated
    2: In `[<-.factor`(`*tmp*`, ri, value = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,  :
    invalid factor level, NAs generated
    

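I end up with a lot of NAs in my melted data frame. I think this is because melting puts both character and numeric values into the same column.

My questions

So I have two questions.

Question 1: Is there a workaround in R?

Question 2: Is there a better way to structure my data to avoid this problem?

Code

Here is the code I use to create the data frame.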

    words <- data.frame(read.delim("sentiments-test-subset-no-text.txt", header=FALSE))
    names(words) <- c("level", "question", "user", "sentiment", "score", "mixed")
    words$user <- as.factor(words$user)
    words.m <- melt(words, id.vars=c("user", "level"), measure.vars=c("sentiment", "score", "mixed"))
    

I have a hard time with reshape and melt, but I think the last line does what I want.

Data

In human-readable form, the data looks like this.

    experimental    motivated   1   positive    0.148024    0
    experimental    motivated   2   positive    0.244452    0
    experimental    motivated   3   negative       -0.004210    0
    experimental    motivated   4   unknown         0.000000    0
    experimental    attitudeBefore  1   negative       -0.241500    0
    experimental    attitudeBefore  2   neutral         0.000000    0
    experimental    attitudeBefore  3   neutral         0.000000    0
    experimental    attitudeBefore  4   unknown         0.000000    0
    

dput dump

The dput output follows.

    structure(list(level = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L), .Label = "experimental", class = "factor"), question = structure(c(2L, 
    2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("attitudeBefore", "motivated"
    ), class = "factor"), user = structure(c(1L, 2L, 3L, 4L, 1L, 
    2L, 3L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"), 
    sentiment = structure(c(3L, 3L, 1L, 4L, 1L, 2L, 2L, 4L), .Label = c("negative", 
    "neutral", "positive", "unknown"), class = "factor"), score = c(0.148024, 
    0.244452, -0.00421, 0, -0.2415, 0, 0, 0), mixed = c(0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("level", "question", 
    "user", "sentiment", "score", "mixed"), row.names = c(NA, -8L
    ), class = "data.frame")
    

1 answer:

Answer 0 (score: 4)

It looks like you may simply be using the wrong library. reshape and reshape2 are not the same thing.

    library(reshape2)
    words.m <- melt(words, id.vars=c("user", "level"), measure.vars=c("sentiment", "score", "mixed"))
    # no problem

    detach(package:reshape2)

    # using reshape instead of reshape2
    library(reshape)
    words.m <- melt(words, id.vars=c("user", "level"), measure.vars=c("sentiment", "score", "mixed"))
    # Warning messages:
    # 1: In `[<-.factor`(`*tmp*`, ri, value = c(3L, 3L, 1L, 4L, 1L, 2L, 2L,  :
    #   invalid factor level, NAs generated
    # 2: In `[<-.factor`(`*tmp*`, ri, value = c(3L, 3L, 1L, 4L, 1L, 2L, 2L,  :
    #   invalid factor level, NAs generated
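
If both packages might be attached in the same session, it can help to check where melt is coming from and to call it through the reshape2 namespace. The following is only a minimal sketch: the two-row data frame is a made-up stand-in with the same column types as the question's data, and melt_source is a name introduced here for illustration. Depending on the reshape2 version, melting a factor together with numeric columns may still warn that attributes are dropped and return the value column as character, but it should not produce NAs.

```r
library(reshape2)

# Small stand-in for the question's data (same column types, made-up subset).
words <- data.frame(
  user = factor(c(1, 2)),
  level = factor(rep("experimental", 2)),
  sentiment = factor(c("positive", "negative")),
  score = c(0.148024, -0.00421),
  mixed = c(0L, 0L)
)

# Which package does the melt found on the search path belong to?
melt_source <- environmentName(environment(melt))
print(melt_source)

# Calling through the namespace sidesteps any masking by reshape.
words.m <- reshape2::melt(
  words,
  id.vars = c("user", "level"),
  measure.vars = c("sentiment", "score", "mixed")
)

# Count the NAs in the melted value column.
print(sum(is.na(words.m$value)))
```

If melt_source reports reshape rather than reshape2, the warnings in the question are the likely result.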

If reshape2 is not installed on your system, you can install it from CRAN:

    install.packages("reshape2")
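
On the second question (structuring the data differently), one possible approach, which is not part of the original answer, is to keep the categorical columns as id variables and melt only the numeric ones, so the value column holds a single type. The sketch below rebuilds the sample data from the question's dput output. The wide layout (one row per user, one column per question/measure pair) and the output file name are assumptions made for illustration.

```r
library(reshape2)

# Rebuild the question's sample data (values copied from the dput output).
words <- data.frame(
  level = factor(rep("experimental", 8)),
  question = factor(rep(c("motivated", "attitudeBefore"), each = 4)),
  user = factor(rep(1:4, times = 2)),
  sentiment = factor(c("positive", "positive", "negative", "unknown",
                       "negative", "neutral", "neutral", "unknown")),
  score = c(0.148024, 0.244452, -0.00421, 0, -0.2415, 0, 0, 0),
  mixed = rep(0L, 8)
)

# Keep the categorical columns as id variables and melt only the numeric
# columns, so the value column stays numeric.
words.m <- reshape2::melt(
  words,
  id.vars = c("user", "level", "question", "sentiment"),
  measure.vars = c("score", "mixed")
)

# Cast to one row per user, with one column per question/measure pair.
words.wide <- reshape2::dcast(
  words.m,
  user + level ~ question + variable,
  value.var = "value"
)

# Write a CSV file that Excel can open (the file name is arbitrary).
write.csv(words.wide, "words-wide.csv", row.names = FALSE)
```

Note that this wide table drops the sentiment column. If sentiment is needed in Excel as well, it could be cast separately and merged back in by user.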