data.table assignment not working

Time: 2015-05-02 09:16:33

Tags: r data.table variable-assignment

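OK, I'm cleaning a large dataset and trying to speed things up by converting my data.frame code to data.table. I've run into a problem with conditional assignment of a missing-value code. Toy example: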

    X = data.table(grp=c("a","a","b","b","b","c","c","d","d","d","d"), 
    foo=c(1:4,NA,6:7,NA,8:10))
    setkey(X,grp)
    err.code <-"1111"
    row.select <- row.names(X)[X$grp=="b" & is.na(X$foo)]

    # Replace missing value for group b with err.code
    X[row.select, foo:=err.code]

So I want to put err.code into the specific cells that meet the condition. However, the code above doesn't assign anything, e.g.:

    > X
        grp foo
     1:   a   1
     2:   a   2
     3:   b   3
     4:   b   4
     5:   b  NA
     6:   c   6
     7:   c   7
     8:   d  NA
     9:   d   8
    10:   d   9
    11:   d  10

What am I missing here?

1 Answer:

Answer 0: (score: 4)

I see two problems:

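  1. You are trying to replace values in a numeric (here integer) column with a character value. data.table does not like that unless you explicitly convert the column so that the types match.
  2. You are indexing the row with the character value "5" rather than the number 5. Because X is keyed on grp, a character i is treated as a join on the key, so "5" is looked up in grp and matches no rows.

So the following should work: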

    err.code <- 1111
    row.select <- as.numeric(row.names(X)[X$grp=="b" & is.na(X$foo)])
    X[row.select, foo := err.code][]
    #     grp  foo
    #  1:   a    1
    #  2:   a    2
    #  3:   b    3
    #  4:   b    4
    #  5:   b 1111
    #  6:   c    6
    #  7:   c    7
    #  8:   d   NA
    #  9:   d    8
    # 10:   d    9
    # 11:   d   10
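
To make the second problem concrete, here is a minimal sketch that rebuilds the toy table from the question and compares a character lookup with an integer row position. Using `which = TRUE` and `which()` here is my own choice for illustration, not something from the original post.

```r
library(data.table)

# Same toy data as in the question, keyed by grp
X <- data.table(grp = c("a","a","b","b","b","c","c","d","d","d","d"),
                foo = c(1:4, NA, 6:7, NA, 8:10))
setkey(X, grp)

# A character i on a keyed table is a join on the key column (grp),
# not a row-name lookup. which = TRUE returns the matching row numbers.
X["b", which = TRUE]   # 3 4 5
X["5", which = TRUE]   # NA, because no grp equals "5"

# which() yields integer row positions directly, so there is no need
# to go through row.names() and as.numeric()
row.select <- which(X$grp == "b" & is.na(X$foo))
row.select             # 5
```

Passing that integer `row.select` as i selects row 5 by position, which is what the question intended.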
    

Alternatively, without creating those extra variables:

    X[grp == "b" & is.na(foo), foo := 1111]
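
On the type-matching point: foo is built from integer ranges, so it is an integer column, while a bare 1111 is a double. A small sketch, assuming you want the column to stay integer, is to write the code as an integer literal (`1111L`) so that no coercion is involved at all:

```r
library(data.table)

# Same toy data as in the question
X <- data.table(grp = c("a","a","b","b","b","c","c","d","d","d","d"),
                foo = c(1:4, NA, 6:7, NA, 8:10))
setkey(X, grp)

# Integer RHS matches the integer column, so the types agree exactly
X[grp == "b" & is.na(foo), foo := 1111L]

class(X$foo)    # "integer"
X[grp == "b"]   # the NA in group b now holds 1111; group d is untouched
```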
    

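If the mismatched column types are the concern (that is, you want to keep err.code as a character string), you need to convert the column explicitly first: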

    err.code <- "1111"
    row.select <- as.numeric(row.names(X)[X$grp=="b" & is.na(X$foo)])
    X[, foo := as.character(foo)][row.select, foo := err.code][]
    #     grp  foo
    #  1:   a    1
    #  2:   a    2
    #  3:   b    3
    #  4:   b    4
    #  5:   b 1111
    #  6:   c    6
    #  7:   c    7
    #  8:   d   NA
    #  9:   d    8
    # 10:   d    9
    # 11:   d   10
    str(.Last.value)
    # Classes ‘data.table’ and 'data.frame':    11 obs. of  2 variables:
    # $ grp: chr  "a" "a" "b" "b" ...
    # $ foo: chr  "1" "2" "3" "4" ...
    # - attr(*, ".internal.selfref")=<externalptr> 
    # - attr(*, "sorted")= chr "grp"
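
The explicit conversion can also be combined with the logical-condition form, which avoids the row.names() round trip. This is only a sketch under the assumption that storing foo as character is acceptable for the rest of your cleaning code:

```r
library(data.table)

# Same toy data as in the question
X <- data.table(grp = c("a","a","b","b","b","c","c","d","d","d","d"),
                foo = c(1:4, NA, 6:7, NA, 8:10))
setkey(X, grp)

err.code <- "1111"

# Convert the whole column to character first, then assign the code
# to the rows of group b where foo is missing
X[, foo := as.character(foo)]
X[grp == "b" & is.na(foo), foo := err.code]

class(X$foo)    # "character"
X[grp == "b"]
```

Note that after the conversion every value in foo is a string, so any later arithmetic on that column would need as.integer() or similar.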