How do I convert data.table columns by reference when indexing them by position?

Date: 2015-08-24 15:32:08

Tags: r data.table

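I have a data.table that contains several factor columns. I want to convert 2 columns that were read in as factors back to their original numeric values. Here is what I tried: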

  data[, c(4,5):=c(as.numeric(as.character(4)), as.numeric(as.character(5))), with=FALSE]

This gives me the following warnings:

Warning messages:
1: In `[.data.table`(data, , `:=`(c(4, 5), c(as.numeric(as.character(4)),  :
  Supplied 2 items to be assigned to 7 items of column 'Bentley (R)' (recycled leaving remainder of 1 items).
2: In `[.data.table`(data, , `:=`(c(4, 5), c(as.numeric(as.character(4)),  :
  Supplied 2 items to be assigned to 7 items of column 'Sparks (D)' (recycled leaving remainder of 1 items).
3: In `[.data.table`(data, , `:=`(c(4, 5), c(as.numeric(as.character(4)),  :
  Coerced 'double' RHS to 'integer' to match the factor column's underlying type. Character columns are now recommended (can be in keys), or coerce RHS to integer or character first.
4: In `[.data.table`(data, , `:=`(c(4, 5), c(as.numeric(as.character(4)),  :
  Coerced 'double' RHS to 'integer' to match the factor column's underlying type. Character columns are now recommended (can be in keys), or coerce RHS to integer or character first.

Moreover, I can tell the conversion did not work, because columns 4 and 5 are still factors after this code runs.
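Two things go wrong in the first attempt. Inside `c(...)`, the literal `4` is just the number 4, not a reference to column 4, so `as.numeric(as.character(4))` is the constant 4; the RHS therefore evaluates to the vector `c(4, 5)`, which is recycled into every row of both columns (hence the "recycled" warnings), and the factor columns keep their type. Separately, turning a factor back into its original numbers needs a detour through character, because `as.numeric()` on a factor returns the internal level codes. A small standalone illustration (the vector `f` is made up for the example):

```r
# Hypothetical factor whose labels are numbers
f <- factor(c("10", "20", "10", "35"))

as.numeric(as.character(4))   # just 4: nothing here refers to column 4
as.numeric(f)                 # internal level codes: 1 2 1 3
as.numeric(as.character(f))   # original values: 10 20 10 35
as.numeric(levels(f))[f]      # same values, converting each level only once
```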

As an alternative, I tried this code, which does not run at all:

 data[, ':=' (4=c(as.numeric(as.character(4)), 5 = as.numeric(as.character(5)))), with=FALSE]

Finally, I tried referring to the column name via colnames:
  data[ , (colnames(data)[4]) := as.numeric(as.character(colnames(data)[4]))]

This runs, but it produces a row of NAs along with the following warnings:

Warning messages:
1: In eval(expr, envir, enclos) : NAs introduced by coercion
2: In `[.data.table`(data, , `:=`((colnames(data)[4]), as.numeric(as.character(colnames(data)[4])))) :
  Coerced 'double' RHS to 'integer' to match the factor column's underlying type. Character columns are now recommended (can be in keys), or coerce RHS to integer or character first.
3: In `[.data.table`(data, , `:=`((colnames(data)[4]), as.numeric(as.character(colnames(data)[4])))) :
  RHS contains -2147483648 which is outside the levels range ([1,6]) of column 1, NAs generated
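The NAs in this attempt come from the RHS: `colnames(data)[4]` is only the column's name as a string (for example "Bentley (R)"), so `as.numeric(as.character(...))` is converting that string, which yields NA ("NAs introduced by coercion"). To read a column's values by position, `DT[[4]]` can be used instead. A minimal sketch on a hypothetical two-column table:

```r
library(data.table)

# Hypothetical table: column 2 is a factor whose labels are numbers
DT <- data.table(id = c("a", "b", "c"), score = factor(c("3", "12", "7")))

# On the RHS, names(DT)[2] would only be the string "score";
# DT[[2]] fetches the column's values by position instead
DT[, (names(DT)[2]) := as.numeric(as.character(DT[[2]]))]
str(DT)
```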

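I need to do this by position rather than by column name, because the column names depend on the URL. What is the correct way to convert columns by position with data.table?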

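I also have a related question: how do I transform a numbered column relative to other numbered columns? For example, if I want to set column 3 equal to 45 minus the value of column 3 plus the value of column 4, how would I do that? Is there a way to distinguish a literal number from a column number? I know something like this is not the way to go: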

dt[ , .(4) = 45 - .(3) + .(4), with = FALSE]

So how should it be done?

1 Answer

Answer 0 (score: 5)

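If you want to assign by reference and by position, give the left-hand side of := as either a character vector of column names or a vector of column numbers, and select the same columns with .SDcols (at least in data.table 1.9.4).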

First, a reproducible example:

library(data.table)
DT <- data.table(iris)
# Turn two numeric columns into factors to mimic the question's data
DT[, c("Sepal.Length", "Petal.Length") := list(factor(Sepal.Length), factor(Petal.Length))]
str(DT)

Now let's convert the columns:

# LHS: names of columns 1 and 3; .SD holds the same columns, chosen via .SDcols
DT[, names(DT)[c(1, 3)] := lapply(.SD, function(x) as.numeric(as.character(x))), 
   .SDcols = c(1, 3)]
str(DT)
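Since the positions in the question depend on the URL, they will often live in a variable. A sketch of that case, assuming a hypothetical variable `pos`: writing `pos := ...` would create a new column literally named "pos", whereas wrapping it in parentheses makes `:=` evaluate it (in more recent data.table versions this parenthesised form is the recommended replacement for `with = FALSE`):

```r
library(data.table)

DT <- data.table(iris)
DT[, c("Sepal.Length", "Petal.Length") := list(factor(Sepal.Length), factor(Petal.Length))]

pos <- c(1, 3)  # hypothetical: positions computed at run time
# (pos) is evaluated to the positions 1 and 3; .SDcols selects the same columns
DT[, (pos) := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols = pos]
str(DT)
```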

Alternatively:

# The LHS can also be the column positions themselves
DT[, c(1,3) := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols=c(1,3)]
str(DT)
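Another option, not part of the original answer, is data.table's `set()` function, which assigns by reference to a column given by position and is designed to be cheap inside a loop. A sketch using the same iris-based setup:

```r
library(data.table)

DT <- data.table(iris)
DT[, c("Sepal.Length", "Petal.Length") := list(factor(Sepal.Length), factor(Petal.Length))]

cols <- c(1L, 3L)  # positions to convert
for (j in cols) {
  # set() replaces column j by reference; DT[[j]] reads it by position
  set(DT, j = j, value = as.numeric(as.character(DT[[j]])))
}
str(DT)
```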

Note that := expects a vector of column names or positions on the left-hand side and a list on the right-hand side.
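The answer does not cover the second question (setting column 3 to 45 minus column 3 plus column 4). One way to keep literal numbers and column numbers apart is to only ever use positions inside `DT[[...]]` or a parenthesised LHS, so a bare `45` stays a number. A sketch on a hypothetical four-column table:

```r
library(data.table)

# Hypothetical numeric table; only positions 3 and 4 matter here
DT <- data.table(a = 1:3, b = 4:6, c = c(10, 20, 30), d = c(1, 2, 3))

# Column 3 becomes 45 - (column 3) + (column 4), assigned by reference;
# 3 and 4 appear only as positions, 45 is an ordinary number
DT[, (names(DT)[3]) := 45 - DT[[3]] + DT[[4]]]
print(DT)
```

For this table, column `c` becomes 36, 27, 18.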