
How do I refer to a column that is not in .SD inside lapply?

Time: 2014-06-17 11:56:46

Tags: r data.table

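I have a column in my data.table that holds data I want to use to update some of the other columns. That column is a list, and for each row I need to subset its list element using the value found in each of the columns I include in the .SD expression.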

My data...

dt <- data.table( A = list( c("X","Y") , c("J","K") ) , B = c(1,2) , C = c(2,1) )
#     A B C
#1: X,Y 1 2
#2: J,K 2 1

The result I want...

#     A B C
#1: X,Y X Y
#2: J,K K J

What I have tried...

# Column A is not included in SD so not found...
dt[ , lapply( .SD , function(x) A[x] ) , .SDcols = 2:3 ]
#Error in FUN(X[[1L]], ...) : object 'A' not found


# This also does not work. It sees all of A as one long vector (look at the results for C)
for( i in 2:3 ) dt[ , names(dt)[i] := unlist(A)[ get(names(dt)[i]) ] ]
#     A B C
#1: X,Y X Y
#2: J,K Y X
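
To make the difference concrete, here is a minimal base R sketch of why the flattened version gives the wrong answer for C. The vectors `A` and `C` below are stand-ins for the two columns, and `flat` and `per_row` are names made up for illustration.

```r
# Hypothetical stand-ins for column A and column C of the example table
A <- list(c("X", "Y"), c("J", "K"))
C <- c(2, 1)

# Flattened: unlist(A) is one vector of length 4, indexed by all of C at once
flat <- unlist(A)[C]
# flat is c("Y", "X")

# Row by row: element i of A is indexed by element i of C
per_row <- mapply(function(a, i) a[i], A, C)
# per_row is c("Y", "J")
```

The second form is the pairing I am after: each index has to be applied to its own row's vector, not to the concatenation of all rows.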

# I saw this in another answer, but also won't work:
# Basically we add an ID column and use 'by=' to try and solve the problem  above
# Now we get a type mismatch
dt <- data.table( ID = 1:2 , A = list( c("X","Y") , c("J","K") ) , B = c(1,2) , C = c(2,1) , key = "ID" )
for( i in 3:4 ) dt[ , names(dt)[i] := unlist(A)[ get(names(dt)[i]) ] , by = ID ]
#Error in `[.data.table`(dt, , `:=`(names(dt)[i], unlist(A)[get(names(dt)[i])]),  : 
#  Type of RHS ('character') must match LHS ('double'). To check and coerce would impact performance too much for the fastest cases. Either change the type of the target column, or coerce the RHS of := yourself (e.g. by using 1L instead of 1)
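
One way around the type mismatch, as far as I can tell, is to avoid overwriting the double columns group by group and write into new character columns instead. This is only a sketch: `B_chr` and `C_chr` are hypothetical column names, and it hard-codes B and C rather than looping over them.

```r
library(data.table)

dt <- data.table(ID = 1:2,
                 A = list(c("X", "Y"), c("J", "K")),
                 B = c(1, 2), C = c(2, 1), key = "ID")

# Within each ID group, A is a list of length one, so A[[1L]] is that row's vector.
# B_chr and C_chr are hypothetical new columns; they are created as character,
# so no existing double column has to change type.
dt[, c("B_chr", "C_chr") := list(A[[1L]][B], A[[1L]][C]), by = ID]

# Optionally drop the numeric index columns and reuse their names
dt[, c("B", "C") := NULL]
setnames(dt, c("B_chr", "C_chr"), c("B", "C"))
```

This gives the table I want for the toy example, but grouping by every row is unlikely to be fast on a large table.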

In case anyone is interested, my real data is a set of SNPs and INDELs across different isolates, and I am trying to do this:

# My real data looks more like this:
# In columns V10:V15;
# if '.' in first character then use data from 'Ref' column
# else use integer at first character to subset list in 'Alt' column
#   Contig  Pos V3 Ref Alt    Qual        V10       V11       V12       V13       V14       V15
#1:     1   172  .   T   C 81.0000  1/1:.:.:. ./.:.:.:. ./.:.:.:. ./.:.:.:. ./.:.:.:. ./.:.:.:.
#2:     1   399  .   G C,A 51.0000  ./.:.:.:. 1/1:.:.:. 2/2:.:.:. ./.:.:.:. 1/1:.:.:. ./.:.:.:.
#3:     1   516  .   T   G 57.0000  ./.:.:.:. 1/1:.:.:. ./.:.:.:. 1/1:.:.:. ./.:.:.:. ./.:.:.:.
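
The rule in those comments could be sketched roughly as follows. This is an illustration under assumptions, not tested on the real file: `vcf` is a cut-down, made-up version of the table with only two genotype columns, `call_allele` is a hypothetical helper, `Alt` is assumed to be a list column, and the first character is assumed to be either "." or a valid 1-based index into `Alt` (a "0" is not handled).

```r
library(data.table)

# Made-up three-row table modelled on the rows shown above
vcf <- data.table(
  Ref = c("T", "G", "T"),
  Alt = list("C", c("C", "A"), "G"),
  V10 = c("1/1:.:.:.", "./.:.:.:.", "./.:.:.:."),
  V12 = c("./.:.:.:.", "2/2:.:.:.", "./.:.:.:.")
)

# Hypothetical helper: pick Ref where the genotype starts with ".",
# otherwise use the leading digit to index that row's Alt vector
call_allele <- function(gt, ref, alt) {
  first <- substr(gt, 1L, 1L)
  idx <- suppressWarnings(as.integer(first))  # NA where first is "."
  picked <- mapply(function(a, i) if (is.na(i)) NA_character_ else a[i],
                   alt, idx)
  ifelse(first == ".", ref, picked)
}

# Overwrite each genotype column in place; both sides are character
for (j in c("V10", "V12")) {
  set(vcf, j = j, value = call_allele(vcf[[j]], vcf$Ref, vcf$Alt))
}
```

Because the genotype columns are already character, overwriting them with `set` does not run into the type mismatch shown earlier.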

3 answers:

Answer 0: (score: 4)

You can use mapply and set together with a for loop. There may be more efficient ways.

for(j in c('B','C')){
    set(dt, j = j, value = mapply(FUN = '[', dt[['A']],dt[[j]]))
}
dt
#      A B C
# 1: X,Y X Y
# 2: J,K K J

Answer 1: (score: 1)

Hi, does this work for you?

dt$B <- apply(dt, 1, FUN = function(x) x$A[x$B])
dt$C <- apply(dt, 1, FUN = function(x) x$A[x$C])
dt
#     A B C
#1: X,Y X Y
#2: J,K K J

Answer 2: (score: 0)

There is probably a more elegant way to do this, and it does not scale well, but here goes...

dt[,A1:=lapply(A,'[[',1)]
dt[,A2:=lapply(A,'[[',2)]
dt[B==1,`:=`(Bnew=A1,Cnew=A2)]
dt[B==2,`:=`(Bnew=A2,Cnew=A1)]
dt[,`:=`(A1=NULL,A2=NULL,B=NULL,C=NULL)]
setnames(dt,c("Bnew","Cnew"),c("B","C"))