Delete melted data based on a condition

Posted: 2015-12-27 22:04:33

Tags: r data.table

I want to delete the rows of a `day` where value a = b, but I don't know how to do it.

Sample data:

df <- data.frame(day = c(1, 1, 2, 2, 3, 3), var = c("a", "b", "a", "b", "a", "b"), value = c(1, 2, 3, 3, 2, 1))

Output:

  day var value
1   1   a     1
2   1   b     2
3   2   a     3
4   2   b     3
5   3   a     2
6   3   b     1

Desired output:

  day var value
1   1   a     1
2   1   b     2

2 answers:

Answer 0 (score: 3)

Here is a data.table solution that avoids casting from long to wide:

library(data.table)
dt <- data.table(df)
# keep a day's rows only when its value for "a" is below its value for "b"
dt[, if (value[var == 'a'] < value[var == 'b']) .SD, by = day]
EDIT: I now realize your desired output doesn't match your initial inequality, so I adjusted the inequality to match :)

EDIT2: If you don't want to do this in data.table, here is a dplyr solution:

library(dplyr)
df %>% group_by(day) %>% filter(value[var == 'a'] < value[var == 'b'])

EDIT3: If you want to put NAs in instead of deleting the rows:

df %>% group_by(day) %>% mutate(value = if(value[var == 'a'] >= value[var == 'b']) as.numeric(NA) else value) 

EDIT4: Note that this last solution seems to expose a bug in which NAs are handled strangely; see here: Why is dplyr removing values not met by condition?
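For readers without data.table or dplyr, here is a minimal base-R sketch of the same per-day test. It is an alternative added for illustration, not part of the original answer, and it assumes each day has exactly one `a` row and one `b` row. The helper names `keep_day` and `result` are hypothetical.

```r
df <- data.frame(day = c(1, 1, 2, 2, 3, 3),
                 var = c("a", "b", "a", "b", "a", "b"),
                 value = c(1, 2, 3, 3, 2, 1))

# For each day, TRUE if that day's value for "a" is below its value for "b".
# Assumes exactly one "a" row and one "b" row per day.
keep_day <- sapply(split(df, df$day), function(d) {
  d$value[d$var == "a"] < d$value[d$var == "b"]
})

# split() names the groups by day, so keep the rows whose day passed the test.
result <- df[df$day %in% as.numeric(names(keep_day)[keep_day]), ]
print(result)
```

This reproduces the desired output (only day 1 survives), but it scales less gracefully than the grouped data.table version on large data.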

Answer 1 (score: 3)

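Shape's answer is the proper way to solve the problem. To extend it, I'd like to contribute a more generic solution. The `eav` function from the dwtools package is designed to address Entity-Attribute-Value (EAV) data structure problems by making it easier to compute measures. The function definition is below, so you don't need the dwtools package. Here it computes an `rm` variable for each group. The calculation formula is written the same way you would write the `j` argument to `[.data.table` after casting your EAV data to wide and before melting it back to EAV.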
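To make the cast, compute, melt round trip concrete for this data, here is a simplified hand-written sketch using data.table's `dcast` and `melt` directly. It is not the `eav` function itself and skips its aggregation and shift handling; the names `wide` and `long` are hypothetical.

```r
library(data.table)

dt <- data.table(day = c(1, 1, 2, 2, 3, 3),
                 var = c("a", "b", "a", "b", "a", "b"),
                 value = c(1, 2, 3, 3, 2, 1))

# Step 1: cast the EAV (long) table to wide: one row per day, one column per var.
wide <- dcast(dt, day ~ var, value.var = "value")

# Step 2: compute the helper measure with an ordinary j expression on the wide table.
wide[, rm := as.numeric(a >= b)]

# Step 3: melt back to EAV form; "rm" becomes just another attribute value.
long <- melt(wide, id.vars = "day", variable.name = "var", value.name = "value")
setkey(long, day, var)
print(long)
```

The melted result has the same nine rows as the `eav` output printed further down, so the same `.SD` filter can be applied to it.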

library(data.table)
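# x: keyed data.table; by default the last key column holds attribute names (e.g. var)
#    and the earlier key columns are ids (e.g. day)
# j: quoted expression evaluated on the wide table, grouped by the ids not listed in shift.on
eav = function(x, j, id.vars = key(x)[-length(key(x))], variable.name = key(x)[length(key(x))], measure.vars = names(x)[!(names(x) %in% key(x))], fun.aggregate = sum, shift.on = character(), wide=FALSE){
    stopifnot(is.data.table(x))
    # aggregate each measure per id/attribute, cast to wide (one column per attribute), then evaluate j
    r <- x[,lapply(.SD,fun.aggregate),c(id.vars,variable.name),.SDcols=measure.vars
           ][,dcast(.SD,formula=as.formula(paste(paste(id.vars,collapse=' + '),paste(variable.name,collapse=' + '),sep=' ~ ')),fun.aggregate=fun.aggregate,value.var=measure.vars)
             ][,eval(j), by = eval(id.vars[!(id.vars %in% shift.on)])
               ]
    # return the wide table, or melt back to EAV form keyed by id and attribute
    if(wide) r[] else melt(r,id.vars=id.vars, variable.name=variable.name, value.name=measure.vars)[,.SD,keyby=c(id.vars,variable.name)]
}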

df = data.frame(day = c(1, 1, 2, 2, 3, 3), var = c("a", "b", "a", "b", "a", "b"), value = c(1, 2, 3, 3, 2, 1))
dt = as.data.table(df)
setkey(dt, day, var)
r = eav(dt, quote(rm := as.numeric(a >= b)))
print(r)
#   day var value
#1:   1   a     1
#2:   1   b     2
#3:   1  rm     0
#4:   2   a     3
#5:   2   b     3
#6:   2  rm     1
#7:   3   a     2
#8:   3   b     1
#9:   3  rm     1
r[, if(value[var=="rm"] == 0) .SD, by = day
  ][var!="rm"] # you need to exclude temporary variable
#   day var value
#1:   1   a     1
#2:   1   b     2

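This solution may also be slower than Shape's (you can generate a large data sample to measure it), but it may be easier for complex calculations over many measures in EAV, and it supports shifting; see {{3 }}.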
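The answer does not show what "supports shifting" looks like. As an assumption that it refers to lag/lead calculations across ids, here is a minimal sketch on the wide table using `data.table::shift`. It illustrates the idea only and does not use `eav`'s `shift.on` parameter; the columns `a_prev` and `a_change` are hypothetical.

```r
library(data.table)

dt <- data.table(day = c(1, 1, 2, 2, 3, 3),
                 var = c("a", "b", "a", "b", "a", "b"),
                 value = c(1, 2, 3, 3, 2, 1))

# Cast to wide so each day is one row, then make sure rows are ordered by day.
wide <- dcast(dt, day ~ var, value.var = "value")
setorder(wide, day)

# Previous day's value of "a" (NA for the first day), computed over all days,
# i.e. without grouping by day, which is why day must not be a grouping id here.
wide[, a_prev := shift(a, n = 1L, type = "lag")]
wide[, a_change := a - a_prev]
print(wide)
```

Grouping by `day` here would make every lag NA, since each group has one row; that is the reason a shifted id has to be excluded from the `by` grouping.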