Question

Here is some R code that merges foo with itself, with a one-day lag:

foo <- data.frame(user=c(10,10,10,11,11,11),
                  day=c(1,2,3,1,2,3),
                  something=c('a', 'b', 'c', 'd', 'e', 'f'))
foo$prev_day <- foo$day - 1
foo2 <- merge(foo, foo,
                by.x=c('user', 'day'),
                by.y=c('user', 'prev_day'))

#Warning message:
#In merge.data.frame(foo, foo, by.x = c("user", "day"), by.y = c("user",  :
#  column name ‘day’ is duplicated in the result

foo2

  user day something.x prev_day day something.y
1   10   1           a        0   2           b
2   10   2           b        1   3           c
3   11   1           d        0   2           e
4   11   2           e        1   3           f

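Note that it complains about 'day' appearing twice in the result, but otherwise the output looks fine (each user is merged only with itself).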

What is the simplest way to do this properly, i.e. with no warning and with only the first 'day' column in the result, not the second?

Answer 1

Remove the 'day' column from the second data frame before merging, so that it does not clash with the 'day' column that already comes from the first data frame.
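One way this could look is sketched below. It assumes the lagged copy is built separately rather than by adding prev_day to foo itself; the name `lagged` is made up for this illustration and is not from the question.

```r
foo <- data.frame(user = c(10, 10, 10, 11, 11, 11),
                  day = c(1, 2, 3, 1, 2, 3),
                  something = c('a', 'b', 'c', 'd', 'e', 'f'))

# Hypothetical helper copy used as the right-hand side of the merge.
# It carries prev_day as its join key and has no 'day' column of its own.
lagged <- foo
lagged$prev_day <- lagged$day - 1
lagged$day <- NULL

# Join each row to the same user's row for the following day.
# The key columns take their names from the left-hand side,
# so the result has a single 'day' column.
foo2 <- merge(foo, lagged,
              by.x = c('user', 'day'),
              by.y = c('user', 'prev_day'))
foo2
```

With the sample data this should give four rows with the columns user, day, something.x and something.y, and no duplicated-column warning, because 'day' now exists only on the left-hand side.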

