I want to log changes in a data.frame.
In the original data frame, I have the following structure:
library(dplyr)
library(compare)
origianl <- data_frame( name = c('John','Tim','Allan','Mitch'),
A = c(0,1,1,0),
B = c(1,0,0,1),
C = c(0,0,0,0),
D = c(1,0,0,1))
#> data.frame(origianl)
name A B C D
1 John 0 1 0 1
2 Tim 1 0 0 0
3 Allan 1 0 0 0
4 Mitch 0 1 0 1
Here John plays for teams B and D.
In the changed data frame, I have the following structure:
changed <- data_frame( name = c('John','Tim','Allan','Mitch'),
A = c(1,0,1,0),
B = c(0,0,0,0),
C = c(0,1,0,1),
D = c(1,0,0,1))
#> data.frame(changed)
name A B C D
1 John 1 0 0 1
2 Tim 0 0 1 0
3 Allan 1 0 0 0
4 Mitch 0 0 1 1
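For example, John has changed teams and now plays for A and D.
There are a few constraints:
I check original every day. Players can play for several teams or for a single team, and they can switch teams. They are always assigned to at least one team.
What I tried is writing a function that logs the changes in two specific columns, when and from.
The desired output table could look like the following. It is basically changed, with two additional columns that record the change: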
#> data.frame(new_original)
name A B C D from when
1 John 1 0 0 1 B 2017-01-01
2 Tim 0 0 1 0 A 2017-01-01
3 Allan 1 0 0 0 NA NA
4 Mitch 0 0 1 1 B 2017-01-01
What I have tried so far:
my_func <- function(origianl, changed) {
# Get the difference
difference <- anti_join(changed, origianl)
# find the changes in the original data
differ <- origianl[origianl$name %in% difference$name, ]
# check whether condition is satisfied
if (nrow(differ) == nrow(difference) &&
nrow(anti_join(changed, origianl)) > 0)
{
new_original <-
cbind(changed, from = names(which(
compare(difference[2:ncol(difference)], differ[2:ncol(differ)], allowAll =
TRUE)$detailedResult == FALSE, when = Sys.Date()
)))
}
return(new_original)
}
new_original = my_func(origianl, changed)
This produces the error "arguments imply differing number of rows: 4, 3" from the cbind command. I am not sure how to get the desired output.
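The error message can be reproduced in isolation, which may help narrow down where it comes from. This is a minimal sketch, assuming hypothetical objects df and from that stand in for changed and the vector of names built inside my_func. cbind on a data frame rejects a vector whose length does not match the number of rows, whereas assigning by row index leaves the remaining rows as NA.

```r
# Hypothetical stand-ins: a 4-row data frame and a 3-element vector
df <- data.frame(name = c("John", "Tim", "Allan", "Mitch"))
from <- c("B", "A", "B")

# cbind() cannot recycle 3 values over 4 rows, so it raises an error
msg <- tryCatch(
  { cbind(df, from = from); "no error" },
  error = function(e) conditionMessage(e)
)
print(msg)

# Assigning by row index leaves the untouched rows as NA
df$from <- NA_character_
df$from[c(1, 2, 4)] <- from
print(df)
```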
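Answer 0 (score: 4)
Here is a possible vectorized approach using base R. You haven't specified how the when column is calculated (is it just the current date?), but here is how from can be computed quite easily: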
indx <- which((origianl[-1] - changed[-1]) == 1, arr.ind = TRUE)
changed[indx[, "row"], "from"] <- names(changed)[-1][indx[, "col"]]
changed
# name A B C D from
# 1 John 1 0 0 1 B
# 2 Tim 0 0 1 0 A
# 3 Allan 1 0 0 0 <NA>
# 4 Mitch 0 0 1 1 B
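If when is meant to be the date of the check, it could be added on top of this approach. The following is only a sketch under that assumption (Sys.Date() is a guess based on the dates in the desired output), and it rebuilds the two tables as plain data frames so that it runs on its own. Rows are assumed to be in the same order in both tables.

```r
# Plain data frames with the same values as in the question
origianl <- data.frame(name = c("John", "Tim", "Allan", "Mitch"),
                       A = c(0, 1, 1, 0), B = c(1, 0, 0, 1),
                       C = c(0, 0, 0, 0), D = c(1, 0, 0, 1))
changed <- data.frame(name = c("John", "Tim", "Allan", "Mitch"),
                      A = c(1, 0, 1, 0), B = c(0, 0, 0, 0),
                      C = c(0, 1, 0, 1), D = c(1, 0, 0, 1))

# A cell equals 1 where the player was on a team before and is not any more
indx <- which((origianl[-1] - changed[-1]) == 1, arr.ind = TRUE)
changed[indx[, "row"], "from"] <- names(changed)[-1][indx[, "col"]]

# Assumption: `when` is the current date, and NA for players without a change
changed$when <- as.Date(NA)
changed$when[!is.na(changed$from)] <- Sys.Date()
print(changed)
```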
Answer 1 (score: 1)
Assuming Sys.time() is chosen for when, then:
library(dplyr)
f1 <- function(x, y){
d4 <-left_join(y, x, by = 'name')
d5 <- d4[grepl('.x', names(d4), fixed = TRUE)]
d6 <- d4[grepl('.y', names(d4), fixed = TRUE)]
l1 <- apply(d5 - d6, 1, function(i) names(i)[i == -1])
y$from <- sub('\\..*', '', unlist(ifelse(lengths(l1) == 0, NA, l1)))
y$when <- Sys.time()
y$when[is.na(y$from)] <- NA
return(y)
}
f1(origianl, changed)
# name A B C D from when
#1 John 1 0 0 1 B 2017-03-14 15:37:59
#2 Tim 0 0 1 0 A 2017-03-14 15:37:59
#3 Allan 1 0 0 0 <NA> <NA>
#4 Mitch 0 0 1 1 B 2017-03-14 15:37:59
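The question allows a player to be on several teams, so more than one team could be dropped between two checks. A minimal sketch of one way to handle that case, assuming rows are in the same order in both tables and that the teams left can be collapsed into one comma-separated string; from_teams and the small example tables are hypothetical and not part of the answer above:

```r
# Hypothetical helper: for each row, the teams that were 1 in `old` and are 0 in `new`
from_teams <- function(old, new) {
  left <- old[-1] == 1 & new[-1] == 0   # logical matrix, one column per team
  res <- apply(left, 1, function(r) {
    if (any(r)) paste(colnames(left)[r], collapse = ",") else NA_character_
  })
  unname(res)
}

# Hypothetical data: John leaves both A and B, Tim does not change
old <- data.frame(name = c("John", "Tim"), A = c(1, 0), B = c(1, 1), C = c(0, 0))
new <- data.frame(name = c("John", "Tim"), A = c(0, 0), B = c(0, 1), C = c(1, 0))

new$from <- from_teams(old, new)
print(new)
```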
Answer 2 (score: 0)
I'm not sure how you get when, but this should work for from:
origianl <- as.data.frame(origianl)
# `i` is not defined in the answer as posted; it is assumed here to hold the column names
i <- names(origianl)
id <- sapply(1:nrow(origianl), function(x) i[which( origianl[x,]==1)][!(i[which( origianl[x,]==1)] %in% i[which(changed[x,]==1)])])
origianl$from <- sapply(id,function(x) ifelse(length(x)==0,NA,x))