I apologize if this seems too abstract. I am facing the following problem. I have some data like this:
dt<-data.table(time=rep("3",5),record=c(1,2,3,4,4),type=c("A","B","B","A","A"),movement=c("Z","D","Z","Z","D"))
My goal is to flag type B records against type A records, based on record and movement.
   time record type movement
1:    3      1    A        Z
2:    3      2    B        D
3:    3      3    B        Z
4:    3      4    A        Z
5:    3      4    A        D
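The logic is as follows. The type B record 2 has movement D. Among the type A records, movement D appears only in record 4 and not in record 1 (which has only movement Z). In that case I need to flag the B record as 1. The other type B record (record 3) has movement Z, which appears in both record 1 and record 4. In that case I flag it as 0, as shown below: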
   time record type movement flag
1:    3      1    A        Z
2:    3      2    B        D    1
3:    3      3    B        Z    0
4:    3      4    A        Z
5:    3      4    A        D
I can't really figure out a simple way to solve this. Any ideas? Thanks.
Answer 0 (score: 1)
How about the following:
library(data.table)
library(dplyr)
dt <- data.table(time=rep("3",5),
                 record=c(1,2,3,4,4),
                 type=c("A","B","B","A","A"),
                 movement=c("Z","D","Z","Z","D"))
# Count number of records by type and movement
grp.type_movement <- dt %>% group_by(type, movement)
dt.type_movement <- grp.type_movement %>% summarize( n=n() )
# Add the flag variable to input dataset
dt_with_flag <- merge( dt.type_movement %>% filter( type == "A" ),
                       dt.type_movement %>% filter( type == "B" ),
                       by="movement", suffixes=c(".A", ".B") ) %>%
  # Find A types with count = 1 and assign flag variable accordingly
  mutate( flag=if_else( n.A == 1, 1, 0) ) %>%
  # Select relevant variables for final merge with original dataset
  select( type=type.B, movement, flag ) %>%
  # Right merge with original dataset
  merge( dt, by=c("type", "movement"), all.y=TRUE ) %>%
  # Re-sort by record
  arrange( record ) %>%
  # Re-arrange the columns in the final dataset to their original order
  select( time, record, type, movement, flag )
Note that the result of the initial merge and mutate steps in the last command is:
  movement type.A n.A type.B n.B flag
1        D      A   1      B   1    1
2        Z      A   2      B   1    0
The result of the whole last command (from start to finish) is:
  time record type movement flag
1    3      1    A        Z   NA
2    3      2    B        D    1
3    3      3    B        Z    0
4    3      4    A        D   NA
5    3      4    A        Z   NA
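This is what you wanted, with NA in place of the blank flags on the type A rows.
However, I don't know whether you will always have only two type values, or whether you want to generalize the process to more type values. If the latter, what defines the asymmetry between the type values? (That is, in your example, type B plays a different role from type A...)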
Answer 1 (score: 0)
If all you want to do is flag unambiguous record matches from B to A, this will work:
library(data.table)
dt<-data.table(time=rep("3",5),record=c(1,2,3,4,4),type=c("A","B","B","A","A"),movement=c("Z","D","Z","Z","D"))
#group the A records
mat<-as.matrix(table(dt[type=="A",record], dt[type=="A",movement]))
#select which ones are unambiguous
unambiguous<-names(which(colSums(mat)==1))
#check them against the B records
dt[,flag:=ifelse(dt[,type]=="B" & dt[,movement] %in% unambiguous, 1, NA)]
dt[,flag:=ifelse(dt[,type]=="B" & !dt[,movement] %in% unambiguous, 0, dt[,flag])]
dt
#    time record type movement flag
# 1:    3      1    A        Z   NA
# 2:    3      2    B        D    1
# 3:    3      3    B        Z    0
# 4:    3      4    A        Z   NA
# 5:    3      4    A        D   NA
However, if you have other cases, I think we need more information.
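For what it's worth, the same idea can probably be written more compactly with data.table alone. This is only a sketch under my reading of the question: a B row gets flag 1 when its movement appears in exactly one distinct type A record, and 0 when it appears in more than one. The names a_counts and n_a are just illustrative helpers, and leaving the flag as NA when no A record has the movement is my assumption, since the question does not cover that case.

```r
library(data.table)

dt <- data.table(time = rep("3", 5),
                 record = c(1, 2, 3, 4, 4),
                 type = c("A", "B", "B", "A", "A"),
                 movement = c("Z", "D", "Z", "Z", "D"))

# Number of distinct type A records that contain each movement
a_counts <- dt[type == "A", .(n_a = uniqueN(record)), by = movement]

# Attach that count to every row (NA if no A record has the movement)
dt[, n_a := a_counts$n_a[match(movement, a_counts$movement)]]

# Flag only type B rows: 1 if exactly one A record matches, otherwise 0.
# Type A rows, and B rows with no matching A record (assumption), stay NA.
dt[, flag := ifelse(type == "B", as.integer(n_a == 1L), NA_integer_)]

# Drop the helper column
dt[, n_a := NULL]

dt
```

Counting with uniqueN(record) rather than counting rows means a type A record that lists the same movement twice is still treated as a single match, which may or may not be what you want.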