Match row values by group (r, data.table)

Asked: 2018-06-20 15:06:48

Tags: r data.table match

Apologies if this seems too abstract. I am facing the following problem. I have some data like this:

dt<-data.table(time=rep("3",5),record=c(1,2,3,4,4),type=c("A","B","B","A","A"),movement=c("Z","D","Z","Z","D"))

My goal is to flag type B records against type A records, based on record and movement.

   time record type movement
1:    3      1    A        Z
2:    3      2    B        D
3:    3      3    B        Z
4:    3      4    A        Z
5:    3      4    A        D

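The logic is as follows. The type B record 2 has a D movement. Among the type A records, the D movement occurs only in record 4 and not in record 1 (which has only a Z movement). In this case, I need to flag that type B record with 1. The other type B record has movement Z, which occurs in both record 1 and record 4. In this case I flag it with 0, as shown below: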

   time record type movement flag
1:    3      1    A        Z
2:    3      2    B        D    1
3:    3      3    B        Z    0
4:    3      4    A        Z
5:    3      4    A        D

I can't really figure out a simple way to solve this. Any ideas? Thanks

2 Answers:

Answer 0: (score: 1)

How about the following:

library(data.table)
library(dplyr)
dt <- data.table(time=rep("3",5),
                 record=c(1,2,3,4,4),
                 type=c("A","B","B","A","A"),
                 movement=c("Z","D","Z","Z","D"))

# Count number of records by type and movement
grp.type_movement <- dt %>% group_by(type, movement)
dt.type_movement <- grp.type_movement %>% summarize( n=n() )

# Add the flag variable to input dataset
dt_with_flag <- merge( dt.type_movement %>% filter( type == "A"),
                       dt.type_movement %>% filter( type == "B" ),
                       by="movement", suffixes=c(".A", ".B") ) %>%
                  # Find A types with count = 1 and assign flag variable accordingly
                  mutate( flag=if_else( n.A == 1, 1, 0) ) %>%
                  # Select relevant variables for final merge with original dataset
                  select( type=type.B, movement, flag ) %>%
                  # Right merge with original dataset
                  merge( dt, by=c("type", "movement"), all.y=TRUE ) %>%
                  # Re-sort by record
                  arrange( record ) %>%
                  # Re-arrange the columns in the final dataset to their original order 
                  select( time, record, type, movement, flag)

Note that the result of the initial merge and mutate steps in the last command is:

  movement type.A n.A type.B n.B flag
1        D      A   1      B   1    1
2        Z      A   2      B   1    0

The result of the last command as a whole (from start to finish) is:

  time record type movement flag
1    3      1    A        Z   NA
2    3      2    B        D    1
3    3      3    B        Z    0
4    3      4    A        D   NA
5    3      4    A        Z   NA

Which is what you want.
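Since the question is tagged data.table, the same count-and-join idea can also be sketched without dplyr. This is only an illustration under the same assumption as the code above (a type B row is flagged 1 when exactly one type A row has its movement). The names `a_counts` and `n_A` are made up for this example.

```r
library(data.table)

dt <- data.table(time = rep("3", 5),
                 record = c(1, 2, 3, 4, 4),
                 type = c("A", "B", "B", "A", "A"),
                 movement = c("Z", "D", "Z", "Z", "D"))

# Number of type A rows per movement (plays the role of n.A above)
a_counts <- dt[type == "A", .(n_A = .N), by = movement]

# Update join: copy n_A onto every row of dt with a matching movement
dt[a_counts, on = "movement", n_A := i.n_A]

# Flag only the type B rows; type A rows are left as NA
dt[type == "B", flag := as.integer(n_A == 1L)]

# Drop the helper column
dt[, n_A := NULL]

dt
```

Because the join updates dt by reference, the original row order is kept and no re-sorting is needed. A type B row whose movement does not occur in any type A row would end up with NA rather than 0 in this sketch.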

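However, I don't know whether you will always have only two type values, or whether you want to generalize the process to more type values. If the latter, what defines the asymmetry between the type values? (That is, in your example, type B and type A play different roles...)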

Answer 1: (score: 0)

If all you want to do is flag unambiguous record matches from B to A, this will work:

library(data.table)

dt<-data.table(time=rep("3",5),record=c(1,2,3,4,4),type=c("A","B","B","A","A"),movement=c("Z","D","Z","Z","D"))

#group the A records
mat<-as.matrix(table(dt[type=="A",record], dt[type=="A",movement]))

#select which ones are unambiguous
unambiguous<-names(which(colSums(mat)==1))

#check them against the B records
dt[,flag:=ifelse(dt[,type]=="B" & dt[,movement] %in% unambiguous, 1, NA)]

dt[,flag:=ifelse(dt[,type]=="B" & !dt[,movement] %in% unambiguous, 0, dt[,flag])]

dt

#    time record type movement flag
# 1:    3      1    A        Z   NA
# 2:    3      2    B        D    1
# 3:    3      3    B        Z    0
# 4:    3      4    A        Z   NA
# 5:    3      4    A        D   NA
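
A more compact variant of the same idea is sketched below. It is not a drop-in replacement: it assumes that "unambiguous" means the movement occurs in exactly one distinct type A record, so it counts records with `uniqueN()` instead of counting rows. The name `a_records` is made up for this example.

```r
library(data.table)

dt <- data.table(time = rep("3", 5),
                 record = c(1, 2, 3, 4, 4),
                 type = c("A", "B", "B", "A", "A"),
                 movement = c("Z", "D", "Z", "Z", "D"))

# Number of distinct type A records containing each movement
a_records <- dt[type == "A", .(n_records = uniqueN(record)), by = movement]

# Movements found in exactly one type A record
unambiguous <- a_records[n_records == 1L, movement]

# Assign the flag to type B rows only; all other rows stay NA
dt[type == "B", flag := as.integer(movement %in% unambiguous)]

dt
```

Subsetting with `type == "B"` inside the assignment avoids the two `ifelse()` passes. As in the code above, a type B movement that does not occur in any type A record is flagged 0.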

However, if you have other cases to handle, I think we'll need more information.