Question

Apologies if this seems too abstract. I am facing the following problem. I have some data like this:

dt<-data.table(time=rep("3",5),record=c(1,2,3,4,4),type=c("A","B","B","A","A"),movement=c("Z","D","Z","Z","D"))

My goal is to flag the type B records against the type A records, based on record and movement.

     time record type movement
1:    3      1    A        Z
2:    3      2    B        D
3:    3      3    B        Z
4:    3      4    A        Z
5:    3      4    A        D

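The logic is as follows. The type B record 2 has movement D. Among the type A records, movement D appears only in record 4 and not in record 1 (which has only movement Z). In this case, I need to flag that type B record as 1. The other type B record (record 3) has movement Z, which appears in both record 1 and record 4. In that case I flag it as 0, as shown below: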

    time record type movement flag
1:    3      1    A        Z   
2:    3      2    B        D   1
3:    3      3    B        Z   0
4:    3      4    A        Z  
5:    3      4    A        D

I can't work out a simple way to solve this. Any ideas? Thanks.

Answer 1

How about the following:

library(data.table)
library(dplyr)
dt <- data.table(time=rep("3",5),
                 record=c(1,2,3,4,4),
                 type=c("A","B","B","A","A"),
                 movement=c("Z","D","Z","Z","D"))

# Count number of records by type and movement
grp.type_movement <- dt %>% group_by(type, movement)
dt.type_movement <- grp.type_movement %>% summarize( n=n() )

# Add the flag variable to input dataset
dt_with_flag <- merge( dt.type_movement %>% filter( type == "A"),
                       dt.type_movement %>% filter( type == "B" ),
                       by="movement", suffixes=c(".A", ".B") ) %>%
                  # Find A types with count = 1 and assign flag variable accordingly
                  mutate( flag=if_else( n.A == 1, 1, 0) ) %>%
                  # Select relevant variables for final merge with original dataset
                  select( type=type.B, movement, flag ) %>%
                  # Right merge with original dataset
                  merge( dt, by=c("type", "movement"), all.y=TRUE ) %>%
                  # Re-sort by record
                  arrange( record ) %>%
                  # Re-arrange the columns in the final dataset to their original order 
                  select( time, record, type, movement, flag)

Note that the result of the initial merge and mutate in the last command is:

  movement type.A n.A type.B n.B flag
1        D      A   1      B   1    1
2        Z      A   2      B   1    0

The result of the whole last command (from start to finish) is:

  time record type movement flag
1    3      1    A        Z   NA
2    3      2    B        D    1
3    3      3    B        Z    0
4    3      4    A        D   NA
5    3      4    A        Z   NA

This is what you wanted, with NA rather than a blank in flag for the type A rows.

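However, I don't know whether you will always have only two type values, or whether you want to generalize the process to more type values. If the latter, what is the asymmetric definition among the type values? (i.e., in your example, type B plays a different role than type A...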
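One way to make that asymmetry explicit is to name the two roles. The following is only a sketch, assuming one type acts as the reference (A in the example) and another is the one being flagged (B). The function name `flag_against` and its arguments `ref` and `target` are hypothetical and not part of the question. It also assumes that a target movement that never occurs among the reference records should get 0, whereas the dplyr code above would leave it as NA.

```r
library(data.table)

# Hypothetical helper (not from the question): flag `target` rows by how many
# distinct `ref` records share their movement.
#   1  = exactly one reference record has that movement
#   0  = zero or several reference records have it (assumption for the zero case)
#   NA = row is not of the target type
flag_against <- function(dt, ref = "A", target = "B") {
  out <- copy(dt)  # work on a copy so the caller's table is not modified by reference
  # Number of distinct reference records per movement
  ref_counts <- out[type == ref, .(n_ref = uniqueN(record)), by = movement]
  # Look up that count for every row; movements absent from `ref` give NA
  n_ref <- ref_counts$n_ref[match(out$movement, ref_counts$movement)]
  is_target <- out$type == target
  out[, flag := ifelse(is_target, as.numeric(!is.na(n_ref) & n_ref == 1), NA_real_)]
  out[]
}

dt <- data.table(time = rep("3", 5),
                 record = c(1, 2, 3, 4, 4),
                 type = c("A", "B", "B", "A", "A"),
                 movement = c("Z", "D", "Z", "Z", "D"))

res <- flag_against(dt, ref = "A", target = "B")
print(res)
```

With more than two type values, the same helper could be called once per (ref, target) pair, but which pairs make sense depends on the definition asked about above.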

Answer 2

If all you want to do is flag unambiguous record matches from B to A, this will work:

library(data.table)

dt<-data.table(time=rep("3",5),record=c(1,2,3,4,4),type=c("A","B","B","A","A"),movement=c("Z","D","Z","Z","D"))

#group the A records
mat<-as.matrix(table(dt[type=="A",record], dt[type=="A",movement]))

#select which ones are unambiguous
unambiguous<-names(which(colSums(mat)==1))

#check them against the B records: 1 if the movement is unambiguous, 0 otherwise
#(type A rows are left as NA)
dt[type=="B", flag:=as.numeric(movement %in% unambiguous)]

dt

#    time record type movement flag
# 1:    3      1    A        Z   NA
# 2:    3      2    B        D    1
# 3:    3      3    B        Z    0
# 4:    3      4    A        Z   NA
# 5:    3      4    A        D   NA

However, if you have other cases, I think we need more information.
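One such case might be data with several time values. The sketch below assumes, without confirmation from the question, that B records should only be compared with A records that share the same time. The extra rows for time "4" are made up for illustration, and the temporary column name `n_a` is arbitrary.

```r
library(data.table)

# The question's rows (time "3") plus made-up rows for a second time value
dt2 <- data.table(
  time     = c(rep("3", 5), rep("4", 3)),
  record   = c(1, 2, 3, 4, 4, 5, 6, 7),
  type     = c("A", "B", "B", "A", "A", "A", "A", "B"),
  movement = c("Z", "D", "Z", "Z", "D", "D", "D", "D")
)

# Number of distinct type A records per (time, movement)
a_counts <- dt2[type == "A", .(n_a = uniqueN(record)), by = .(time, movement)]

# Update join: attach the count to every row (NA where no A record matches)
dt2[a_counts, n_a := i.n_a, on = .(time, movement)]

# Flag only the type B rows; type A rows stay NA
dt2[type == "B", flag := as.numeric(!is.na(n_a) & n_a == 1)]

# Drop the helper column
dt2[, n_a := NULL]

print(dt2)
```

Here record 2 keeps its flag of 1 because only one A record has movement D within time "3", while record 7 gets 0 because two A records have movement D within time "4". Without the grouping by time, record 2 would match three A records and be flagged 0.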

Match row values by group (r, data.table)
