我有两个data.frames“a”和“b”。
str(a)
'data.frame': 1597 obs. of 2 variables:
$ id : int ...
$ age: num ...
> str(b)
'data.frame': 12877 obs. of 2 variables:
$ id : int ...
$ code : chr ...
虽然“id”在“a”中是唯一的,但它不在“b”中。更准确地说,“a”和“b”之间存在1:n的关系。我想检查“b”中是否有某个代码用于“a $ id”。我怎么能这样做?
我想,我需要这样的东西:
a$code.I25 <- ifelse(<if there is a$id in b$id and for b$id an entry with "I25" for b$code>, 1, 0)
不幸的是它有点复杂。 “b $ code”的值不仅像“I25”,而是像“I25.11”或“I25.12”。但是,我只是想比较“I25”并希望得到真正的“I25.11”和“I25.12”。这可能吗?
答案 0 :(得分:0)
这是一个例子
id_a = c(1, 2, 3, 23, 19, 11:13, 4, 6)
id_b = c(1, 2, 2, 5, 8, 11:13, 3, 3)
code_b = c(rep("I25", 4), rep("I26", 5), "I25")
a = data.frame(id = id_a, stringsAsFactors = FALSE)
a
# id
# 1 1
# 2 2
# 3 3
# 4 23
# 5 19
# 6 11
# 7 12
# 8 13
# 9 4
# 10 6
b = data.frame(id = id_b, code = code_b, stringsAsFactors = FALSE)
b
# id code
# 1 1 I25
# 2 2 I25
# 3 2 I25
# 4 5 I25
# 5 8 I26
# 6 11 I26
# 7 12 I26
# 8 13 I26
# 9 3 I26
# 10 3 I25
index = which(b$id %in% a$id)
b[index[which(b[index,]$code %in% "I25")],]
# id code
# 1 1 I25
# 2 2 I25
# 3 2 I25
# 10 3 I25
b[index[which(b[index,]$code %in% c("I25", "I26"))],]
# id code
# 1 1 I25
# 2 2 I25
# 3 2 I25
# 6 11 I26
# 7 12 I26
# 8 13 I26
# 9 3 I26
# 10 3 I25
#True |假
b$TF = rep(NA, nrow(b))
b$TF[index[which(b[index,]$code %in% c("I25", "I26"))]] <- 1
b$TF[-(index[which(b[index,]$code %in% c("I25", "I26"))])] <- 0
b
# id code TF
# 1 1 I25 1
# 2 2 I25 1
# 3 2 I25 1
# 4 5 I25 0
# 5 8 I26 0
# 6 11 I26 1
# 7 12 I26 1
# 8 13 I26 1
# 9 3 I26 1
# 10 3 I25 1
答案 1 :(得分:0)
#create a dummy data.frame for a
foo.a <- data.frame(id = 1:20,age = rnorm(20,25))
foo.b <-data.frame(id = 1:40,
code = as.character(paste(c("I25","I27"),1:20,sep = ".")))
#replicate it randomly
set.seed(357)
foo.b <-foo.b[sample(nrow(foo.b),75, replace = T),]
#check for matches
id.match <-which(foo.b$id %in% foo.a$id)
#get matching rows
foo.b[grep("I25",foo.b$code[id.match]),]