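Thanks in advance for any and all help. I have a relatively large dataset, and I want to test whether each string in it appears in a series of subset data frames created from a larger dataset. I can do this in three steps, but I would like to write code that does it in one.
Because of the file sizes involved, I would like to create the sub-file t2.a, use it to add a 1 or 0 to my file t1, and delete it; then repeat the process for t2.b, t2.c ...
Thanks again.
My actual dataset resembles the data frames below.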
t1<- data.frame ( A1 = c("red", "blue", "green", "yellow", "brown"),
A2 = c("orange", "purple", "yellow", "black", NA),
A3 = c(1,2,4,5,7))
t2<- data.frame(B2 = c("black", "pink", "lime", "green", "grey", "mist", "blond", "grass", "violet", "red"),
B3 = c("a", "b", "a", "c", "d", "d", "a" , "c", "a", "b"))
> t1
A1 A2 A3
1 red orange 1
2 blue purple 2
3 green yellow 4
4 yellow black 5
5 brown <NA> 7
> t2
B2 B3
1 black a
2 pink b
3 lime a
4 green c
5 grey d
6 mist d
7 blond a
8 grass c
9 violet a
10 red b
# My existing code takes three steps
# step 1. creates a subset of files
for(i in unique(t2$B3)) {
colName <- paste("t2", i, sep = ".")
assign(colName, t2[t2$B3==i,])
}
# step2. find if string exist in a given subfile
t1$t2.a<- ifelse(t1$A1 %in% t2.a$B2|t1$A2 %in% t2.a$B2,1,0)
#
t1$t2.b<- ifelse(t1$A1 %in% t2.b$B2|t1$A2 %in% t2.b$B2,1,0)
#
t1$t2.c<- ifelse(t1$A1 %in% t2.c$B2|t1$A2 %in% t2.c$B2,1,0)
#
t1$t2.d<- ifelse(t1$A1 %in% t2.d$B2|t1$A2 %in% t2.d$B2,1,0)
# 3.remove each newly created data set
rm(t2.a)
rm(t2.b)
rm(t2.c)
rm(t2.d)
#The result should look like the dataframe below
A1 A2 A3 t2.a t2.b t2.c t2.d
1 red orange 1 0 1 0 0
2 blue purple 2 0 0 0 0
3 green yellow 4 0 0 1 0
4 yellow black 5 1 0 0 0
5 brown <NA> 7 0 0 0 0
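One possible way to collapse the three steps into a single loop is sketched below. It assumes the goal is exactly the 0/1 columns shown above, and it never materialises the t2.a, t2.b, ... data frames, so there is nothing to remove afterwards. The variable name vals is only illustrative, and as.integer() is swapped in for ifelse(..., 1, 0), so the new columns are integer rather than double.

```r
# Example data copied from the question
t1 <- data.frame(A1 = c("red", "blue", "green", "yellow", "brown"),
                 A2 = c("orange", "purple", "yellow", "black", NA),
                 A3 = c(1, 2, 4, 5, 7))
t2 <- data.frame(B2 = c("black", "pink", "lime", "green", "grey",
                        "mist", "blond", "grass", "violet", "red"),
                 B3 = c("a", "b", "a", "c", "d", "d", "a", "c", "a", "b"))

for (i in unique(t2$B3)) {
  # Strings belonging to group i (illustrative name, not from the question)
  vals <- t2$B2[t2$B3 == i]
  # 1 if either A1 or A2 appears in this group, 0 otherwise;
  # NA in A2 simply counts as "not found" because vals holds no NA
  t1[[paste("t2", i, sep = ".")]] <- as.integer(t1$A1 %in% vals | t1$A2 %in% vals)
}

print(t1)
```

Only one short vector per group exists at any time, which may help if the sub-files are large.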
Answer 0 (score: 0)
I think there may be a problem with the test2 you printed to the screen:
>test2 #OP's test2 printed
p1 p2 oi NC
1 jaes jelly 1 1
2 tommy joe 2 1
3 NA Joe 3 1
4 eleder NA 4 0
5 food A 5 0
6 jelly jelly 6 1
Based on the data you generated, I see a discrepancy in the third row.
> test2 #test2 based on the provided data
p1 p2 oi
1 jaes jelly 1
2 tommy joe 2
3 joe NA 3
4 eleder NA 4
5 food A 5
6 jelly jelly 6
Also, I don't see a gt in D corresponding to NM == "joe" in test1:
test1[test1$NM == "joe", ]
D NM
4 bk joe
12 oo joe
In any case, here is an overly complicated solution in which I allow p2 to contribute.
test2$NC <- ifelse(test2$p1 %in% test1$NM & test2$p2 %in% test1$NM,
ifelse(any(test1$D[which(test1$NM %in% test2$p1)] == "gt") |
any(test1$D[which(test1$NM %in% test2$p2)] == "gt"), 1, 0), 0)
> test2
p1 p2 oi NC
1 jaes jelly 1 1
2 tommy joe 2 1
3 joe NA 3 0
4 eleder NA 4 0
5 food A 5 0
6 jelly jelly 6 1
Note that this disagrees with your expected output in the third row, because of the issue mentioned above.