Create a new column and fill it with 1 or 0 based on a given condition

Time: 2016-12-06 02:15:07

Tags: r loops if-statement match

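Thanks in advance for any and all help. I have a relatively large dataset, and I want to test whether each string exists in a series of subset data frames created from the larger dataset. I can accomplish this in three steps, but I would like to write a piece of code that does it in one step.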

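Because of the file size, what I want is to create the subset t2.a, use it to add a 1 or 0 to my file t1, delete it, and then repeat the process for t2.b, t2.c, ...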

Thanks again.

My actual dataset is similar to the data frames below.

    t1<- data.frame ( A1 = c("red", "blue", "green", "yellow", "brown"),
                     A2 = c("orange", "purple", "yellow", "black", NA),
                     A3 = c(1,2,4,5,7))

    t2<- data.frame(B2 = c("black", "pink", "lime", "green", "grey", "mist", "blond", "grass", "violet", "red"),
                    B3 = c("a", "b", "a", "c", "d", "d", "a" , "c", "a", "b"))

    > t1
          A1     A2 A3
    1    red orange  1
    2   blue purple  2
    3  green yellow  4
    4 yellow  black  5
    5  brown   <NA>  7
    > t2
           B2 B3
    1   black  a
    2    pink  b
    3    lime  a
    4   green  c
    5    grey  d
    6    mist  d
    7   blond  a
    8   grass  c
    9  violet  a
    10    red  b

My existing code takes three steps:

    # Step 1: create one subset of t2 per value of B3 (t2.a, t2.b, ...)
    for (i in unique(t2$B3)) {
      colName <- paste("t2", i, sep = ".")
      assign(colName, t2[t2$B3 == i, ])
    }

    # Step 2: flag rows of t1 whose A1 or A2 appears in a given subset
    t1$t2.a <- ifelse(t1$A1 %in% t2.a$B2 | t1$A2 %in% t2.a$B2, 1, 0)
    t1$t2.b <- ifelse(t1$A1 %in% t2.b$B2 | t1$A2 %in% t2.b$B2, 1, 0)
    t1$t2.c <- ifelse(t1$A1 %in% t2.c$B2 | t1$A2 %in% t2.c$B2, 1, 0)
    t1$t2.d <- ifelse(t1$A1 %in% t2.d$B2 | t1$A2 %in% t2.d$B2, 1, 0)

    # Step 3: remove each newly created subset
    rm(t2.a)
    rm(t2.b)
    rm(t2.c)
    rm(t2.d)

    # The result should look like the data frame below
          A1     A2 A3 t2.a t2.b t2.c t2.d
    1    red orange  1    0    1    0    0
    2   blue purple  2    0    0    0    0
    3  green yellow  4    0    0    1    0
    4 yellow  black  5    1    0    0    0
    5  brown   <NA>  7    0    0    0    0
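
One way to get the same columns without creating and removing the intermediate data frames is to loop over the groups and keep only the current group's values in a temporary vector. This is a minimal sketch, assuming the sample t1 and t2 shown above (rebuilt here with stringsAsFactors = FALSE so the block runs on its own); the names g and vals are illustrative and do not come from the original code.

```r
# Sample data from the question, rebuilt so this block is self-contained.
t1 <- data.frame(A1 = c("red", "blue", "green", "yellow", "brown"),
                 A2 = c("orange", "purple", "yellow", "black", NA),
                 A3 = c(1, 2, 4, 5, 7),
                 stringsAsFactors = FALSE)

t2 <- data.frame(B2 = c("black", "pink", "lime", "green", "grey",
                        "mist", "blond", "grass", "violet", "red"),
                 B3 = c("a", "b", "a", "c", "d", "d", "a", "c", "a", "b"),
                 stringsAsFactors = FALSE)

# For each group in B3, take only that group's B2 values (no named
# subset data frame is created), then add a 0/1 column to t1.
for (g in unique(t2$B3)) {
  vals <- t2$B2[t2$B3 == g]
  # 1 if A1 or A2 appears in this group, 0 otherwise; NA never matches.
  t1[[paste("t2", g, sep = ".")]] <- as.integer(t1$A1 %in% vals | t1$A2 %in% vals)
}

print(t1)
```

Only one group's values are held at a time, and vals is overwritten on each pass, so there is nothing to rm() afterwards except the loop variables themselves. The new columns are integer rather than numeric, which prints the same as the expected result.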

1 answer:

Answer 0 (score: 0)

I think there may be a problem with the test2 you printed to the screen:

    > test2 # OP's test2 as printed
          p1    p2 oi NC
    1   jaes jelly  1  1
    2  tommy   joe  2  1
    3    NA    Joe  3  1
    4 eleder    NA  4  0
    5   food     A  5  0
    6  jelly jelly  6  1

Based on the data you generated, I think there is a discrepancy in the third row.

    > test2 # test2 based on the provided data
          p1    p2 oi
    1   jaes jelly  1
    2  tommy   joe  2
    3    joe    NA  3
    4 eleder    NA  4
    5   food     A  5
    6  jelly jelly  6

Also, I do not see a D value of "gt" corresponding to NM == "joe" anywhere in test1:

    > test1[test1$NM == "joe", ]
        D  NM
    4  bk joe
    12 oo joe

In any case, my overly complicated solution, which allows p2 to contribute, is below.

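    # Note: each any() call collapses to a single TRUE/FALSE for the whole
    # data set, so the inner ifelse() yields the same value for every row.
    test2$NC <- ifelse(test2$p1 %in% test1$NM & test2$p2 %in% test1$NM,
                       ifelse(any(test1$D[which(test1$NM %in% test2$p1)] == "gt") |
                                any(test1$D[which(test1$NM %in% test2$p2)] == "gt"), 1, 0), 0)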

    > test2
          p1    p2 oi NC
    1   jaes jelly  1  1
    2  tommy   joe  2  1
    3    joe    NA  3  0
    4 eleder    NA  4  0
    5   food     A  5  0
    6  jelly jelly  6  1

Note that, because of the problem mentioned above, this disagrees with your expected output in the third row.