Choosing between 2 corresponding columns in each row of a data frame

Asked: 2013-04-25 18:02:53

Tags: r row dataframe vectorization

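I apologize, as this seems like a basic question, but I have been looking for a better solution and have not found one yet. I have data of the following type.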

myDATA<-data.frame(rbind(c("red","blue","green", "dog","hat","cat")
                     ,c("blue","green", "blue","dog","hat","cat")
                     ,c("green","blue","blue","dog","hat","cat")
                     ,c("green","red", "blue","dog","hat","cat")
                     )
               )
names(myDATA)<-c(paste("Color",1:3,sep=""),paste("Stim",1:3,sep=""))
myDATA$greenImage<-NA

This gives:

myDATA

+-----------------------------------------------------+
|   Color1 Color2 Color3 Stim1 Stim2 Stim3 greenImage |
+-----------------------------------------------------+
| 1    red   blue  green   dog   hat   cat         NA |
| 2   blue  green   blue   dog   hat   cat         NA |
| 3  green   blue   blue   dog   hat   cat         NA |
| 4  green    red   blue   dog   hat   cat         NA |
+-----------------------------------------------------+

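The Color columns correspond to the Stim columns by number: Stim1 is displayed in Color1, Stim2 in Color2, and so on. In every row, exactly one Stim is displayed in green. I want to find that Stim and save it in a new column named greenImage.

I gathered from many posts that apply() might be useful here, but I could not get it to work. My rather inelegant solution is a loop of the following form: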

for (i in 1:nrow(myDATA)) {
  x <- match("green", unlist(myDATA[i,paste("Color", 1:3, sep="")]))
  myDATA[i,"greenImage"] <- as.character(myDATA[i, paste("Stim", x, sep="")])
}

which results in:

myDATA
+-----------------------------------------------------+
|   Color1 Color2 Color3 Stim1 Stim2 Stim3 greenImage |
+-----------------------------------------------------+
| 1    red   blue  green   dog   hat   cat        cat |
| 2   blue  green   blue   dog   hat   cat        hat |
| 3  green   blue   blue   dog   hat   cat        dog |
| 4  green    red   blue   dog   hat   cat        dog |
+-----------------------------------------------------+

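However, the actual data set has more than 10,000 rows, so my row-by-row loop is very inefficient. Can anyone suggest a more efficient alternative?

Thanks in advance!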

1 Answer:

Answer 0 (score: 1)

Just use ifelse to vectorize your comparison, looping over the three column pairs instead of over the rows:

for (i in 1:3) {
  myDATA$greenImage = ifelse (myDATA[,i] == "green",
                              as.character(myDATA[,i+3]),
                              myDATA$greenImage)
}
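
To see what a single pass of that loop does, here is a minimal sketch on small made-up vectors (the names color, stim, and result are illustrative and not part of the question's data). ifelse() tests every element at once and keeps the existing value wherever the test is FALSE:

```r
# Illustrative vectors, not taken from the original data set
color  <- c("red", "green", "blue", "green")
stim   <- c("dog", "hat", "cat", "dog")
result <- rep(NA_character_, length(color))

# One vectorized pass: take stim where color is "green",
# otherwise keep whatever result already holds
result <- ifelse(color == "green", stim, result)
print(result)
```

Repeating this once per Color/Stim pair fills in each row as soon as its green column is reached, so the loop runs 3 times rather than once per row.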

Note that as.character is needed to get the strings out of the factor columns. You can avoid this by passing stringsAsFactors = FALSE when you create the data.frame.
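
As a rough sketch of that last point, the data from the question could be rebuilt with stringsAsFactors = FALSE, after which the loop no longer needs as.character. The use of paste0 and of NA_character_ for the initial column are my own choices here, not part of the original answer. (From R 4.0.0 onward, stringsAsFactors = FALSE is already the default for data.frame.)

```r
# Build the same data with character columns instead of factors
myDATA <- data.frame(rbind(c("red",   "blue",  "green", "dog", "hat", "cat"),
                           c("blue",  "green", "blue",  "dog", "hat", "cat"),
                           c("green", "blue",  "blue",  "dog", "hat", "cat"),
                           c("green", "red",   "blue",  "dog", "hat", "cat")),
                     stringsAsFactors = FALSE)
names(myDATA) <- c(paste0("Color", 1:3), paste0("Stim", 1:3))
myDATA$greenImage <- NA_character_

# Columns 1:3 hold the colors, columns 4:6 the matching stimuli
for (i in 1:3) {
  myDATA$greenImage <- ifelse(myDATA[, i] == "green",
                              myDATA[, i + 3],
                              myDATA$greenImage)
}
print(myDATA$greenImage)
```

This should reproduce the greenImage column shown in the question (cat, hat, dog, dog).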