New column based on row values - a better way?

Asked: 2018-06-07 16:17:59

Tags: r if-statement dataframe

Surely there is a better way to create a column that matches the "target" column?

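I searched Stack for an answer, but it seems nobody else has needed to know how to do this. Maybe I am approaching it from a completely stupid angle (my head may be stuck in Stata mode, since that is how my boss thinks, and he is the one who asked me to create this new "variable").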

A       <-c("bears",  "bears",     "na",   "pandas",     "pandas",    "bears",   "pandas")
B       <-c("bears",  "pandas",     "na",   "bears",     "na",          "bears",   "pandas")
target  <-c("bears", "the_zoo",   "na",   "the_zoo",  "pandas",   "bears",   "pandas")
df_test <-data.frame(A,B,target,  stringsAsFactors =FALSE)

class(df_test$B)
for (i in 1:nrow(df_test)) {
    # Case 1: both are equal
    df_test$output[i] <- ifelse(df_test$A[i] == df_test$B[i],
                                yes = as.character(df_test$A[i]),
                                # Case 2.1: A contains "na"
                                no = ifelse(df_test$A[i] == "na",
                                            yes = as.character(df_test$B[i]),
                                            # Case 2.2: B contains "na"
                                            no = ifelse(df_test$B[i] == "na",
                                                        yes = as.character(df_test$A[i]),
                                                        # Case 3: all other possibilities are "the_zoo"
                                                        no = "the_zoo"
                                                        )))
}
df_test



> df_test
       A      B  target  output
1  bears  bears   bears   bears
2  bears pandas the_zoo the_zoo
3     na     na      na      na
4 pandas  bears the_zoo the_zoo
5 pandas     na  pandas  pandas
6  bears  bears   bears   bears
7 pandas pandas  pandas  pandas

2 Answers:

Answer 0 (score: 3)

What is wrong with:
A       <-c("bears",  "bears",     "na",   "pandas",     "pandas",    "bears",   "pandas")
B       <-c("bears",  "pandas",     "na",   "bears",     "na",          "bears",   "pandas")
target  <-c("bears", "the_zoo",   "na",   "the_zoo",  "pandas",   "bears",   "pandas")
df_test <-data.frame(A,B,target,  stringsAsFactors =FALSE)

df_test$test <- with(df_test, ifelse(A == B, A, 
                       ifelse(A == "na",B, 
                              ifelse(B == "na", A, "the_zoo"))))


print(df_test)

Which yields:

       A      B  target    test
1  bears  bears   bears   bears
2  bears pandas the_zoo the_zoo
3     na     na      na      na
4 pandas  bears the_zoo the_zoo
5 pandas     na  pandas  pandas
6  bears  bears   bears   bears
7 pandas pandas  pandas  pandas

You do not need the for loop, because ifelse is already vectorized.
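To make the vectorization point concrete, here is a minimal sketch using the same vectors as the question. The intermediate names `same` and `result` are introduced only for illustration and are not part of the original code.

```r
A      <- c("bears", "bears",   "na", "pandas",  "pandas", "bears", "pandas")
B      <- c("bears", "pandas",  "na", "bears",   "na",     "bears", "pandas")
target <- c("bears", "the_zoo", "na", "the_zoo", "pandas", "bears", "pandas")

# `==` compares the two vectors element by element,
# returning one logical value per position (no loop needed).
same <- A == B
print(same)

# ifelse() takes the whole logical vector and picks from its
# `yes` or `no` argument position by position.
result <- ifelse(same, A,
                 ifelse(A == "na", B,
                        ifelse(B == "na", A, "the_zoo")))

# Check that the vectorized result reproduces the target column.
print(identical(result, target))
```

Because every comparison already operates on whole columns, wrapping the same expression in a loop over row indices only repeats the work one element at a time.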

Answer 1 (score: 0)

One option for cleaning up the code here is to use case_when from the dplyr package:

library(dplyr)

df_test$output <-
case_when(
    df_test$A == df_test$B ~ as.character(df_test$A),
    df_test$A == "na" ~ as.character(df_test$B),
    df_test$B =="na" ~ as.character(df_test$A),
    TRUE ~ "the_zoo"
)

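Note that if columns A and B are already of character type, which part of your code seems to assume, then you can drop the unnecessary calls to as.character above.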
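As a rough sketch of that simplification, assuming A and B are character columns (as they are when the data frame is built with stringsAsFactors = FALSE), the calls to as.character can be dropped. Wrapping case_when in mutate is an optional choice made here so the columns can be referenced without the df_test$ prefix; it is not required by the answer above.

```r
library(dplyr)

A       <- c("bears", "bears",   "na", "pandas",  "pandas", "bears", "pandas")
B       <- c("bears", "pandas",  "na", "bears",   "na",     "bears", "pandas")
target  <- c("bears", "the_zoo", "na", "the_zoo", "pandas", "bears", "pandas")
df_test <- data.frame(A, B, target, stringsAsFactors = FALSE)

# Conditions are evaluated in order; the first one that is TRUE
# for a given row decides that row's value.
df_test <- df_test %>%
  mutate(output = case_when(
    A == B    ~ A,         # both columns agree
    A == "na" ~ B,         # A is the "na" placeholder, so take B
    B == "na" ~ A,         # B is the "na" placeholder, so take A
    TRUE      ~ "the_zoo"  # everything else
  ))

# Check that the new column reproduces the target column.
print(identical(df_test$output, df_test$target))
```

If the columns were factors instead, the as.character calls from the original answer would still be needed, since case_when expects every right-hand side to have a compatible type.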