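Comparing two columns to create a third column... it isn't working?

Asked: 2015-04-07 20:30:19

Tags: r dataframe

This is definitely a newbie question, but I'm stuck and can't find comparable help online. I'm trying to compare two columns of a data frame to create a third column. Here, I want to compare Distx and Disty. If either one holds a value, I want to keep it and put it in a new column, Distz. If both are "Missing", I want to put "Missing" in Distz. Below is the data frame I want to end up with.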

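    ID <- c(1, 2, 3, 4, 5, 6)
    Distx <- c("A", "B", "Missing", "Missing", "G", "Missing")
    Disty <- c("Missing", "Missing", "C", "Missing", "Missing", "E")
    # Desired result column, taken from the output shown below
    Distz <- c("A", "B", "C", "Missing", "G", "E")

    mydf <- data.frame(ID, Distx, Disty, Distz)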
    mydf

      ID   Distx   Disty   Distz
    1  1       A Missing       A
    2  2       B Missing       B
    3  3 Missing       C       C
    4  4 Missing Missing Missing
    5  5       G Missing       G
    6  6 Missing       E       E

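Here is the code that doesn't work. At first I thought I wasn't indexing correctly, but the second attempt below produced the same result. There is no error message, but the results are 1s rather than the actual values from the columns?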

    for (i in seq(1:nrow(mydf))){
       if (mydf$Distx[i] == "Missing" && mydf$Disty[i] != "Missing"){
         mydf$Distz[i]<- mydf$Disty[i]}
       if (mydf$Distx[i] != "Missing" && mydf$Disty[i] == "Missing"){
        mydf$Distz[i]<- mydf$Distx[i]}
       if (mydf$Distx[i] == "Missing" && mydf$Disty[i] == "Missing"){
        mydf$Distz[i]<- "Missing"}
    }

    #for the purposes of readability I only ran two of the tests in this code
    within(mydf, {
      Distz <- ifelse(Distx == "Missing" & Disty != "Missing", Disty,          ifelse(Distx != "Missing" & Disty == "Missing", Distx))
    })

    #Both results look like this ...???

      ID   Distx   Disty Distz
    1  1       A Missing     1
    2  2       B Missing     1
    3  3 Missing       C     1
    4  4 Missing Missing     1
    5  5       G Missing     1
    6  6 Missing       E     1

Thanks in advance for any help.

2 Answers:

Answer 0 (score: 1)

You can try nested ifelse statements:

    mydf$Distz <- with(mydf, ifelse(Distx == "Missing" & Disty == "Missing", "Missing",
                               ifelse(Distx != "Missing", as.character(Distx),
                                 ifelse(Disty != "Missing", as.character(Disty), NA))))
    mydf
    #   ID   Distx   Disty   Distz
    # 1  1       A Missing       A
    # 2  2       B Missing       B
    # 3  3 Missing       C       C
    # 4  4 Missing Missing Missing
    # 5  5       G Missing       G
    # 6  6 Missing       E       E

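The problem with your code is that your variables are of class "factor", not class "character", so the code was recording the factor's underlying integer level codes rather than the factor labels. The code above addresses this by using as.character() to coerce the factors.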
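To make the factor issue concrete, here is a minimal sketch on a small made-up vector (`f`, `wrong`, and `right` are hypothetical names, not part of the question's data). It assumes the columns are factors, which was the `data.frame()` default before R 4.0.0.

```r
# Hypothetical toy factor; levels sort alphabetically: "A", "G", "Missing"
f <- factor(c("A", "Missing", "G"))

# ifelse() drops the factor attributes, so the integer codes leak through
wrong <- ifelse(f != "Missing", f, "Missing")

# Coercing to character first keeps the labels
right <- ifelse(f != "Missing", as.character(f), "Missing")

print(wrong)  # codes instead of labels
print(right)  # the labels themselves
```

The comparison `f != "Missing"` itself works on labels; it is only the value returned by `ifelse()` that loses them.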

Answer 1 (score: 1)

You could also do:

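    # TRUE where a column holds a real value, FALSE where it is "Missing"
    indx <- mydf[-1] != 'Missing'
    # For each row, pick the column that holds the value; ties.method = "first"
    # keeps the choice deterministic (the default breaks ties at random)
    mydf$Distz <- mydf[-1][cbind(1:nrow(mydf), max.col(indx, ties.method = "first"))]
    mydf
    #  ID   Distx   Disty   Distz
    #1  1       A Missing       A
    #2  2       B Missing       B
    #3  3 Missing       C       C
    #4  4 Missing Missing Missing
    #5  5       G Missing       G
    #6  6 Missing       E       E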

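Note: The columns I used are of class 'character'. You can create the 'data.frame' with stringsAsFactors=FALSE so that 'character' columns are not converted to class 'factor'. Here it is better to work with 'character' than with 'factor'.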
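As a rough sketch of that advice, the data frame below is built with `stringsAsFactors = FALSE` (which is already the default from R 4.0.0 onward), after which a single `ifelse()` is enough. The one-step `ifelse()` is a simplification for illustration, not code from either answer; it relies on the assumption that whenever `Distx` is "Missing", `Disty` holds either the real value or "Missing" itself.

```r
# Build the question's data with character columns rather than factors
mydf <- data.frame(
  ID    = c(1, 2, 3, 4, 5, 6),
  Distx = c("A", "B", "Missing", "Missing", "G", "Missing"),
  Disty = c("Missing", "Missing", "C", "Missing", "Missing", "E"),
  stringsAsFactors = FALSE
)

# Confirm the column classes before comparing strings
print(sapply(mydf, class))

# Take Distx when it has a value; otherwise fall back to Disty
mydf$Distz <- ifelse(mydf$Distx != "Missing", mydf$Distx, mydf$Disty)
print(mydf)
```

If both columns could hold real values in the same row, this version prefers `Distx`, which matches the order of the tests in the nested `ifelse()` answer.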

Data

    mydf <- structure(list(ID = c(1, 2, 3, 4, 5, 6), Distx = c("A", "B",
    "Missing", "Missing", "G", "Missing"), Disty = c("Missing", "Missing",
    "C", "Missing", "Missing", "E")), .Names = c("ID", "Distx", "Disty"
    ), row.names = c(NA, -6L), class = "data.frame")