Question

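I'm still learning how to translate SAS code into R, and I'm getting warnings. I need to understand where I'm going wrong. What I want to do is create a variable that summarizes and distinguishes 3 statuses of a population: mainland, overseas, foreigner. I have a database with 2 variables:

id nationality: idnat (french, foreigner),

If idnat is french, then:

id birthplace: idbp (mainland, colony, overseas)

I want to summarize the information from idnat and idbp into a new variable called idnat2:

status: k (mainland, overseas, foreigner)

All of these variables are of character type.

Expected result in column idnat2: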

   idnat     idbp   idnat2
1  french mainland mainland
2  french   colony overseas
3  french overseas overseas
4 foreign  foreign  foreign

Here is the SAS code I want to translate into R:

if idnat = "french" then do;
   if idbp in ("overseas","colony") then idnat2 = "overseas";
   else idnat2 = "mainland";
end;
else idnat2 = "foreigner";
run;

Here is my attempt in R:

if(idnat=="french"){
    idnat2 <- "mainland"
} else if(idbp=="overseas"|idbp=="colony"){
    idnat2 <- "overseas"
} else {
    idnat2 <- "foreigner"
}

I get this warning:

Warning message:
In if (idnat=="french") { :
  the condition has length > 1 and only the first element will be used

I was advised to use a "nested ifelse" instead because it is easier, but I get even more warnings:

idnat2 <- ifelse (idnat=="french", "mainland",
        ifelse (idbp=="overseas"|idbp=="colony", "overseas")
      )
            else (idnat2 <- "foreigner")

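According to the warning message, the length is greater than 1, so only what is between the first brackets is taken into account. Sorry, but I don't understand what this length has to do with it. Does anybody know where I'm going wrong?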

Answer 1

If you have used any spreadsheet application, you know the basic function if(), which has the syntax:

if(<condition>, <yes>, <no>)

ifelse() in R has exactly the same syntax:

ifelse(<condition>, <yes>, <no>)

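The only difference from if() in a spreadsheet application is that R's ifelse() is vectorized (it takes vectors as input and returns a vector as output). Consider the following comparison of formulas in a spreadsheet application and in R, where we want to test whether a > b and return 1 if it is and 0 otherwise.

In a spreadsheet: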

  A  B C
1 3  1 =if(A1 > B1, 1, 0)
2 2  2 =if(A2 > B2, 1, 0)
3 1  3 =if(A3 > B3, 1, 0)

In R:

> a <- 3:1; b <- 1:3
> ifelse(a > b, 1, 0)
[1] 1 0 0

ifelse() can be nested in many ways:

ifelse(<condition>, <yes>, ifelse(<condition>, <yes>, <no>))

ifelse(<condition>, ifelse(<condition>, <yes>, <no>), <no>)

ifelse(<condition>, 
       ifelse(<condition>, <yes>, <no>), 
       ifelse(<condition>, <yes>, <no>)
      )

ifelse(<condition>, <yes>, 
       ifelse(<condition>, <yes>, 
              ifelse(<condition>, <yes>, <no>)
             )
       )

To compute the column idnat2, you can:

df <- read.table(header=TRUE, text="
idnat idbp idnat2
french mainland mainland
french colony overseas
french overseas overseas
foreign foreign foreign"
)

with(df, 
     ifelse(idnat=="french",
       ifelse(idbp %in% c("overseas","colony"),"overseas","mainland"),"foreign")
     )
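To keep the result rather than just print it, the same call can be assigned to a new column. The following is a minimal sketch on the four-row example data; the column name idnat2_check is hypothetical, chosen only so that the expected idnat2 column stays available for comparison.

```r
# Example data from the question, read as plain character columns
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
idnat idbp idnat2
french mainland mainland
french colony overseas
french overseas overseas
foreign foreign foreign")

# idnat2_check is a hypothetical column name used only for this comparison
df$idnat2_check <- with(df,
  ifelse(idnat == "french",
         ifelse(idbp %in% c("overseas", "colony"), "overseas", "mainland"),
         "foreign"))

# TRUE when the recoded column matches the expected one
identical(df$idnat2_check, df$idnat2)
```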

R Documentation

What does "the condition has length > 1 and only the first element will be used" mean? Let's see:

> # What is first condition really testing?
> with(df, idnat=="french")
[1]  TRUE  TRUE  TRUE FALSE
> # This is result of vectorized function - equality of all elements in idnat and 
> # string "french" is tested.
> # Vector of logical values is returned (has the same length as idnat)
> df$idnat2 <- with(df,
+   if(idnat=="french"){
+   idnat2 <- "xxx"
+   }
+   )
Warning message:
In if (idnat == "french") { :
  the condition has length > 1 and only the first element will be used
> # Note that the first element of the comparison is TRUE and that's why we get:
> df
    idnat     idbp idnat2
1  french mainland    xxx
2  french   colony    xxx
3  french overseas    xxx
4 foreign  foreign    xxx
> # There is really logic in it, you have to get used to it
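The "length" in the message is the length of the logical vector produced by the comparison. if() wants exactly one TRUE or FALSE, and from R 4.2.0 onward a longer condition is an error rather than a warning. The sketch below shows the length of the condition and one way to reduce it to a single value with all() and any(); the variable names cond and msg are only illustrative.

```r
idnat <- c("french", "french", "french", "foreign")

cond <- idnat == "french"
length(cond)   # one logical value per element of idnat

# all() and any() collapse a logical vector to a single TRUE/FALSE,
# which is what if() expects
if (all(cond)) {
  msg <- "everyone is french"
} else if (any(cond)) {
  msg <- "some are french"
} else {
  msg <- "nobody is french"
}
msg
```

This answers a question about the whole vector. For a per-row result, ifelse() remains the right tool.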

Can I still use if()? Yes, you can, but the syntax is not so cool :)

test <- function(x) {
  if(x=="french") {
    "french"
  } else{
    "not really french"
  }
}

apply(array(df[["idnat"]]),MARGIN=1, FUN=test)
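As an alternative to apply() on an array, the same scalar function can be applied element by element with vapply(). This is a sketch on a plain character vector rather than the data frame column, and it is a substitute for the apply() call above, not the answer's own method.

```r
idnat <- c("french", "french", "french", "foreign")

# Scalar helper: handles one value at a time because it uses if()
test <- function(x) {
  if (x == "french") "french" else "not really french"
}

# vapply() calls test() once per element and checks that each
# result is a single character string
res <- vapply(idnat, test, character(1), USE.NAMES = FALSE)
res
```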

If you are familiar with SQL, you can also use a CASE statement with the sqldf package.

Answer 2

Try something like the following:

# some sample data
idnat <- sample(c("french","foreigner"),100,TRUE)
idbp <- rep(NA,100)
idbp[idnat=="french"] <- sample(c("mainland","overseas","colony"),sum(idnat=="french"),TRUE)

# recoding
out <- ifelse(idnat=="french" & !idbp %in% c("overseas","colony"), "mainland",
              ifelse(idbp %in% c("overseas","colony"),"overseas",
                     "foreigner"))
cbind(idnat,idbp,out) # check result

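Your confusion comes from how SAS and R handle if-else constructs. In R, if and else are not vectorized: they check whether a single condition is true (i.e., if("french"=="french") works) and cannot handle multiple logical values (i.e., if(c("french","foreigner")=="french") does not work), so R gives you the warning you are receiving.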

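By contrast, ifelse is vectorized, so it can take your vectors (i.e., the input variables) and test the logical condition on each of their elements, as you are used to in SAS. An alternative way to wrap your head around this is to build a loop using if and else statements (as you started to do here), but the vectorized ifelse approach is more efficient and generally involves less code.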
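The two approaches can be compared side by side. The sketch below uses a hypothetical four-element sample (with NA as the birthplace of the foreigner, as in the sample data above) and checks that an explicit loop with if/else gives the same result as nested ifelse().

```r
idnat <- c("french", "french", "french", "foreigner")
idbp  <- c("mainland", "colony", "overseas", NA)

# Loop version: if/else sees one element at a time
out_loop <- character(length(idnat))
for (i in seq_along(idnat)) {
  if (idnat[i] != "french") {
    out_loop[i] <- "foreigner"
  } else if (idbp[i] %in% c("overseas", "colony")) {
    out_loop[i] <- "overseas"
  } else {
    out_loop[i] <- "mainland"
  }
}

# Vectorized version: the same logic applied to whole vectors
out_vec <- ifelse(idnat != "french", "foreigner",
                  ifelse(idbp %in% c("overseas", "colony"), "overseas", "mainland"))

identical(out_loop, out_vec)
```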

Answer 3

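If the data set contains many rows, it might be more efficient to join with a lookup table using data.table instead of using nested ifelse().

Given the lookup table below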

lookup

     idnat     idbp   idnat2
1:  french mainland mainland
2:  french   colony overseas
3:  french overseas overseas
4: foreign  foreign  foreign

and a sample data set

library(data.table)
n_row <- 10L
set.seed(1L)
DT <- data.table(idnat = "french",
                 idbp = sample(c("mainland", "colony", "overseas", "foreign"), n_row, replace = TRUE))
DT[idbp == "foreign", idnat := "foreign"][]

      idnat     idbp
 1:  french   colony
 2:  french   colony
 3:  french overseas
 4: foreign  foreign
 5:  french mainland
 6: foreign  foreign
 7: foreign  foreign
 8:  french overseas
 9:  french overseas
10:  french mainland

we can then do an update while joining:

DT[lookup, on = .(idnat, idbp), idnat2 := i.idnat2][]

      idnat     idbp   idnat2
 1:  french   colony overseas
 2:  french   colony overseas
 3:  french overseas overseas
 4: foreign  foreign  foreign
 5:  french mainland mainland
 6: foreign  foreign  foreign
 7: foreign  foreign  foreign
 8:  french overseas overseas
 9:  french overseas overseas
10:  french mainland mainland
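The lookup idea does not depend on data.table. The sketch below shows the same table-driven recoding in base R using match() on a combined key; it is a swapped-in technique, not the update join above, and the names dat and make_key are hypothetical.

```r
# Lookup table: one row per valid (idnat, idbp) combination
lookup <- data.frame(
  idnat  = c("french", "french", "french", "foreign"),
  idbp   = c("mainland", "colony", "overseas", "foreign"),
  idnat2 = c("mainland", "overseas", "overseas", "foreign"),
  stringsAsFactors = FALSE
)

# Small hypothetical data set to recode
dat <- data.frame(
  idnat = c("french", "foreign", "french", "french"),
  idbp  = c("colony", "foreign", "mainland", "overseas"),
  stringsAsFactors = FALSE
)

# make_key is a hypothetical helper that builds one key per row from both columns
make_key <- function(d) paste(d$idnat, d$idbp, sep = "|")

# match() finds the lookup row for each data row; unmatched rows would get NA
dat$idnat2 <- lookup$idnat2[match(make_key(dat), make_key(lookup))]
dat
```

Adding or changing a category then means editing a row of the lookup table instead of rewriting nested conditions.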

Answer 4

You can create the vector idnat2 without if and ifelse.

The function replace can be used to replace all occurrences of "colony" with "overseas":

idnat2 <- replace(idbp, idbp == "colony", "overseas")
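Run on the four example rows, this looks as follows. Note that the shortcut relies on idbp already holding "foreign" for foreigners, as it does in the expected-result table of the question; that is an assumption about the data, not something replace() checks.

```r
idnat <- c("french", "french", "french", "foreign")
idbp  <- c("mainland", "colony", "overseas", "foreign")

# Copy idbp, overwriting only the elements where the condition is TRUE
idnat2 <- replace(idbp, idbp == "colony", "overseas")
idnat2
```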

Answer 5

Using a SQL CASE statement with the dplyr and sqldf packages:

Data

df <-structure(list(idnat = structure(c(2L, 2L, 2L, 1L), .Label = c("foreign", "french"), class = "factor"), idbp = structure(c(3L, 1L, 4L, 2L), .Label = c("colony", "foreign", "mainland", "overseas"), class = "factor")), .Names = c("idnat", "idbp"), class = "data.frame", row.names = c(NA, -4L))

sqldf

library(sqldf)
sqldf("SELECT idnat, idbp,
        CASE
          WHEN idbp IN ('colony', 'overseas') THEN 'overseas'
          ELSE idbp
        END AS idnat2
       FROM df")

dplyr

library(dplyr)
df %>%
  mutate(idnat2 = case_when(.$idbp == 'mainland' ~ "mainland",
                            .$idbp %in% c("colony", "overseas") ~ "overseas",
                            TRUE ~ "foreign"))

Output

    idnat     idbp   idnat2
1  french mainland mainland
2  french   colony overseas
3  french overseas overseas
4 foreign  foreign  foreign

Answer 6

Using data.table, the solution is:

DT[, idnat2 := ifelse(idbp %in% "foreign", "foreign", 
        ifelse(idbp %in% c("colony", "overseas"), "overseas", "mainland" ))]

ifelse is vectorized; if-else is not. Here, DT is:

    idnat     idbp
1  french mainland
2  french   colony
3  french overseas
4 foreign  foreign

This gives:

     idnat     idbp   idnat2
1:  french mainland mainland
2:  french   colony overseas
3:  french overseas overseas
4: foreign  foreign  foreign

Answer 7

# Read in the data.

idnat=c("french","french","french","foreign")
idbp=c("mainland","colony","overseas","foreign")

# Initialize the new variable.

idnat2=as.character(vector())

# Logically evaluate "idnat" and "idbp" for each case, assigning the appropriate level to "idnat2".

for (i in seq_along(idnat)) {
  if (idnat[i] == "french" && idbp[i] == "mainland") {
    idnat2[i] = "mainland"
  } else if (idnat[i] == "french" && (idbp[i] == "colony" || idbp[i] == "overseas")) {
    idnat2[i] = "overseas"
  } else {
    idnat2[i] = "foreign"
  }
}

# Create a data frame with the two old variables and the new variable.

data.frame(idnat,idbp,idnat2)
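The same three rules can also be written without a loop, using logical indexing on the whole vectors. This is a sketch of an alternative, not a change to the loop above: it starts from the fallback value and overwrites the matching positions.

```r
idnat <- c("french", "french", "french", "foreign")
idbp  <- c("mainland", "colony", "overseas", "foreign")

# Start with the fallback value, then overwrite the french cases
idnat2 <- rep("foreign", length(idnat))
idnat2[idnat == "french" & idbp == "mainland"] <- "mainland"
idnat2[idnat == "french" & idbp %in% c("colony", "overseas")] <- "overseas"

data.frame(idnat, idbp, idnat2)
```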

Answer 8

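The explanation with examples was key to helping me, but the problem I ran into was that it did not work when I copied it, so I had to mess with it in several ways to get it to work right. (I'm super new to R and had some issues with the third ifelse due to lack of knowledge.)

So, for those who are super new to R and running into issues...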

   ifelse(x < -2,"pretty negative", ifelse(x < 1,"close to zero", ifelse(x < 3,"in [1, 3)","large")##all one line
     )#normal tab
)

(I used this inside a function, so the "ifelse..." line was tabbed over by one level, but the last ")" was completely to the left.)
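For reference, here is the same three-level nesting as a self-contained sketch with balanced parentheses. The input vector x is hypothetical, chosen so that each branch is hit once.

```r
x <- c(-3, 0, 2, 5)   # hypothetical input values, one per branch

# Each inner ifelse() is the <no> argument of the one before it;
# the three closing parentheses at the end close the three calls
res <- ifelse(x < -2, "pretty negative",
       ifelse(x < 1,  "close to zero",
       ifelse(x < 3,  "in [1, 3)",
                      "large")))
res
```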

Answer 9

Sorry for joining the party so late. Here is a simple solution.

#building up your initial table
idnat <- c(1,1,1,2) #1 is french, 2 is foreign

idbp <- c(1,2,3,4) #1 is mainland, 2 is colony, 3 is overseas, 4 is foreign

t <- cbind(idnat, idbp)

#the last column will be a vector of row length = row length of your matrix
idnat2 <- vector()

#.. and we will populate that vector with a cursor

for (i in seq_along(idnat)) {    # the cursor runs over the length of one of the vectors
  if (t[i, 1] == 2) {            # if idnat = foreign, then it's foreign
    idnat2[i] <- 3               # 3 is foreign
  } else if (t[i, 2] == 1) {     # if not foreign and idbp = mainland, then it's mainland
    idnat2[i] <- 2               # 2 is mainland
  } else {                       # anything else is classified as colony or overseas
    idnat2[i] <- 1               # 1 is colony or overseas
  }
}


cbind(t,idnat2)
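If readable labels are wanted instead of the numeric codes, the codes can be used as positions in a label vector. The sketch below assumes the coding from the comments above (1 = colony or overseas, 2 = mainland, 3 = foreign); the name status_labels is hypothetical.

```r
# Codes produced by the loop above for the four example rows
idnat2 <- c(2, 1, 1, 3)

# Position in this vector corresponds to the code: 1, 2, 3
status_labels <- c("overseas", "mainland", "foreign")   # hypothetical name

# Indexing by the codes translates each one into its label
status_labels[idnat2]
```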

Nested if-else statement
