Question

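I want to replace "Ethiopia" in location_1 with "Ethiopia (-1992)" when the year is 1992 or earlier, and with "Ethiopia (1993-)" when the year is 1993 or later.

Unfortunately, the code I came up with replaces every "Ethiopia" with "Ethiopia (-1992)", even for the years after 1992.

Here is the code: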

location_1

I was hoping to turn every "Ethiopia" into either "Ethiopia (-1992)" or "Ethiopia (1993-)" depending on the year. Instead, every "Ethiopia" became "Ethiopia (-1992)".

Answer 1

You can replace the column values within subsets of the data:

mydata[which(mydata$location_1 == "Ethiopia" & mydata$year <= 1992),
       "location_1"] <- "Ethiopia (-1992)"

mydata[which(mydata$location_1 == "Ethiopia" & mydata$year > 1992),
       "location_1"] <- "Ethiopia (1993-)"
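One detail worth spelling out is why which() wraps the condition: a comparison against a missing year yields NA, and which() keeps only the positions that are TRUE. The sketch below illustrates this on a small made-up data frame; toy and its values are assumptions for illustration and are not part of the original question.

```r
# Hypothetical toy data: the second row has a missing year
toy <- data.frame(
  year       = c(1984, NA, 1999),
  location_1 = c("Ethiopia", "Ethiopia", "Ethiopia"),
  stringsAsFactors = FALSE
)

# The logical condition is TRUE, NA, FALSE for the three rows
cond <- toy$location_1 == "Ethiopia" & toy$year <= 1992

# which() returns only the index of the TRUE element, dropping the NA
idx <- which(cond)

# Only the row with a known year up to 1992 is renamed
toy[idx, "location_1"] <- "Ethiopia (-1992)"
print(toy)
```

The row with the missing year keeps its plain "Ethiopia" label, so it can be inspected or fixed separately.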

Or use dplyr:

library(dplyr)

# Assign the result back; mutate() does not modify mydata in place
mydata <- mydata %>%
  mutate(location_1 = case_when(
    location_1 == "Ethiopia" & year <= 1992 ~ "Ethiopia (-1992)",
    location_1 == "Ethiopia" & year > 1992  ~ "Ethiopia (1993-)",
    TRUE                                    ~ location_1
  ))

Answer 2

A data.table approach. data.table is a very fast package; see ?data.table for details:

library(data.table)
setDT(mydata)  # convert the data.frame to a data.table by reference

mydata[location_1 == "Ethiopia" & !is.na(year),
       location_1 := ifelse(year <= 1992,
                            "Ethiopia (-1992)",
                            "Ethiopia (1993-)")]

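What this does:

mydata[location_1 == "Ethiopia" & !is.na(year), selects all rows where location_1 is "Ethiopia" and the year is available (we do not want to assign a name by mistake to rows whose year is missing).

location_1 := is an assignment call (:= is the assignment operator, which updates the column by reference).

ifelse(year <= 1992, x, y) returns x where the condition is TRUE and y otherwise.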
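To see the three pieces working together, here is a minimal self-contained sketch. It assumes the data.table package is installed, and toy is a made-up table (not the original data) that includes one "Ethiopia" row with a missing year.

```r
library(data.table)

# Hypothetical toy data; the third row has no year
toy <- data.table(
  year       = c(1984, 1993, NA, 1990),
  location_1 = c("Ethiopia", "Ethiopia", "Ethiopia", "China")
)

# Update by reference: only Ethiopia rows with a known year are touched
toy[location_1 == "Ethiopia" & !is.na(year),
    location_1 := ifelse(year <= 1992, "Ethiopia (-1992)", "Ethiopia (1993-)")]

print(toy)
```

The row with the missing year and the "China" row fall outside the filter, so both keep their original labels.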

Answer 3

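The kind of if-else condition you are using tests one value at a time, so it needs to sit inside a loop that iterates over the rows. A for loop, for example: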

for (i in 1:nrow(mydata)){
    if (mydata$location_1[i] == "Ethiopia") {
        if (mydata$year[i] <= 1992) mydata$location_1[i] <- "Ethiopia (-1992)"
        else mydata$location_1[i] <- "Ethiopia (1993-)"
    }
}

#### OUTPUT ####

   year       location_1
1  1994          Germany
2  1998          Germany
3  1993 Ethiopia (1993-)
4  1982          Germany
5  1989            China
6  1997 Ethiopia (1993-)
7  2001            China
8  1990            China
9  1984 Ethiopia (-1992)
10 1999 Ethiopia (1993-)

You can achieve the same thing more compactly (and perhaps a little faster) with the vectorized function ifelse:

mydata$location_1 <- ifelse(mydata$location_1 == "Ethiopia",
       ifelse(mydata$year <= 1992, "Ethiopia (-1992)", "Ethiopia (1993-)"),
       mydata$location_1
       )
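As a quick sanity check, the loop and the vectorized ifelse() should produce identical results. The sketch below compares the two on a small made-up data frame; toy and its values are assumptions for illustration, not the original data.

```r
# Hypothetical toy data, including the boundary year 1992
toy <- data.frame(
  year       = c(1994, 1993, 1984, 1989, 1992),
  location_1 = c("Germany", "Ethiopia", "Ethiopia", "China", "Ethiopia"),
  stringsAsFactors = FALSE
)

# Row-by-row version: if/else tests one value at a time
loop_result <- toy$location_1
for (i in seq_len(nrow(toy))) {
  if (toy$location_1[i] == "Ethiopia") {
    loop_result[i] <- if (toy$year[i] <= 1992) "Ethiopia (-1992)" else "Ethiopia (1993-)"
  }
}

# Vectorized version: ifelse() tests the whole column at once
vec_result <- ifelse(toy$location_1 == "Ethiopia",
                     ifelse(toy$year <= 1992, "Ethiopia (-1992)", "Ethiopia (1993-)"),
                     toy$location_1)

identical(loop_result, vec_result)  # TRUE
```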

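Personally, I would probably create a new variable that holds the country name followed by (-1992) or (1993-). It is syntactically compact, relatively fast, and preserves all the information, which is useful for subsetting later: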

mydata$cy <- paste(mydata$location_1, ifelse(mydata$year <= 1992,
                                             "(-1992)", "(1993-)"
                                             ))

#### OUTPUT ####

   year location_1               cy
1  1994    Germany  Germany (1993-)
2  1998    Germany  Germany (1993-)
3  1993   Ethiopia Ethiopia (1993-)
4  1982    Germany  Germany (-1992)
5  1989      China    China (-1992)
6  1997   Ethiopia Ethiopia (1993-)
7  2001      China    China (1993-)
8  1990      China    China (-1992)
9  1984   Ethiopia Ethiopia (-1992)
10 1999   Ethiopia Ethiopia (1993-)
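The subsetting benefit can be sketched as follows: because location_1 is left untouched, rows can be selected either by country alone or by country and period. The data frame toy and the result names below are made up for illustration.

```r
# Hypothetical toy data standing in for mydata
toy <- data.frame(
  year       = c(1994, 1993, 1984, 1989),
  location_1 = c("Germany", "Ethiopia", "Ethiopia", "China"),
  stringsAsFactors = FALSE
)

# New combined label; the original column stays as it was
toy$cy <- paste(toy$location_1, ifelse(toy$year <= 1992, "(-1992)", "(1993-)"))

# All Ethiopia rows, regardless of period
ethiopia_all <- toy[toy$location_1 == "Ethiopia", ]

# Only the later period, selected through the combined label
ethiopia_late <- toy[toy$cy == "Ethiopia (1993-)", ]
```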

Data:

set.seed(123)

mydata <- data.frame(year = sample(1980:2004, 10, T),
                     location_1 = sample(c("Ethiopia", "Germany", "China"), 10, T),
                     stringsAsFactors = F
                     )

How to replace "Ethiopia" with "Ethiopia (-1992)" and "Ethiopia (1993-)" depending on the year
