Adding a conditional variable to a data frame

Time: 2019-10-12 19:35:39

Tags: r dataframe

Suppose we have a data frame like this:

UNIT NUMBER     Year     City       STATE
124             1996     Prague     CZECH
121             2001     Sofie      BULG
122             2003     Ostrava    CZECH
147             1986     Kyjev      UKRAINE
133             2005     Lvov       UKRAINE
...
...
...
188             2001     Rome       ITALY

Say I need to add a variable called Capital city to the data frame: 1 if City is the capital of STATE, and 0 otherwise.

How can I add this variable? The capital cities in the data frame above are: Prague, Sofie, Kyjev.

PS: I know I could do this "manually" for the data frame above, but I need a general solution that works for larger data frames...

1 Answer:

Answer 0: (score: 0)

If you have many city names and some cities in different states share the same name, join on both City and State:

library(dplyr)

# Sample data from the question
df <- data.frame(
  unit = c(124, 121, 122, 147, 133),
  Year = c(1996, 2001, 2003, 1986, 2005),
  City = c("Prague", "Sofie", "Ostrava", "Kyjev", "Lvov"),
  State = c("CZECH", "BULG", "CZECH", "UKRAINE", "UKRAINE"))

# Lookup table of capitals, keyed by both City and State
capital <- data.frame(
  City = c("Prague", "Sofie", "Kyjev"),
  State = c("CZECH", "BULG", "UKRAINE"),
  Capital = "YES"
  )

# Rows without a match in the lookup table get NA in Capital
left_join(df, capital, by = c("State" = "State", "City" = "City"))

This gives:

> left_join(df, capital, by = c("State" = "State", "City" = "City")) 
  unit Year    City   State Capital
1  124 1996  Prague   CZECH     YES
2  121 2001   Sofie    BULG     YES
3  122 2003 Ostrava   CZECH    <NA>
4  147 1986   Kyjev UKRAINE     YES
5  133 2005    Lvov UKRAINE    <NA>
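
The question asked for a 1/0 variable rather than YES/NA. One way to get that from the same join, as a minimal sketch: store 1 in the lookup table and replace the unmatched NA values with 0 using dplyr's coalesce(). The name result is only illustrative, and the Capital = 1 column is an assumption that departs from the "YES" value used above.

```r
library(dplyr)

df <- data.frame(
  unit = c(124, 121, 122, 147, 133),
  Year = c(1996, 2001, 2003, 1986, 2005),
  City = c("Prague", "Sofie", "Ostrava", "Kyjev", "Lvov"),
  State = c("CZECH", "BULG", "CZECH", "UKRAINE", "UKRAINE")
)

# Lookup table of capitals; Capital = 1 marks a match (assumed encoding)
capital <- data.frame(
  City = c("Prague", "Sofie", "Kyjev"),
  State = c("CZECH", "BULG", "UKRAINE"),
  Capital = 1
)

result <- df %>%
  # Match on both columns so same-named cities in other states do not match
  left_join(capital, by = c("State", "City")) %>%
  # Unmatched rows come back as NA; turn them into 0
  mutate(Capital = coalesce(Capital, 0))

print(result)
```

With the sample data, Prague, Sofie and Kyjev should get 1 and the other rows 0.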

If all city names are unique, then:

cap_list <- c("Prague", "Sofie", "Kyjev")
df %>%
  mutate(
    # 1 if City is in the list of capitals, 0 otherwise
    yes = as.numeric(City %in% cap_list)
  )

  unit Year    City   State yes
1  124 1996  Prague   CZECH   1
2  121 2001   Sofie    BULG   1
3  122 2003 Ostrava   CZECH   0
4  147 1986   Kyjev UKRAINE   1
5  133 2005    Lvov UKRAINE   0
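
If you would rather not depend on dplyr, the two ideas above can probably be combined in base R: build a City/State key for each row and test it against the keys of the capitals. This is only a sketch; the names key_df and key_cap are illustrative, and it assumes the separator "|" never appears in a city or state name.

```r
df <- data.frame(
  unit = c(124, 121, 122, 147, 133),
  Year = c(1996, 2001, 2003, 1986, 2005),
  City = c("Prague", "Sofie", "Ostrava", "Kyjev", "Lvov"),
  State = c("CZECH", "BULG", "CZECH", "UKRAINE", "UKRAINE")
)

capital <- data.frame(
  City = c("Prague", "Sofie", "Kyjev"),
  State = c("CZECH", "BULG", "UKRAINE")
)

# Combine City and State into one key so that same-named cities in
# different states stay distinct (assumes "|" is not used in any name)
key_df <- paste(df$City, df$State, sep = "|")
key_cap <- paste(capital$City, capital$State, sep = "|")

# TRUE/FALSE membership converted to 1/0
df$Capital <- as.integer(key_df %in% key_cap)

print(df)
```

Because the membership test is vectorized, the same lines should work unchanged on a larger data frame, provided the capital lookup table covers every state in it.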