Suppose we have a data frame like the following:
UNIT NUMBER Year City STATE
124 1996 Prague CZECH
121 2001 Sofie BULG
122 2003 Ostrava CZECH
147 1986 Kyjev UKRAINE
133 2005 Lvov UKRAINE
...
...
...
188 2001 Rome ITALY
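Say I need to add a variable called Capital city to the data frame, equal to 1 if City is the capital of STATE and 0 otherwise.
How can I add this variable? The capital cities in the data frame above are: Prague, Sofie, Kyjev.
PS: I know I could do this "manually" for the data frame above, but I need a general solution that works for much larger data frames...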
Answer 0 (score: 0)
If you have many city names and some cities share the same name, match on both State and City:
library(dplyr)
# Example data from the question
df <- data.frame(
  unit = c(124, 121, 122, 147, 133),
  Year = c(1996, 2001, 2003, 1986, 2005),
  City = c("Prague", "Sofie", "Ostrava", "Kyjev", "Lvov"),
  State = c("CZECH", "BULG", "CZECH", "UKRAINE", "UKRAINE"))
# Lookup table with one row per (City, State) pair that is a capital
capital <- data.frame(
  City = c("Prague", "Sofie", "Kyjev"),
  State = c("CZECH", "BULG", "UKRAINE"),
  Capital = "YES"
)
# Keep every row of df; rows without a match in capital get NA in Capital
left_join(df, capital, by = c("State" = "State", "City" = "City"))
This gives:
> left_join(df, capital, by = c("State" = "State", "City" = "City"))
unit Year City State Capital
1 124 1996 Prague CZECH YES
2 121 2001 Sofie BULG YES
3 122 2003 Ostrava CZECH <NA>
4 147 1986 Kyjev UKRAINE YES
5 133 2005 Lvov UKRAINE <NA>
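The question asked for a 1/0 variable rather than YES/NA. A minimal sketch of one way to get there, assuming the lookup table stores `Capital = 1` instead of `"YES"` and that unmatched rows should become 0; the `result` name is only illustrative:

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  unit = c(124, 121, 122, 147, 133),
  Year = c(1996, 2001, 2003, 1986, 2005),
  City = c("Prague", "Sofie", "Ostrava", "Kyjev", "Lvov"),
  State = c("CZECH", "BULG", "CZECH", "UKRAINE", "UKRAINE")
)

# Assumption: the lookup table carries a numeric flag instead of "YES"
capital <- data.frame(
  City = c("Prague", "Sofie", "Kyjev"),
  State = c("CZECH", "BULG", "UKRAINE"),
  Capital = 1
)

result <- df %>%
  left_join(capital, by = c("State", "City")) %>%
  # Rows with no match in the lookup table have NA; turn those into 0
  mutate(Capital = coalesce(Capital, 0))

print(result)
```

Because the join uses both `State` and `City`, a non-capital city that shares its name with a capital in another state would still get 0.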
If all city names are unique, then a simple membership check is enough:
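# Names of the capital cities; only safe when no other city shares one of these names
cap_list <- c("Prague", "Sofie", "Kyjev")
df %>%
  mutate(
    # TRUE/FALSE from %in% becomes 1/0
    yes = as.numeric(City %in% cap_list)
  )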
unit Year City State yes
1 124 1996 Prague CZECH 1
2 121 2001 Sofie BULG 1
3 122 2003 Ostrava CZECH 0
4 147 1986 Kyjev UKRAINE 1
5 133 2005 Lvov UKRAINE 0
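If dplyr is not available, the same membership check can be sketched in base R. The version below compares State/City pairs rather than city names alone, so it also covers the case where names repeat across states. The `"|"` separator and the `Capital` column name are assumptions for illustration; the separator must not occur in any state or city name.

```r
# Example data from the question
df <- data.frame(
  unit = c(124, 121, 122, 147, 133),
  Year = c(1996, 2001, 2003, 1986, 2005),
  City = c("Prague", "Sofie", "Ostrava", "Kyjev", "Lvov"),
  State = c("CZECH", "BULG", "CZECH", "UKRAINE", "UKRAINE")
)

# Lookup table of (City, State) pairs that are capitals
capital <- data.frame(
  City = c("Prague", "Sofie", "Kyjev"),
  State = c("CZECH", "BULG", "UKRAINE")
)

# Build one key per row; assumes "|" never appears in a state or city name
df_key <- paste(df$State, df$City, sep = "|")
cap_key <- paste(capital$State, capital$City, sep = "|")

# 1 if the row's (State, City) pair is in the lookup table, 0 otherwise
df$Capital <- as.integer(df_key %in% cap_key)

print(df)
```

For a larger data frame only the `capital` lookup table has to grow; the matching step stays the same.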