I have a data frame (df) that lists the country associated with each site:
Site Country
Site1 USA
Site2 Vietnam
Site3 Spain
Site4 Germany
Site5 China
I want to append a column that associates each country with its continent. I wrote a simple series of if statements to do this:
df$Continent <- NA
if(df$Country == "USA" |df$Country == "Canada" |df$Country == "Mexico")
{df$Continent <- "North America"}
if(df$Country == "Spain" |df$Country == "France" |df$Country == "Germany")
{df$Continent <- "Europe"}
## .. etc
summary(df)
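However, every time I run this, I find that it assigns North America to all countries. I understand this may sound trivial, but does it make a difference that I use if everywhere rather than else or if else? Any suggestions for correcting this?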
Answer 0 (score: 5)
Build a lookup table and merge() it with your data.
For example:
lookup <- data.frame(Country = c("USA", "Canada", "Mexico",
"Spain", "France", "Germany",
"Vietnam", "China"),
Continent = rep(c("North America", "Europe", "Asia"),
times = c(3,3,2)))
Using your data snippet as the data frame df, we can add Continent with merge() (a join, in database terms):
> merge(df, lookup, sort = FALSE, all.x = TRUE)
Country Site Continent
1 USA Site1 North America
2 Vietnam Site2 Asia
3 Spain Site3 Europe
4 Germany Site4 Europe
5 China Site5 Asia
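As the output shows, merge() moves the key column to the front, and its row order is not guaranteed when sort = FALSE. The following is a minimal, self-contained sketch of the same join, with df rebuilt by hand from the question. The stringsAsFactors = FALSE setting and the match() alternative at the end are assumptions added for illustration, not part of the answer above. match() is a different base R technique that looks up each country's position in the table and leaves the rows and columns of df in their original order.

```r
# Rebuild the question's data by hand (illustrative reconstruction)
df <- data.frame(Site = paste0("Site", 1:5),
                 Country = c("USA", "Vietnam", "Spain", "Germany", "China"),
                 stringsAsFactors = FALSE)

lookup <- data.frame(Country = c("USA", "Canada", "Mexico",
                                 "Spain", "France", "Germany",
                                 "Vietnam", "China"),
                     Continent = rep(c("North America", "Europe", "Asia"),
                                     times = c(3, 3, 2)),
                     stringsAsFactors = FALSE)

# Left join: all.x = TRUE keeps every row of df, even when a country
# has no entry in the lookup table (its Continent is then NA)
merged <- merge(df, lookup, by = "Country", sort = FALSE, all.x = TRUE)

# Rows whose country was missing from the lookup table (none here)
unmatched <- merged[is.na(merged$Continent), ]

# Alternative: index the lookup table by position with match(),
# which keeps the original row and column order of df
df$Continent <- lookup$Continent[match(df$Country, lookup$Country)]
```

Checking unmatched after the join is a cheap way to spot countries that still need adding to the lookup table.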
Answer 1 (score: 3)
If Country is stored as a factor, you can also do a bit of trickery with levels or levels<- and collapse the country levels into continents:
`levels<-`(df$Country, list(
`North America` = c("USA","Canada","Mexico"),
`Europe` = c("Spain","France","Germany"),
`Asia` = c("Vietnam","China")
))
#[1] North America Asia Europe Europe Asia
#Levels: North America Europe Asia
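The call above only returns the recoded factor. One possible way to store it as a new column is sketched below, assuming Country may be a character column and therefore converting it with factor() first. The data frame is rebuilt by hand and the variable name continent is hypothetical.

```r
# Rebuild the question's data by hand (illustrative reconstruction)
df <- data.frame(Site = paste0("Site", 1:5),
                 Country = c("USA", "Vietnam", "Spain", "Germany", "China"),
                 stringsAsFactors = FALSE)

# Work on a factor copy so the original Country column is left unchanged
continent <- factor(df$Country)

# Assigning a named list to levels() merges the listed old levels
# into the new level given by each list name
levels(continent) <- list(
  "North America" = c("USA", "Canada", "Mexico"),
  "Europe"        = c("Spain", "France", "Germany"),
  "Asia"          = c("Vietnam", "China")
)

df$Continent <- continent
```

Listing countries that do not occur in the data (Canada, Mexico, France) is harmless, so the same list can be reused on other data sets.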
Answer 2 (score: 1)
I like ifelse() for this kind of thing. You can use it together with the %in% operator, like so:
df$Continent <- ifelse(df$Country %in% c("USA", "Canada", "Mexico"),
"North America", df$Continent)
df$Continent <- ifelse(df$Country %in% c("Spain", "France", "Germany"),
"Europe", df$Continent)
df
Site Country Continent
1 Site1 USA North America
2 Site2 Vietnam <NA>
3 Site3 Spain Europe
4 Site4 Germany Europe
5 Site5 China <NA>
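Some background on why the original attempt behaved as it did: an if statement expects a single TRUE or FALSE. Given a whole column, older versions of R used only the first element (with a warning), so the first row being USA set every row to North America. From R 4.2.0 onward the same code stops with an error. ifelse() tests every element instead. The sketch below extends the pattern to the remaining countries. The data frame is rebuilt by hand, and initialising the column with NA_character_ is an assumption made to keep its type explicit.

```r
# Rebuild the question's data by hand (illustrative reconstruction)
df <- data.frame(Site = paste0("Site", 1:5),
                 Country = c("USA", "Vietnam", "Spain", "Germany", "China"),
                 stringsAsFactors = FALSE)

# Start with a character NA so unmatched countries remain NA
df$Continent <- NA_character_

# Each ifelse() call fills in one continent and passes through
# whatever value the column already holds for all other rows
df$Continent <- ifelse(df$Country %in% c("USA", "Canada", "Mexico"),
                       "North America", df$Continent)
df$Continent <- ifelse(df$Country %in% c("Spain", "France", "Germany"),
                       "Europe", df$Continent)
df$Continent <- ifelse(df$Country %in% c("Vietnam", "China"),
                       "Asia", df$Continent)
```

With more than a handful of continents the repeated calls get long, at which point a lookup table is usually easier to maintain.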