R

Time: 2015-11-09 17:15:01

Tags: r variables aggregate factors

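I have a data.frame with a variable V21 that records many countries, and I want to collapse it by specifying only the continent instead of every individual country. For example, instead of 'Cuba', 'Peru' and 'Argentina' being separate levels of V21, I want them all to become the level "South America". This is the code I tried: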

recode(WaveOne.test$V21, "levels("Cuba","Colombia","Costa Rica","Argentina","Chile","Ecuador","Peru","Venezuela")= 'South America'")

levels(V21)

Can you tell me what is wrong with my code, or suggest another way to do this? I am a complete newbie to R and its syntax. Thanks!

====== UPDATE ======

SA_countries <- c("Cuba", "Mexico", "Argentina","Jamaica", "Haiti","West Indies", "Chile", "Ecuador", "Venezuela", "Other South America", "El Salvador", "Guatemala", "Nicaragua", "Dominican Republic", "Panama", "Costa Rica", "Peru")

Asia_countries <- c("Philippines", "Vietnam", "Laos", "Cambodia", "Hmong", "Other Asia", "China", "Hong Kong", "Taiwan", "Japan", "Korea", "India", "Pakistan")
Europe_Canada <- c("Europe/Canada")
MiddleEast_Africa <- c("Middle East/Africa")

continents <- list(`South America`= SA_countries, `Asia` = Asia_countries, `Europe_Canada` = Europe_Canada, `Middle East & Africa` = MiddleEast_Africa)
levels(WaveOne.test$V21) <- c(levels(WaveOne.test$V21), names(continents))
for (i in seq_along(continents)) WaveOne.test$V21[WaveOne.test$V21 %in% continents[[i]]] <- names(continents)[i]

levels(WaveOne.test$V21)

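My output is:

> levels(WaveOne.test$V21)
 [1] "Cuba" "Mexico" "Nicaragua" "Colombia" "Dominican Republic" "El Salvador" "Guatemala"
 [8] "Honduras" "Costa Rica" "Panama" "Argentina" "Chile" "Ecuador" "Peru"
[15] "Venezuela" "Other South America" "Haiti" "Jamaica" "West Indies" "Philippines" "Vietnam"
[22] "Laos" "Cambodia" "Hmong" "Other Asia" "China" "Hong Kong" "Taiwan"
[29] "Japan" "Korea" "India" "Pakistan" "Middle East/Africa" "Europe/Canada" "South America"
[36] "Asia" "Europe_Canada" "Middle East & Africa"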

1 answer:

Answer 0 (score: 1)

You can create a named list that maps each continent to its countries, then reassign the values continent by continent:

continents <- list(`South America`=SA_countries, 
                   `North America` = NA_countries, 
                    Europe=Euro_countries)
levels(df$V21) <- c(levels(df$V21), names(continents)) # add the continent names as levels first; assigning a value that is not a level gives NA
for (i in seq_along(continents)) {
  df$V21[df$V21 %in% continents[[i]]] <- names(continents)[i]
}
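As an alternative to the loop, base R's `levels<-` also accepts a named list: each name becomes a new level and absorbs the old levels listed under it, so the country levels disappear in the same step. A minimal sketch on a small hypothetical factor `x` (not the question's data); be aware that with this form any old level that is not listed becomes `NA`, so every country has to be assigned to some group.

```r
# Hypothetical toy data standing in for df$V21
x <- factor(c("Cuba", "Germany", "Mexico", "Peru"))

# Each list name becomes a new level; its elements are the old levels it absorbs.
# Old levels not listed anywhere would become NA.
levels(x) <- list(`South America` = c("Cuba", "Peru"),
                  `North America` = "Mexico",
                  Europe = "Germany")
print(x)
print(levels(x))
```

The new levels appear in the order of the list names, which is a convenient way to control the continent ordering in tables and plots.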

Reproducible example

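set.seed(123)  # sample() changed its default method in R 3.6.0, so newer R versions may draw different rows
SA_countries <- c("Cuba","Colombia","Costa Rica","Argentina","Chile","Ecuador","Peru","Venezuela")
NA_countries <- c("Mexico", "USA", "Canada")
Euro_countries <- c("Germany", "France")
# stringsAsFactors = TRUE keeps V21 a factor; it is no longer the default since R 4.0.0
df <- data.frame(V21 = sample(c(NA_countries, SA_countries, Euro_countries), 20, TRUE),
                 stringsAsFactors = TRUE)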
df
#           V21
# 1        Cuba
# 2   Venezuela
# 3  Costa Rica
# 4     Germany
# 5      France
# 6      Mexico
# 7   Argentina
# 8     Germany
# 9       Chile
# 10 Costa Rica
# 11     France
# 12 Costa Rica
# 13    Ecuador
# 14      Chile
# 15        USA
# 16    Germany
# 17       Cuba
# 18     Mexico
# 19   Colombia
# 20     France

continents <- list(`South America`=SA_countries, `North America` = NA_countries, Europe=Euro_countries)
levels(df$V21) <- c(levels(df$V21), names(continents))
for(i in seq_along(continents)) df$V21[df$V21 %in% continents[[i]]] <- names(continents)[i]
df
#              V21
# 1  South America
# 2  South America
# 3  South America
# 4         Europe
# 5         Europe
# 6  North America
# 7  South America
# 8         Europe
# 9  South America
# 10 South America
# 11        Europe
# 12 South America
# 13 South America
# 14 South America
# 15 North America
# 16        Europe
# 17 South America
# 18 North America
# 19 South America
# 20        Europe
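Even after the loop, `levels(df$V21)` still lists every original country next to the three continent names, because replacing values in a factor never removes levels that are no longer used; this is the same effect as in the question's updated output. A minimal sketch of dropping the unused levels with base R's `droplevels()`, on a small hypothetical factor `v21`:

```r
# Hypothetical stand-in for df$V21 / WaveOne.test$V21
v21 <- factor(c("Cuba", "Peru", "Germany", "Honduras"))
continents <- list(`South America` = c("Cuba", "Peru"), Europe = "Germany")

# Same approach as above: add the new levels, then overwrite matching values
levels(v21) <- c(levels(v21), names(continents))
for (i in seq_along(continents)) {
  v21[v21 %in% continents[[i]]] <- names(continents)[i]
}
print(levels(v21))  # the old country levels are still listed

# droplevels() keeps only the levels that still occur, in their existing order
v21 <- droplevels(v21)
print(levels(v21))
```

"Honduras" survives here as its own level because it is not in any continent vector. The same applies to "Colombia" and "Honduras" in the question's output: they are levels of `WaveOne.test$V21` but are missing from `SA_countries`.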