Getting continent names from country names in R

Time: 2017-11-27 11:40:38

Tags: r

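I have a data frame in which one column holds country names. My goal is to add a column that gives the continent for each country. Consider the following example: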

my.df <- data.frame(country = c("Afghanistan","Algeria"))

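Is there a package I can use to append a column containing the continent names, without having to supply the underlying country-to-continent data myself?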

3 answers:

Answer 0: (score: 13)

You can use the countrycode package for this task.

library(countrycode)
df <- data.frame(country = c("Afghanistan",
                             "Algeria",
                             "USA",
                             "France",
                             "New Zealand",
                             "Fantasyland"))

df$continent <- countrycode(sourcevar = df[, "country"],
                            origin = "country.name",
                            destination = "continent")
# Warning message:
# In countrycode(sourcevar = df[, "country"], origin = "country.name",  :
#   Some values were not matched unambiguously: Fantasyland

Result

df
#      country continent
#1 Afghanistan      Asia
#2     Algeria    Africa
#3         USA  Americas
#4      France    Europe
#5 New Zealand   Oceania
#6 Fantasyland      <NA>
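
If an unmatched name such as Fantasyland should get a label instead of NA, one option is the `custom_match` argument of `countrycode()`. The following is a minimal sketch; the label "Fictional" and the shortened country vector are placeholders chosen for illustration, not part of the original answer.

```r
library(countrycode)

countries <- c("Afghanistan", "Fantasyland")

# custom_match is a named vector: the names are source values and the
# values are what to return for them. "Fictional" is a placeholder label.
continent <- countrycode(sourcevar = countries,
                         origin = "country.name",
                         destination = "continent",
                         custom_match = c("Fantasyland" = "Fictional"))
```

Entries in `custom_match` take precedence over the built-in dictionary, so the same argument can also be used to override a continent assignment you disagree with.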

Answer 1: (score: 2)

You could try

my.df <- data.frame(country = c("Afghanistan","Algeria"),
                    continent= as.factor(c("Asia","Africa")))
# Left join on the country name; all.x = TRUE keeps countries without a match (as NA)
merge(my.df, raster::ccodes()[, c("NAME", "CONTINENT")],
      by.x = "country", by.y = "NAME", all.x = TRUE)
#       country continent CONTINENT
# 1 Afghanistan      Asia      Asia
# 2     Algeria    Africa    Africa

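Some country values may need adjusting so that they match the lookup names exactly; I can't tell, because you didn't provide all of your values.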
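
One way to find the values that need adjusting is to compare the country column against the lookup names before merging. The sketch below uses a small made-up `lookup` table in place of the real reference data, and the recoding in `fixes` is an assumed mapping for illustration only.

```r
my.df <- data.frame(country = c("Afghanistan", "Algeria", "USA"),
                    stringsAsFactors = FALSE)

# Hypothetical lookup table standing in for the real reference data
lookup <- data.frame(NAME = c("Afghanistan", "Algeria", "United States"),
                     CONTINENT = c("Asia", "Africa", "North America"),
                     stringsAsFactors = FALSE)

# Country names that have no exact counterpart in the lookup table
unmatched <- setdiff(my.df$country, lookup$NAME)

# Recode the mismatches into a join key (assumed mapping), then merge
fixes <- c("USA" = "United States")
my.df$key <- ifelse(my.df$country %in% names(fixes),
                    fixes[my.df$country], my.df$country)
merged <- merge(my.df, lookup, by.x = "key", by.y = "NAME", all.x = TRUE)
```

Keeping the original `country` column and joining on a separate `key` column leaves the source data untouched.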

Answer 2: (score: 0)

Expanding on Markus's answer: countrycode takes its "continent" definition from its codelist dataset.

?codelist

Definition of continent:

    continent: Continent as defined in the World Bank Development Indicators

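The question asks for continents, but sometimes continents do not provide enough groups to delineate the data. For example, continent lumps North and South America together as Americas.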

What you might want instead is region:

    region: Regions as defined in the World Bank Development Indicators

It is not clear how the World Bank groups its regions, but the following code shows how much more granular this destination is.

library(countrycode)

egnations <- c("Afghanistan","Algeria","USA","France","New Zealand","Fantasyland")

countrycode(sourcevar = egnations, origin = "country.name",destination = "region")

Output:

[1] "Southern Asia"            
[2] "Northern Africa"          
[3] "Northern America"         
[4] "Western Europe"           
[5] "Australia and New Zealand"
[6] NA
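
To keep both levels of grouping side by side, the two destinations can be added as separate columns. This is a sketch using a shortened country list; the exact region labels returned may depend on the installed countrycode version, so no output is shown.

```r
library(countrycode)

df <- data.frame(country = c("Afghanistan", "USA", "France"),
                 stringsAsFactors = FALSE)

# Coarse grouping: one continent per country
df$continent <- countrycode(df$country,
                            origin = "country.name",
                            destination = "continent")

# Finer grouping: one region per country
df$region <- countrycode(df$country,
                         origin = "country.name",
                         destination = "region")
```

With both columns present, you can aggregate by continent or by region without converting the country names again.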