Question

我正在使用来自世界发展指标（WDI）的数据，并希望将此数据与其他一些数据合并。我的问题是两个数据集中的国家/地区名称的拼写是不同的。如何更改国家/地区变量？

library('WDI')
df <- WDI(country="all", indicator= c("NY.GDP.MKTP.CD", "EN.ATM.CO2E.KD.GD", 'SE.TER.ENRR'), start=1998, end=2011, extra=FALSE)

head(df)
      country iso2c year NY.GDP.MKTP.CD EN.ATM.CO2E.KD.GD SE.TER.ENRR
99  ArabWorld    1A 1998   575369488074          1.365953          NA
100 ArabWorld    1A 1999   627550544566          1.355583    19.54259
101 ArabWorld    1A 2000   723111925659          1.476619          NA
102 ArabWorld    1A 2001   703688747656          1.412750          NA
103 ArabWorld    1A 2002   713021728054          1.413733          NA
104 ArabWorld    1A 2003   803017236111          1.469197          NA

我如何将ArabWorld改为阿拉伯世界？

我需要更改很多名称，所以使用row.numbers这样做不会给我足够的灵活性。我想要的东西类似于Stata中的replace函数。

Answer 1

这适用于角色或因素。

df$country <- sub("ArabWorld", "Arab World", df$country)

这相当于：

> df[,1] <- sub("ArabWorld", "Arab World", df[,1] )
> head(df)
       country iso2c year NY.GDP.MKTP.CD EN.ATM.CO2E.KD.GD
99  Arab World    1A 1998   575369488074          1.365953
100 Arab World    1A 1999   627550544566          1.355583
101 Arab World    1A 2000   723111925659          1.476619
102 Arab World    1A 2001   703688747656          1.412750

如果您创建具有所需更改的数据框，则可以循环更改它们。请注意，我已更新此内容，以便显示如何在该列中输入括号，以便将它们正确传递到sub：

name.cng <- data.frame(orig = c("AntiguaandBarbuda", "AmericanSamoa", 
                                    "EastAsia&Pacific\\(developingonly\\)",
                                    "Europe&CentralAsia\\(developingonly\\)", 
                                    "UnitedArabEmirates"), 
                           spaced=c("Antigua and Barbuda", "American Samoa",
                                    "East Asia & Pacific (developing only)",
                                     "Europe&CentralAsia (developing only)", 
                                      "United Arab Emirates") )
for (i in 1:NROW(name.cng)){ 
      df$country <- sub(name.cng[i,1], name.cng[i,2], df$country) }

Answer 2

最简单的，特别是如果要更改许多名称，可能是将您的对应表放在data.frame中，并使用merge命令将其与数据连接起来。例如，如果您想更改韩国的名称：

# Correspondance table
countries <- data.frame(
  iso2c = c("KR", "KP"),
  country = c("South Korea", "North Korea")
)

# Join the data.frames
d <- merge( df, countries, by="iso2c", all.x=TRUE )
# Compute the new country name
d$country <- ifelse(is.na(d$country.y), as.character(d$country.x), as.character(d$country.y))
# Remove the columns we no longer need
d <- d[, setdiff(names(d), c("country.x", "country.y"))]

# Check that the result looks correct
head(d)
head(d[ d$iso2c %in% c("KR", "KP"), ])

但是，将国家ISO代码加入两个数据集可能会更安全，而国家ISO代码比国家名称更为标准。

Answer 3

使用子集：

df[df[, "country"] == "ArabWorld", "country"] <- "Arab World"

head(df)
   country iso2c year NY.GDP.MKTP.CD EN.ATM.CO2E.KD.GD SE.TER.ENRR
99  Arab World    1A 1998   575369488074          1.365953          NA
100 Arab World    1A 1999   627550544566          1.355583    19.54259
101 Arab World    1A 2000   723111925659          1.476619          NA
102 Arab World    1A 2001   703688747656          1.412750          NA
103 Arab World    1A 2002   713021728054          1.413733          NA
104 Arab World    1A 2003   803017236111          1.469197          NA

如何更改data.frame中列的内容

3 个答案: