I want to assign string values to numbers in R

Time: 2018-04-21 01:23:15

Tags: r dplyr plyr

zom$Country.Code is an int.
zom$Country.Code <- c(1,14,30,37,94,148,162,166,184,189,191,208,214,215,216)

r <- c("India","Australia","Brazil","Canada","Indonesia","NewZealand","Phillipines","Qatar","Singapore","southAfrica","SriLanka","Turkey","UAE","UnitedKingdom","UnitedStates")

I want the output to look like this:

zom$Country.Code <- c("India","Australia","Brazil","Canada","Indonesia","NewZealand","Phillipines","Qatar","Singapore","southAfrica","SriLanka","Turkey","UAE","UnitedKingdom","UnitedStates")

How can I solve this in R?

1 answer:

Answer 0: (score: 1)

The factor() function can be used to associate a numeric vector with a set of labels. For example:

x <- c(1,1,1,2,3,3,2,3,4,4)

theLabels <- c("India","Canada","United States","Mexico")

y <- factor(x,1:4,theLabels)
y

This produces the following output:

> y <- factor(x,1:4,theLabels)
> y
 [1] India         India         India         Canada        United States
 [6] United States Canada        United States Mexico        Mexico       

Levels: India Canada United States Mexico

To demonstrate that this answer works with the data provided in the OP's fifth edit:

r <-c("India","Australia","Brazil","Canada","Indonesia","NewZealand",
      "Phillipines","Qatar","Singapore","southAfrica","SriLanka","Turkey","UAE","UnitedKingdom","UnitedStates")
zom<- data.frame(Country.Code=c(1,14,30,37,94,148,162,166,184,189,191,208,214,215,216))

zom$Country.Code <- factor(zom$Country.Code,
                           levels = c(1,14,30,37,94,148,162,166,184,189,191,208,214,215,216),
                           labels = r)

zom$Country.Code

...and the output:

> zom$Country.Code
 [1] India         Australia     Brazil        Canada        Indonesia     NewZealand    Phillipines   Qatar        
 [9] Singapore     southAfrica   SriLanka      Turkey        UAE           UnitedKingdom UnitedStates 
15 Levels: India Australia Brazil Canada Indonesia NewZealand Phillipines Qatar Singapore southAfrica SriLanka Turkey ... UnitedStates

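Note: once the original codes are converted to a factor, the underlying code values are lost. As a side effect of the conversion, a factor stores its data internally as integers running from 1 to the number of levels, so as.integer(zom$Country.Code) returns 1 through 15 rather than 1, 14, 30, and so on.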
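As a small illustration of that side effect, here is a minimal sketch using a three-country subset of the OP's codes. The names `codes`, `original_levels`, `f`, and `recovered` are made up for this example and do not appear in the question.

```r
# Hypothetical subset of the OP's data; variable names are chosen for this sketch only
codes <- c(1, 14, 30, 14)
original_levels <- c(1, 14, 30)
f <- factor(codes,
            levels = original_levels,
            labels = c("India", "Australia", "Brazil"))

# The integers stored in the factor are positions in the level set,
# not the original country codes
as.integer(f)

# The original codes can be rebuilt only if the level-to-code mapping was kept
recovered <- original_levels[as.integer(f)]
recovered
```

If the vector of original levels is not kept somewhere, there is no way to get back from the factor to the codes, which is the motivation for the lookup-table approach.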

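An alternative to the factor() approach is to create a lookup table of country names and codes, and merge it with the original data. This approach preserves the original values of Country.Code.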

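To illustrate, we'll create a data frame containing multiple rows for each Country.Code from the OP's data, and merge it with the lookup table via dplyr::inner_join(). We then generate a cross-tabulation of Country.Name by Country.Code to show that the join is accurate.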

library(dplyr)
# first, build a data frame containing multiple rows with the same country code
zom<- data.frame(Country.Code=c(1,14,30,37,94,148,162,166,184,189,191,208,214,215,216,
                                1,14,30,37,94,148,162,166,184,189,191,208,214,215,216,
                                1,14,30,37,94,148,162,166,184,189,191,208,214,215,216))
# second, create lookup table of codes and names, one row per country
countryNames <- data.frame(Country.Code=c(1,14,30,37,94,148,162,166,184,189,191,208,214,215,216),
                           Country.Name= c("India","Australia","Brazil","Canada","Indonesia","NewZealand",
                                           "Phillipines","Qatar","Singapore","southAfrica","SriLanka","Turkey","UAE","UnitedKingdom","UnitedStates"),
     stringsAsFactors=FALSE)

# use dplyr::inner_join() to join country names, naming the key column explicitly
mergedData <- zom %>% inner_join(countryNames, by = "Country.Code")
table(mergedData$Country.Name,mergedData$Country.Code)

...and the output:

> table(mergedData$Country.Name,mergedData$Country.Code)

                1 14 30 37 94 148 162 166 184 189 191 208 214 215 216
  Australia     0  3  0  0  0   0   0   0   0   0   0   0   0   0   0
  Brazil        0  0  3  0  0   0   0   0   0   0   0   0   0   0   0
  Canada        0  0  0  3  0   0   0   0   0   0   0   0   0   0   0
  India         3  0  0  0  0   0   0   0   0   0   0   0   0   0   0
  Indonesia     0  0  0  0  3   0   0   0   0   0   0   0   0   0   0
  NewZealand    0  0  0  0  0   3   0   0   0   0   0   0   0   0   0
  Phillipines   0  0  0  0  0   0   3   0   0   0   0   0   0   0   0
  Qatar         0  0  0  0  0   0   0   3   0   0   0   0   0   0   0
  Singapore     0  0  0  0  0   0   0   0   3   0   0   0   0   0   0
  southAfrica   0  0  0  0  0   0   0   0   0   3   0   0   0   0   0
  SriLanka      0  0  0  0  0   0   0   0   0   0   3   0   0   0   0
  Turkey        0  0  0  0  0   0   0   0   0   0   0   3   0   0   0
  UAE           0  0  0  0  0   0   0   0   0   0   0   0   3   0   0
  UnitedKingdom 0  0  0  0  0   0   0   0   0   0   0   0   0   3   0
  UnitedStates  0  0  0  0  0   0   0   0   0   0   0   0   0   0   3
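For readers who prefer not to load dplyr, the same lookup-table idea can be sketched in base R with match(). This is an alternative to the join above, not part of the original answer, and it assumes a three-country subset of the lookup table for brevity. The code 999 is a made-up value added to show what happens when a code has no entry in the table.

```r
# Lookup table, one row per country (subset of the OP's data, for brevity)
countryNames <- data.frame(
  Country.Code = c(1, 14, 30),
  Country.Name = c("India", "Australia", "Brazil"),
  stringsAsFactors = FALSE
)

# Data with repeated codes; 999 is a hypothetical code missing from the lookup table
zom <- data.frame(Country.Code = c(30, 1, 14, 1, 999))

# match() gives, for each code, its row position in the lookup table (NA if absent)
idx <- match(zom$Country.Code, countryNames$Country.Code)

# Index the names by those positions; Country.Code itself is left untouched
zom$Country.Name <- countryNames$Country.Name[idx]
zom
```

One difference in behaviour is worth noting: inner_join() drops rows whose code is not in the lookup table, whereas this match() sketch keeps them and sets Country.Name to NA, which can make missing codes easier to spot.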