zom$Country.Code is an int.
zom$Country.Code <- c(1,14,30,37,94,148,162,166,184,189,191,208,214,215,216)
r <- c("India","Australia","Brazil","Canada","Indonesia","NewZealand","Phillipines","Qatar","Singapore","southAfrica","SriLanka","Turkey","UAE","UnitedKingdom","UnitedStates")
I want the output to look like this:
zom$Country.Code <- c("India","Australia","Brazil","Canada","Indonesia","NewZealand","Phillipines","Qatar","Singapore","southAfrica","SriLanka","Turkey","UAE","UnitedKingdom","UnitedStates")
How can I solve this in R?
Answer 0 (score: 1)
The factor() function can be used to associate a numeric vector with a set of labels. For example:
x <- c(1,1,1,2,3,3,2,3,4,4)
theLabels <- c("India","Canada","United States","Mexico")
y <- factor(x,1:4,theLabels)
y
This produces the following output:
> y <- factor(x,1:4,theLabels)
> y
[1] India India India Canada United States
[6] United States Canada United States Mexico Mexico
Levels: India Canada United States Mexico
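One caveat is worth checking before relying on this approach: a value that does not appear in the levels argument is silently converted to NA. The sketch below uses hypothetical data (the code 5 has no label) to show how to detect that case.

```r
# Hypothetical data: code 5 has no matching level
x <- c(1, 2, 5)
theLabels <- c("India", "Canada", "United States", "Mexico")
y <- factor(x, levels = 1:4, labels = theLabels)

# The unmatched code becomes NA instead of raising an error
is.na(y)

# Count the unmatched values as a quick sanity check
nMissing <- sum(is.na(y))
nMissing
```

If nMissing is greater than zero, the levels vector is missing at least one code that occurs in the data.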
To demonstrate that this answer works with the data provided in the OP's fifth edit:
r <-c("India","Australia","Brazil","Canada","Indonesia","NewZealand",
"Phillipines","Qatar","Singapore","southAfrica","SriLanka","Turkey","UAE","UnitedKingdom","UnitedStates")
zom<- data.frame(Country.Code=c(1,14,30,37,94,148,162,166,184,189,191,208,214,215,216))
zom$Country.Code <- factor(zom$Country.Code,
levels = c(1,14,30,37,94,148,162,166,184,189,191,208,214,215,216),
labels = r)
zom$Country.Code
...and the output:
> zom$Country.Code
[1] India Australia Brazil Canada Indonesia NewZealand Phillipines Qatar
[9] Singapore southAfrica SriLanka Turkey UAE UnitedKingdom UnitedStates
15 Levels: India Australia Brazil Canada Indonesia NewZealand Phillipines Qatar Singapore southAfrica SriLanka Turkey ... UnitedStates
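Note: once the original codes are converted to a factor, the underlying code values are lost. A side effect of factor() is that the values are stored internally as integers from 1 to the number of unique levels, each associated with a label.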
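A small sketch of what this means in practice, using a three-code subset of the OP's data. The variable names codes, countryLabels, and f are illustrative only. The original codes can still be recovered, but only if the vector passed as levels has been kept around.

```r
codes <- c(1, 14, 30)                          # subset of the OP's codes
countryLabels <- c("India", "Australia", "Brazil")
f <- factor(c(30, 1, 14, 30), levels = codes, labels = countryLabels)

# Internal integer positions, not the original codes
pos <- as.integer(f)
pos

# Index the saved codes vector by position to get the original values back
recovered <- codes[pos]
recovered
```

Here pos holds the positions 3, 1, 2, 3, and recovered holds 30, 1, 14, 30.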
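An alternative to the factor() approach is to create a lookup table of country names and codes, and merge it with the original data. This approach preserves the original values of Country.Code.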
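To illustrate, we'll create a data frame containing multiple rows for each Country.Code from the OP, and merge it with the lookup table via dplyr::inner_join(). We'll then generate a cross-tabulation of Country.Name and Country.Code to show that the join is accurate.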
library(dplyr)
# first, build a data frame containing multiple rows with the same country code
zom<- data.frame(Country.Code=c(1,14,30,37,94,148,162,166,184,189,191,208,214,215,216,
1,14,30,37,94,148,162,166,184,189,191,208,214,215,216,
1,14,30,37,94,148,162,166,184,189,191,208,214,215,216))
# second, create lookup table of codes and names, one row per country
countryNames <- data.frame(Country.Code=c(1,14,30,37,94,148,162,166,184,189,191,208,214,215,216),
Country.Name= c("India","Australia","Brazil","Canada","Indonesia","NewZealand",
"Phillipines","Qatar","Singapore","southAfrica","SriLanka","Turkey","UAE","UnitedKingdom","UnitedStates"),
stringsAsFactors=FALSE)
# use dplyr::inner_join() to join country names
mergedData <- zom %>% inner_join(countryNames, by = "Country.Code")
table(mergedData$Country.Name,mergedData$Country.Code)
...and the output:
> table(mergedData$Country.Name,mergedData$Country.Code)
1 14 30 37 94 148 162 166 184 189 191 208 214 215 216
Australia 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0
Brazil 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0
Canada 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0
India 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Indonesia 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0
NewZealand 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0
Phillipines 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0
Qatar 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0
Singapore 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0
southAfrica 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0
SriLanka 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0
Turkey 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0
UAE 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0
UnitedKingdom 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0
UnitedStates 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3
>
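If adding a dplyr dependency is not desirable, the same lookup can be sketched in base R with match(). This is a swapped-in alternative rather than the method used above, and the shortened data below is hypothetical. Unlike a join, this approach always keeps the rows of zom in their original order and yields NA for any code missing from the lookup table.

```r
# Shortened, hypothetical version of the data frames used above
zom <- data.frame(Country.Code = c(1, 14, 30, 14, 1))
countryNames <- data.frame(
  Country.Code = c(1, 14, 30),
  Country.Name = c("India", "Australia", "Brazil"),
  stringsAsFactors = FALSE
)

# For each code in zom, find its row position in the lookup table
idx <- match(zom$Country.Code, countryNames$Country.Code)

# Add the name as a new column; Country.Code is left untouched
zom$Country.Name <- countryNames$Country.Name[idx]
zom
```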