Question

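First of all, I have read some similar questions. My problem is very close to ones that have already been solved, but the small differences are causing me trouble.

I have a data frame column containing strings with five different levels: "10-20%", "100+%", "21-40%", "41-70%", "71-100%". I have tried two functions, as.numeric and as.integer. Both do convert the strings to numeric responses. The problem is that I want the conversion to follow the numeric sequence of the ranges. As it stands, "10-20%", "100+%", "21-40%", "41-70%", "71-100%" are mapped to 1, 2, 3, 4, 5 respectively.

What I want instead is for "10-20%" to be 1, "21-40%" to be 2, "41-70%" to be 3, "71-100%" to be 4, and "100+%" to be 5. Do I have to change the levels of these strings manually to achieve this?

Appendix: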

levels(dataset$PercentGrowth)
[1] ""        "10-20%"  "100+%"   "21-40%"  "41-70%"  "71-100%"

head(as.integer(dataset$PercentGrowth))
[1] 1 4 3 1 3 4

head(as.numeric(dataset$PercentGrowth))
[1] 1 4 3 1 3 4

head((dataset$PercentGrowth))
[1]        21-40% 100+%         100+%  21-40%
Levels:  10-20% 100+% 21-40% 41-70% 71-100%
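For context, a minimal sketch of why the codes come out in that order: by default factor() sorts its levels as strings, and as.integer() returns each value's position in that level list. The vector x below is made-up data that only mimics the head() output above, so the codes differ from the real column, which has more levels.

```r
# Hypothetical data mimicking the head() output above (not the real dataset)
x <- c("", "21-40%", "100+%", "", "100+%", "21-40%")

f <- factor(x)          # default levels: sorted unique values, compared as strings
lv <- levels(f)         # "" "100+%" "21-40%"
codes <- as.integer(f)  # position of each value within lv

print(lv)
print(codes)            # 1 3 2 1 2 3
```

Because "100+%" sorts before "21-40%" as text, it receives the smaller code.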

Answer 1

as.numeric(factor(df$string.var,
    levels = c("10-20%", "21-40%", "41-70%", "71-100%", "100+%")))
?factor

Sample data would help.

Edited to add the levels.
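Applied to a column like the one in the question, this might look roughly as follows. This is only a sketch: the data frame below is a made-up stand-in for dataset, and GrowthCode is a hypothetical column name. Note that the blank "" level from the question is not listed in levels, so those rows come out as NA.

```r
# Hypothetical stand-in for the asker's data frame
dataset <- data.frame(
  PercentGrowth = c("", "21-40%", "100+%", "", "100+%", "21-40%"),
  stringsAsFactors = TRUE
)

# Levels listed in the desired numeric order; "" is deliberately left out
ordered_levels <- c("10-20%", "21-40%", "41-70%", "71-100%", "100+%")

# Values not found in `levels` (here the blank entries) become NA
dataset$GrowthCode <- as.integer(
  factor(dataset$PercentGrowth, levels = ordered_levels)
)
print(dataset$GrowthCode)  # NA 2 5 NA 5 2
```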

Answer 2

You should create a factor from your strings, assigning the levels in order:

x = c("10-20%", "100+%" ,"21-40%" ,"41-70%", "71-100%")
as.integer(factor(x,levels=x))

[1] 1 2 3 4 5
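One thing to watch in the snippet above: levels = x numbers the strings in whatever order x happens to list them, so "100+%" still gets 2 there. A small sketch of the difference, where lv is an assumed hand-written vector giving the order asked for in the question:

```r
x <- c("10-20%", "100+%", "21-40%", "41-70%", "71-100%")

# Levels taken from x itself: codes follow the order of appearance in x
by_appearance <- as.integer(factor(x, levels = x))

# Levels written out in the intended order (assumed, taken from the question)
lv <- c("10-20%", "21-40%", "41-70%", "71-100%", "100+%")
by_range <- as.integer(factor(x, levels = lv))

print(by_appearance)  # 1 2 3 4 5
print(by_range)       # 1 5 2 3 4
```

Keeping the levels in a separate vector also sidesteps the case where x contains repeated values, since factor levels have to be unique.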

Answer 3

You could try:

x <- c("10-20%", "100+%" ,"21-40%" ,"41-70%", "21-40%", "71-100%", "10-20%")
library(gtools)
match(x,unique(mixedsort(x)))
#[1] 1 5 2 3 2 4 1

##
as.numeric(factor(x, levels=unique(mixedsort(x))))
#[1] 1 5 2 3 2 4 1
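If installing gtools is not an option, a base R variant is possible. This is a swapped-in technique rather than what mixedsort does internally: it assumes every string starts with the lower bound of its range and ranks the strings by that number. The names lower and codes are just for illustration.

```r
x <- c("10-20%", "100+%", "21-40%", "41-70%", "21-40%", "71-100%", "10-20%")

# Leading number of each range: 10 100 21 41 21 71 10
lower <- as.numeric(sub("^([0-9]+).*$", "\\1", x))

# Code each string by where its lower bound falls among the sorted bounds
codes <- match(lower, sort(unique(lower)))
print(codes)  # 1 5 2 3 2 4 1
```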

Suppose your vector is (this is not a general solution):

x1 <- c("less than one year", "one year", "more than one year","one year", "less than one year")

gsub2() is taken from "R: replace characters using gsub, how to create a function?":

gsub2 <- function(pattern, replacement, x, ...) {
  # apply each pattern/replacement pair to x in turn
  for (i in seq_along(pattern))
    x <- gsub(pattern[i], replacement[i], x, ...)
  x
}

x1[mixedorder(gsub2(c("less","^one","more"), c(0,1,2), x1))]
[1] "less than one year" "less than one year" "one year"          
[4] "one year"           "more than one year"
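For labels like these, which have no numbers to sort on, spelling out the level order by hand is probably the simplest route. A sketch under that assumption, with lv1, codes1 and sorted1 as illustrative names:

```r
x1 <- c("less than one year", "one year", "more than one year",
        "one year", "less than one year")

# Intended order, written out manually (an assumption about what is wanted)
lv1 <- c("less than one year", "one year", "more than one year")

codes1 <- as.integer(factor(x1, levels = lv1))  # 1 2 3 2 1
sorted1 <- x1[order(codes1)]                    # same ordering as the output above

print(codes1)
print(sorted1)
```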

How do I convert strings with different levels to numeric responses in R?
