Question

I have the following dataset.

dat2 <- read.table(header=TRUE, text="
ID  De  Ep  Ti  ID1
1123    113 121 100 11231
                   1123 105 107 110 11232
                   1134 122 111 107 11241
                   1134 117 120 111 11242
                   1154 122 116 109 11243
                   1165 108 111 118 11251
                   1175 106 115 113 11252
                   1185 113 104 108 11253
                   1226 109 119 116 11261
                   ")
dat2
    ID  De  Ep  Ti   ID1
1 1123 113 121 100 11231
2 1123 105 107 110 11232
3 1134 122 111 107 11241
4 1134 117 120 111 11242
5 1154 122 116 109 11243
6 1165 108 111 118 11251
7 1175 106 115 113 11252
8 1185 113 104 108 11253
9 1226 109 119 116 11261

I want to recode the first two columns with the numeric labels shown below, but cut turns them into factors.

dat2$ID <- cut(dat2$ID, breaks=c(0,1124,1154,1184,Inf), 
               labels=c(5, 25, 55, 75))
table(dat2$ID)
 5 25 55 75 
 2  3  2  2 


dat2$De <- cut(dat2$De, breaks=c(0,110,118,125,Inf), 
               labels=c(10, 20, 30, 40))
table(dat2$De)
10 20 30 40 
 4  3  2  0 


str(dat2)
'data.frame':   9 obs. of  5 variables:
 $ ID : Factor w/ 4 levels "5","25","55",..: 1 1 2 2 2 3 3 4 4
 $ De : Factor w/ 4 levels "10","20","30",..: 2 1 3 2 3 1 1 2 1
 $ Ep : int  121 107 111 120 116 111 115 104 119
 $ Ti : int  100 110 107 111 109 118 113 108 116
 $ ID1: int  11231 11232 11241 11242 11243 11251 11252 11253 11261

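I used as.numeric to convert them back to numbers, but that returns the internal level codes (1, 2, 3, ...) instead of the labels I assigned, which is not what I want. I need a simple line of code that does the conversion.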

dat2$ID <- as.numeric(dat2$ID)
table(dat2$ID)
1 2 3 4 
2 3 2 2 

dat2$De <- as.numeric(dat2$De)
table(dat2$De)
1 2 3 
4 3 2

Answer 1

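In your case, it would probably be more efficient to use findInterval directly instead of converting the numbers to a factor and then back to numeric values: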

c(5, 25, 55, 75)[findInterval(dat2$ID, c(0, 1124, 1154, 1184, Inf))]
## [1]  5  5 25 25 55 55 55 75 75

Or (for the second column)

c(10, 20, 30, 40)[findInterval(dat2$De, c(0, 110, 118, 125, Inf))]
## [1] 20 10 30 20 30 10 10 20 10

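This does the same job as cut but returns numeric values directly. One difference to be aware of: by default findInterval uses left-closed intervals [a, b) while cut uses right-closed intervals (a, b], so a value that sits exactly on a break (1154 here) lands in a different bin: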

cut(dat2$ID, breaks=c(0, 1124, 1154, 1184, Inf), labels=c(5, 25, 55, 75))
# [1] 5  5  25 25 25 55 55 75 75
# Levels: 5 25 55 75
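
If the result has to match cut exactly, one option is to pass left.open = TRUE to findInterval. The following is a minimal sketch, assuming R >= 3.3.0 (where that argument is available); the variable names id, brks, labs, via_findInterval and via_cut are only illustrative. It also shows the usual way to turn the factor returned by cut back into its numeric labels.

```r
# IDs from the question's dataset
id   <- c(1123, 1123, 1134, 1134, 1154, 1165, 1175, 1185, 1226)
brks <- c(0, 1124, 1154, 1184, Inf)
labs <- c(5, 25, 55, 75)

# Default findInterval bins are [a, b); left.open = TRUE switches to (a, b],
# which is what cut() uses by default (right = TRUE)
via_findInterval <- labs[findInterval(id, brks, left.open = TRUE)]

# cut() followed by the factor-to-numeric idiom: index the numeric
# levels by the factor instead of calling as.numeric() on the factor itself
f <- cut(id, breaks = brks, labels = labs)
via_cut <- as.numeric(levels(f))[f]

via_findInterval
identical(via_findInterval, via_cut)
```

Going the other way, cut(..., right = FALSE) should reproduce the default findInterval binning.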

Here is a quick benchmark showing a roughly 18x speedup (3.51 s vs. 0.20 s elapsed)

set.seed(123)
x <- sample(1e8, 1e7, replace = TRUE) 

system.time({
  res1 <- cut(x, breaks = c(0, 1e4, 1e5, 1e6, Inf), labels = c(5, 25, 55, 75))
  res1 <- as.numeric(levels(res1))[res1]
})
# user  system elapsed 
# 3.40    0.09    3.51 

system.time(res2 <- c(5, 25, 55, 75)[findInterval(x, c(0, 1e4, 1e5, 1e6, Inf))])
# user  system elapsed 
# 0.18    0.03    0.20 

identical(res1, res2)
## [1] TRUE
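
To write the recoded values back into the data frame from the question, something along these lines may work. This is only a sketch: recode_bins is a hypothetical helper name, not a base R function, and left.open = TRUE (R >= 3.3.0) is an assumption made so that the bins follow cut's default (a, b] convention.

```r
dat2 <- read.table(header = TRUE, text = "
ID   De  Ep  Ti  ID1
1123 113 121 100 11231
1123 105 107 110 11232
1134 122 111 107 11241
1134 117 120 111 11242
1154 122 116 109 11243
1165 108 111 118 11251
1175 106 115 113 11252
1185 113 104 108 11253
1226 109 119 116 11261
")

# Hypothetical helper: map each value of x to the numeric label of its bin.
# left.open = TRUE gives (a, b] bins, like cut()'s default.
recode_bins <- function(x, breaks, labels, left.open = TRUE) {
  stopifnot(length(labels) == length(breaks) - 1)
  idx <- findInterval(x, breaks, left.open = left.open)
  # Index 0 means "below the first bin"; turn it into NA so that
  # subsetting does not silently drop the element
  idx[idx == 0] <- NA
  labels[idx]
}

dat2$ID <- recode_bins(dat2$ID, c(0, 1124, 1154, 1184, Inf), c(5, 25, 55, 75))
dat2$De <- recode_bins(dat2$De, c(0, 110, 118, 125, Inf), c(10, 20, 30, 40))

str(dat2[c("ID", "De")])  # both columns are numeric, not factors
```

Values outside the range covered by the breaks come back as NA rather than raising an error, so it may be worth checking anyNA() on the result.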
