Replacing numbers with text using conditions

Time: 2018-03-04 04:31:53

Tags: r

Here is a vector containing values between 0 and 1:

a <- runif(100, 0, 1)

I want to perform the following conversions:

>= 0.975 becomes AA+  
<= 0.025 becomes AA-  
> 0.025 and < 0.975 becomes AA

This is what I tried:

a[a >= 0.975] = 'AA+'  
sum(a == 'AA+')  
3

a[a <= 0.025] = 'AA-'  
sum(a == 'AA-')   
2

a[a > 0.025 && a < 0.975] = 'AA'  
sum(a == 'AA')  
100

In other words:

a

[1] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
 [16] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
 [31] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
 [46] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
 [61] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
 [76] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
 [91] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"

I'm confused about why this happens. Why does AA overwrite the first two conversions?

2 answers:

Answer 0 (score: 1)

Note that after:

a[a >= 0.975] = 'AA+'  

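the entire vector a is converted to character, which is less than ideal because later comparisons such as a <= 0.025 then compare strings rather than numbers. It would be better to do this: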

aa <- character(length(a))  # pre-allocate aa
aa[a >= 0.975] <- "AA+"
aa[a > 0.025 & a < 0.975] <- "AA"  # note &, not &&
aa[a <= 0.025] <- "AA-"
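
To see why the conversion to character matters, here is a minimal sketch on a small made-up vector (the values in x and the names num and lab are illustrative, not from the question). Once one element is replaced by a string, every element becomes a string, and later comparisons are done on text rather than on numbers.

```r
# Illustrative values only; x is a hypothetical stand-in for a
x <- c(0.5, 0.99, 0.0001)

x[x >= 0.975] <- "AA+"   # assigning a string coerces the whole vector
class(x)                 # "character"

# The comparison below is now a string comparison against "0.025".
# 0.0001 is stored as the text "1e-04", which sorts after "0.025",
# so this small value is no longer matched.
x <= 0.025

# Keeping the numeric input separate from the labels avoids the problem
num <- c(0.5, 0.99, 0.0001)
lab <- character(length(num))
lab[num <= 0.025] <- "AA-"
lab
```

This is the reason the pre-allocated aa vector above is indexed with conditions on the untouched numeric a.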

Here are some alternatives:

1) cut: cut will work, except that it assigns the value 0.975 to "AA":

cut(a, c(0, 0.025, 0.975, 1), labels = c("AA-", "AA", "AA+"))
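
A small check of the boundary behaviour, using made-up test values (x, breaks and labs are hypothetical names). With the default right = TRUE the intervals are closed on the right, so 0.975 lands in the middle bin, and an exact 0 falls outside the first interval unless include.lowest = TRUE is set.

```r
x <- c(0, 0.025, 0.5, 0.975, 0.99)
breaks <- c(0, 0.025, 0.975, 1)
labs <- c("AA-", "AA", "AA+")

# Default: intervals (0, 0.025], (0.025, 0.975], (0.975, 1]
default_bins <- as.character(cut(x, breaks, labels = labs))
default_bins

# include.lowest = TRUE closes the first interval on the left: [0, 0.025]
closed_bins <- as.character(cut(x, breaks, labels = labs, include.lowest = TRUE))
closed_bins
```

Wrapping the result in as.character is optional; cut itself returns a factor.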

2) Subscripting

c("AA-", "AA", "AA+")[ 1 + (a > 0.025) + (a >= 0.975) ]
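
The indexing trick relies on logical values counting as 0 or 1 in arithmetic. A short sketch with illustrative values (x, idx and grades are hypothetical names):

```r
x <- c(0.01, 0.025, 0.5, 0.975, 0.99)

# Each comparison contributes 0 or 1, so idx is 1, 2 or 3
idx <- 1 + (x > 0.025) + (x >= 0.975)
idx

# idx then picks the matching label by position
grades <- c("AA-", "AA", "AA+")[idx]
grades
```

Here 0.025 maps to "AA-" and 0.975 maps to "AA+", matching the boundaries asked for in the question.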

3) ifelse

ifelse(a <= 0.025, "AA-", ifelse(a < 0.975, "AA", "AA+"))

4) case_when

library(dplyr)

case_when( a <= 0.025 ~ "AA-",
           a < 0.975 ~ "AA",
           TRUE ~ "AA+")
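
As a quick sanity check, the base R variants can be compared on the same numeric input. This is only a sketch: the seed is chosen for reproducibility, the names by_index and by_ifelse are made up, and case_when is left out so that the snippet does not depend on dplyr.

```r
set.seed(42)                # seed chosen for reproducibility only
a <- runif(100, 0, 1)

# Pre-allocated character vector
aa <- character(length(a))
aa[a >= 0.975] <- "AA+"
aa[a > 0.025 & a < 0.975] <- "AA"
aa[a <= 0.025] <- "AA-"

# Subscripting and nested ifelse
by_index  <- c("AA-", "AA", "AA+")[1 + (a > 0.025) + (a >= 0.975)]
by_ifelse <- ifelse(a <= 0.025, "AA-", ifelse(a < 0.975, "AA", "AA+"))

identical(aa, by_index)     # the two approaches should agree
identical(aa, by_ifelse)
table(aa)
```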

Answer 1 (score: 0)

1) Modifying the original solution: we need to use a single & instead of &&

a[a > 0.025 & a < 0.975] = 'AA'   
table(a)
# a
#  AA AA- AA+ 
#  92   5   3 

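2) Explanation: according to ?"&"

& and && indicate logical AND and | and || indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined.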

The difference is easy to see: with &&, the output of the logical condition is a single element

a > 0.025 && a < 0.975
#[1] TRUE

which is recycled, so all elements are replaced with 'AA'

whereas if we do

a > 0.025 & a < 0.975
#  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
# [13]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
# [25]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
# [37] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
# [49]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
# [61]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
# [73]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
# [85]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
# [97]  TRUE  TRUE  TRUE  TRUE
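
The recycling effect can be reproduced without relying on the old && behaviour. The sketch below uses a small illustrative vector (x and the other names are hypothetical) and takes the first element explicitly, which is what && did in the R version quoted above. Newer releases are stricter: as far as I know, R 4.3.0 and later signal an error when && receives a condition of length greater than one, so the original line would now fail instead of silently overwriting everything.

```r
x <- c(0.5, 0.01, 0.99)

# Element-wise AND: one result per element
keep <- x > 0.025 & x < 0.975
keep                       # TRUE FALSE FALSE

# Only the first elements, mimicking the old && behaviour
first_only <- (x > 0.025)[1] && (x < 0.975)[1]
first_only                 # TRUE

# A single TRUE used as an index is recycled and selects everything
y <- x
y[first_only] <- "AA"
y                          # "AA" "AA" "AA"

# The element-wise mask only touches the matching positions
z <- as.character(x)
z[keep] <- "AA"
z                          # "AA" "0.01" "0.99"
```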

3) Alternative solution: if we need a better approach, there is findInterval

c("AA-", "AA", "AA+")[findInterval(a, c(0, 0.025, 0.975))]
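
A brief look at how findInterval treats the boundaries, using illustrative values (x, bin and grades_fi are hypothetical names). By default the intervals are closed on the left, so an exact 0.025 is placed in the middle bin, which differs slightly from the <= 0.025 rule in the question. The function also expects numeric input, so it should be applied to the original numeric vector.

```r
x <- c(0.01, 0.025, 0.5, 0.975, 0.99)

# Intervals are [0, 0.025), [0.025, 0.975), [0.975, Inf)
bin <- findInterval(x, c(0, 0.025, 0.975))
bin

grades_fi <- c("AA-", "AA", "AA+")[bin]
grades_fi
```

For continuous random draws such as runif, hitting a boundary exactly is unlikely, so the difference rarely shows up in practice.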

4) replace: another option is replace

library(dplyr)  # for the %>% pipe
replace(a, a >= 0.975, 'AA+') %>%
  replace(., . <= 0.025, 'AA-') %>%
  replace(., . > 0.025 & . < 0.975, 'AA')

Data

set.seed(42)
a <- runif(100, 0, 1)
a[a >= 0.975] = 'AA+'
a[a <= 0.025] = 'AA-'