Here is a vector containing values between 0 and 1:
a <- runif(100, 0, 1)
I want to perform the following conversion:
>= 0.975 becomes AA+
<= 0.025 becomes AA-
< 0.975 && > 0.025 becomes AA
a[a >= 0.975] = 'AA+'
sum(a == 'AA+')
3
a[a <= 0.025] = 'AA-'
sum(a == 'AA-')
2
a[a > 0.025 && a < 0.975] = 'AA'
sum(a == 'AA')
100
In other words:
a
[1] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
[16] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
[31] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
[46] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
[61] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
[76] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
[91] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
I am confused about why this happens. Why does AA overwrite the first two conversions?
Answer 0 (score: 1)
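Note that after:
a[a >= 0.975] = 'AA+'
the whole vector a is converted to character, which is undesirable, because every later comparison is then made between strings rather than numbers. It would be better to do this: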
aa <- character(length(a)) # pre-allocate aa
aa[a >= 0.975] <- "AA+"
aa[a > 0.025 & a < 0.975] <- "AA" # note &, not &&
aa[a <= 0.025] <- "AA-"
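To make the coercion concrete, here is a minimal sketch. The vectors x, num and lab are hypothetical names used only for this illustration, and the values are made up rather than taken from the question.

```r
# Assigning a string into a numeric vector coerces the whole vector
x <- c(0.0001, 0.5, 0.99)
x[x >= 0.975] <- "AA+"
class(x)   # the vector is now of type character
x          # the remaining numbers have become strings as well

# This comparison is now done on strings, so it may disagree with
# the numeric comparison (very small values can be formatted in
# scientific notation, for instance)
x <= 0.025

# Keeping the numbers and the labels in separate vectors avoids this
num <- c(0.0001, 0.5, 0.99)
lab <- character(length(num))
lab[num >= 0.975] <- "AA+"
lab[num > 0.025 & num < 0.975] <- "AA"
lab[num <= 0.025] <- "AA-"
lab
```

Because num stays numeric throughout, the order of the three assignments no longer matters.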
Here are some alternatives:
1) cut
cut will work, but it assigns the value 0.975 to "AA", because the intervals are closed on the right by default:
cut(a, c(0, 0.025, 0.975, 1), labels = c("AA-", "AA", "AA+"))
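The boundary behaviour of cut can be checked on a few hand-picked values. This is only a sketch: the vector x and the names brks, right_closed and left_closed are hypothetical, and right and include.lowest are standard arguments of cut.

```r
x <- c(0.025, 0.5, 0.975)
brks <- c(0, 0.025, 0.975, 1)
labs <- c("AA-", "AA", "AA+")

# Default: intervals are (0, 0.025], (0.025, 0.975], (0.975, 1]
right_closed <- as.character(cut(x, brks, labels = labs))
right_closed   # "AA-" "AA" "AA"

# right = FALSE: intervals are [0, 0.025), [0.025, 0.975), [0.975, 1]
left_closed <- as.character(
  cut(x, brks, labels = labs, right = FALSE, include.lowest = TRUE)
)
left_closed    # "AA" "AA" "AA+"
```

Neither variant reproduces both boundary rules of the question (<= 0.025 and >= 0.975) at once, which only matters if a value can fall exactly on a break.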
2) subscripting
c("AA-", "AA", "AA+")[ 1 + (a > 0.025) + (a >= 0.975) ]
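The subscripting approach relies on logical values being treated as 0 and 1 in arithmetic, so the sum is an index of 1, 2 or 3. A small sketch on hypothetical values (x, idx and res are names chosen for this illustration):

```r
x <- c(0.01, 0.025, 0.5, 0.975, 0.99)

# FALSE counts as 0 and TRUE as 1, so each element gets
# 1 (neither test passes), 2 (first passes) or 3 (both pass)
idx <- 1 + (x > 0.025) + (x >= 0.975)
idx   # 1 1 2 3 3

res <- c("AA-", "AA", "AA+")[idx]
res   # "AA-" "AA-" "AA" "AA+" "AA+"

# The nested ifelse form gives the same labels
res2 <- ifelse(x <= 0.025, "AA-", ifelse(x < 0.975, "AA", "AA+"))
identical(res, res2)   # TRUE
```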
3) ifelse
ifelse(a <= 0.025, "AA-", ifelse(a < 0.975, "AA", "AA+"))
4) case_when
library(dplyr)
case_when(a <= 0.025 ~ "AA-",
          a < 0.975 ~ "AA",
          TRUE ~ "AA+")
Answer 1 (score: 0)
1) Fixing the original solution: we need to use a single & instead of &&
a[a > 0.025 & a < 0.975] = 'AA'
table(a)
# a
# AA AA- AA+
# 92 5 3
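2) Explanation: according to ?"&"
& and && indicate logical AND and | and || indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined.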
The difference is easy to see: with &&, the output of the logical condition is a single element
a > 0.025 && a < 0.975
#[1] TRUE
which is recycled, so all elements are replaced with 'AA',
whereas if we do
a > 0.025 & a < 0.975
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [13] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
# [25] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
# [37] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [49] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [61] TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [73] TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
# [85] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
# [97] TRUE TRUE TRUE TRUE
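The recycling step can be reproduced on a tiny example. This is a sketch with hypothetical vectors x and y. Note that in recent R versions (4.3.0 and later, as far as I am aware) giving && an operand longer than one is an error instead of silently using the first element, so the && call below is wrapped in tryCatch and its result depends on the R version.

```r
x <- c(0.01, 0.5, 0.99)

# & works element by element and returns one value per element
mask <- x > 0.025 & x < 0.975
mask   # FALSE TRUE FALSE

# && is meant for length-one conditions; on a longer vector the
# outcome depends on the R version (first element only, or an error)
scalar <- tryCatch(x > 0.025 && x < 0.975,
                   error = function(e) conditionMessage(e))

# A length-one TRUE used as an index is recycled over the whole
# vector, which is how every element ended up as 'AA' in the question
y <- c("a", "b", "c")
y[TRUE] <- "AA"
y      # "AA" "AA" "AA"

# With the elementwise mask only the middle element is replaced
z <- c("a", "b", "c")
z[mask] <- "AA"
z      # "a" "AA" "c"
```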
3) Alternative solution: if we want a better approach, there is findInterval
c("AA-", "AA", "AA+")[findInterval(a, c(0, 0.025, 0.975))]
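As with cut, it is worth checking which side of each break is closed. A sketch on hypothetical values follows; left.open is a documented argument of findInterval, while x, default_idx and open_idx are names made up for this example.

```r
x <- c(0.01, 0.025, 0.5, 0.975, 0.99)
brks <- c(0, 0.025, 0.975)

# Default: intervals are closed on the left, [0, 0.025), [0.025, 0.975), ...
default_idx <- findInterval(x, brks)
default_idx   # 1 2 2 3 3, so exactly 0.025 is labelled "AA"

# left.open = TRUE: intervals are (0, 0.025], (0.025, 0.975], ...
open_idx <- findInterval(x, brks, left.open = TRUE)
open_idx      # 1 1 2 2 3, so exactly 0.975 is labelled "AA"

c("AA-", "AA", "AA+")[default_idx]
```

With continuous random draws such as runif, hitting a break exactly is very unlikely, so the difference is mostly relevant for rounded or hand-entered data.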
4) replace: another option is replace
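library(dplyr) # for chaining with %>%
# the conditions use the original numeric a, because the piped value
# becomes character after the first replace
replace(a, a >= 0.975, 'AA+') %>%
  replace(a <= 0.025, 'AA-') %>%
  replace(a > 0.025 & a < 0.975, 'AA')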
set.seed(42)
a <- runif(100, 0, 1)
a[a >= 0.975] = 'AA+'
a[a <= 0.025] = 'AA-'