I have a tbl_df (tibble) named 'control.scores' with a column called "Overall", whose values lie between 1.00 and 4.00.
# A tibble: 2 x 8
group GOV CORC TMSC AUDIT PPS TRAIN Overall
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Group4 0.82 0.2 0.525 0.2833333 0.2 0.2333333 2.261667
2 Group5 0.82 0.0 0.525 0.2833333 0.2 0.2333333 2.061667
I also have another tbl_df named 'control.rating.tbl':
# create a reference table of control ratings and numeric ranges
library(tibble)  # provides tribble()
control.rating.tbl <- tribble(
  ~RATING,                ~MIN,  ~MAX,
  "Ineffective",          3.500, 4.00,
  "Marginally Effective", 2.500, 3.499,
  "Generally Effective",  1.500, 2.499,
  "Highly Effective",     1.00,  1.499
)
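How can I add another column to 'control.scores' that takes the value in Overall, checks which MIN to MAX range of 'control.rating.tbl' it falls into, and returns the corresponding string?
For example, for Group4, Overall == 2.261667, which corresponds to 'Generally Effective' in 'control.rating.tbl'. The result would look like this: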
# A tibble: 2 x 9
group GOV CORC TMSC AUDIT PPS TRAIN Overall Rating
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 Group4 0.82 0.2 0.525 0.2833333 0.2 0.2333333 2.261667 Generally Effective
2 Group5 0.82 0.0 0.525 0.2833333 0.2 0.2333333 2.061667 Generally Effective
Answer 0 (score: 2)
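We can use case_when from dplyr. Note that I adjusted the range boundaries, because the original classification has gaps: a value such as 3.4995 lies between 3.499 and 3.500 and would not match any class. dt2 is the final output.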
library(dplyr)
# Half-open intervals close the gaps between the original MIN/MAX ranges
dt2 <- dt %>%
  mutate(Rating = case_when(
    Overall >= 3.5 & Overall <= 4.0 ~ "Ineffective",
    Overall >= 2.5 & Overall <  3.5 ~ "Marginally Effective",
    Overall >= 1.5 & Overall <  2.5 ~ "Generally Effective",
    Overall >= 1.0 & Overall <  1.5 ~ "Highly Effective"
  ))
Data:
dt <- read.table(text = "group GOV CORC TMSC AUDIT PPS TRAIN Overall
1 Group4 0.82 0.2 0.525 0.2833333 0.2 0.2333333 2.261667
2 Group5 0.82 0.0 0.525 0.2833333 0.2 0.2333333 2.061667",
header = TRUE, stringsAsFactors = FALSE)
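If the thresholds should come from the reference table rather than being typed into case_when, one possible approach is base R's findInterval() applied to the sorted MIN column. The following is only a sketch: the names `ref`, `scores`, and `lookup_rating` are hypothetical and not part of the original question or answer, and the example assumes that each rating covers everything from its MIN up to the next rating's MIN, which closes the gaps in the same way as the half-open intervals above.

```r
library(tibble)
library(dplyr)

# Reference table from the question, rebuilt here under the hypothetical name `ref`.
ref <- tribble(
  ~RATING,                ~MIN,  ~MAX,
  "Ineffective",          3.500, 4.00,
  "Marginally Effective", 2.500, 3.499,
  "Generally Effective",  1.500, 2.499,
  "Highly Effective",     1.00,  1.499
) %>%
  arrange(MIN)  # findInterval() needs the break points in increasing order

# Only the columns needed for the lookup (values taken from the question).
scores <- tibble(
  group   = c("Group4", "Group5"),
  Overall = c(2.261667, 2.061667)
)

# Hypothetical helper: pick the row whose MIN is the largest one <= x.
# Values below the smallest MIN or above the largest MAX get NA.
lookup_rating <- function(x, ref) {
  idx <- findInterval(x, ref$MIN)
  idx[idx == 0 | x > max(ref$MAX)] <- NA_integer_
  ref$RATING[idx]
}

scores <- scores %>%
  mutate(Rating = lookup_rating(Overall, ref))
```

With this approach, changing a threshold only requires editing the reference table; the MAX column is used solely to reject values above the top of the scale.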