Question

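In the data below there are columns for units 1-8, and each unit has a score column and a percent column. Is there a way to make dplyr::select() work with the num_range() helper to select, say, only the scores for units 1-3? I can get it to work if I drop the suffix (i.e., if the column were just unit_1 rather than unit_1_score), but otherwise none of my attempts have worked. I tried dplyr::select(d, num_range("unit_", 1:3, "_score")), but that doesn't seem to work. Any help would be greatly appreciated.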

d <- readr::read_csv("https://data.jacksonms.gov/api/views/97iy-g8hk/rows.csv")
d <- janitor::clean_names(d)
names(d)

 [1] "test_year"             "test_type"             "test_site"             "student_id"           
 [5] "pre_test_score"        "pre_test_percent"      "post_test_score"       "post_test_percent"    
 [9] "percentage_change"     "unit_1_score"          "unit_1_percent"        "unit_2_score"         
 [13] "unit_2_percent"        "unit_3_score"          "unit_3_percent"        "unit_4_score"         
 [17] "unit_4_percent"        "unit_5_6_score"        "unit_5_6_percent"      "unit_7_score"         
 [21] "unit_7_percent"        "unit_8_score"          "unit_8_percent"        "total_score"          
 [25] "total_percent_correct"

Answer 1

We can use dplyr::matches() to select columns using a regular-expression range:

select(d, matches("unit_[1-3]_score"))
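Because the CSV URL in the question may not stay available, here is a small self-contained sketch of the same matches() call on a hypothetical toy tibble that only mimics the question's column names (the values are made up). The ^ and $ anchors are an addition to the answer's pattern; they force the regex to match the whole column name rather than a substring of it.

```r
library(dplyr)

# Hypothetical toy stand-in for `d`: same naming pattern, made-up values.
d <- tibble::tibble(
  unit_1_score = c(3L, 5L), unit_1_percent = c(0.3, 0.5),
  unit_2_score = c(4L, 5L), unit_2_percent = c(0.4, 0.5),
  unit_3_score = c(6L, 6L), unit_3_percent = c(0.6, 0.6),
  unit_4_score = c(2L, 3L), unit_5_6_score = c(7L, 8L)
)

# [1-3] is a regex character class: exactly one digit from 1 to 3.
# ^ and $ anchor the pattern to the start and end of the column name.
scores_1_3 <- select(d, matches("^unit_[1-3]_score$"))
names(scores_1_3)
#> [1] "unit_1_score" "unit_2_score" "unit_3_score"
```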

Answer 2

I hope this answer won't be considered off-topic; I'm assuming you'd be happy with a working solution even if it doesn't use dplyr.

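You can easily select certain columns of a data.frame with a regular expression. For example, to select units 1-3, try:

d[, grep(pattern = "^unit_[1-3]{1}_.*$", x = colnames(d))]

This selects the columns of d whose names start with "unit_", followed by a 1, 2, or 3 (exactly once), then an underscore, then zero or more characters of anything. Note that this returns both the score and the percent columns for those units.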
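A self-contained sketch of this base-R approach, assuming a hypothetical toy data.frame with the question's naming pattern (values made up). drop = FALSE is an addition that keeps the result a data frame even when only one column matches; the second pattern also pins the _score suffix so that only the score columns are kept, which is what the question asks for.

```r
# Hypothetical toy data frame with the question's naming pattern.
d <- data.frame(
  unit_1_score = c(3L, 5L), unit_1_percent = c(0.3, 0.5),
  unit_2_score = c(4L, 5L), unit_2_percent = c(0.4, 0.5),
  unit_3_score = c(6L, 6L), unit_3_percent = c(0.6, 0.6),
  unit_4_score = c(2L, 3L), unit_5_6_score = c(7L, 8L)
)

# Units 1-3: both score and percent columns.
units_1_3 <- d[, grep("^unit_[1-3]_", colnames(d)), drop = FALSE]

# Units 1-3: score columns only, by anchoring the suffix as well.
scores_1_3 <- d[, grep("^unit_[1-3]_score$", colnames(d)), drop = FALSE]
names(scores_1_3)
#> [1] "unit_1_score" "unit_2_score" "unit_3_score"
```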

Answer 3

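Although the 5_6 columns will be tricky (who thought that was a good idea!?), you may find the new tidyeval concepts useful here. The syms() function from the rlang package and the new !!! splicing operator work together to solve this kind of problem: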

dplyr::select(d, !!!rlang::syms(paste0("unit_", 1:3, "_score")))
#> # A tibble: 48 x 3
#>    unit_1_score unit_2_score unit_3_score
#>           <int>        <int>        <int>
#>  1            3            4            6
#>  2            5            5            6
#>  3            4            4            6
#>  4            4            4            6
#>  5            2            5            6
#>  6            5            5            7
#>  7            5            5            6
#>  8            4            5            5
#>  9            6            4            5
#> 10            4            5            5
#> # ... with 38 more rows

Explaining exactly how this works is a bit tricky (try reading vignette("tidy-evaluation")), but it works, so there you go :)
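To make the splicing step a little more concrete, here is a sketch on a hypothetical toy tibble (made-up values) that keeps the intermediate objects: rlang::syms() converts the strings into a list of symbols, and !!! splices that list into the select() call as if each column name had been typed out.

```r
library(dplyr)

# Hypothetical toy stand-in for `d`: same naming pattern, made-up values.
d <- tibble::tibble(
  unit_1_score = c(3L, 5L), unit_1_percent = c(0.3, 0.5),
  unit_2_score = c(4L, 5L), unit_2_percent = c(0.4, 0.5),
  unit_3_score = c(6L, 6L), unit_3_percent = c(0.6, 0.6)
)

# paste0() builds the column names as strings...
col_names <- paste0("unit_", 1:3, "_score")
# ...rlang::syms() turns each string into a symbol (a bare column name)...
col_syms <- rlang::syms(col_names)
# ...and !!! splices the list, so this behaves like
# select(d, unit_1_score, unit_2_score, unit_3_score).
scores_1_3 <- select(d, !!!col_syms)
names(scores_1_3)
#> [1] "unit_1_score" "unit_2_score" "unit_3_score"
```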

Although actually, select() now accepts plain strings too, so maybe you don't need to bother?

dplyr::select(d, paste0("unit_", 1:3, "_score"))
#> # A tibble: 48 x 3
#>    unit_1_score unit_2_score unit_3_score
#>           <int>        <int>        <int>
#>  1            3            4            6
#>  2            5            5            6
#>  3            4            4            6
#>  4            4            4            6
#>  5            2            5            6
#>  6            5            5            7
#>  7            5            5            6
#>  8            4            5            5
#>  9            6            4            5
#> 10            4            5            5
#> # ... with 38 more rows
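If the names are first stored in a variable, recent dplyr versions (1.0 and later) document all_of() and any_of() as the way to select with a character vector; passing the bare variable straight to select() makes tidyselect emit an ambiguity note or deprecation warning. A sketch on a hypothetical toy tibble (made-up values):

```r
library(dplyr)

# Hypothetical toy stand-in for `d`: same naming pattern, made-up values.
d <- tibble::tibble(
  unit_1_score = c(3L, 5L), unit_2_score = c(4L, 5L),
  unit_3_score = c(6L, 6L), unit_5_6_score = c(7L, 8L)
)

wanted <- paste0("unit_", 1:3, "_score")

# all_of() errors if any requested column is missing.
scores_1_3 <- select(d, all_of(wanted))

# any_of() silently skips names that are not present.
scores_any <- select(d, any_of(paste0("unit_", c(1:3, 5), "_score")))
names(scores_any)
#> [1] "unit_1_score" "unit_2_score" "unit_3_score"
```

any_of() is convenient for this dataset because units 5 and 6 share the unit_5_6_* columns, so a generated name like unit_5_score simply does not exist.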

`dplyr::select(num_range())` when the number is in the middle of the column name
