dplyr: Rename multiple columns with a regular expression

Asked: 2017-02-02 00:21:02

Tags: r dplyr

My dataset contains the following variables:

> colnames(sample)
 [1] "gender"                  "age"                     "partyID"                
 [4] "treatment_rand"          "treatment_bias"          "y_randT"                
 [7] "y_biasT"                 "y_randConti"             "y_biasConti"            
[10] "factor.sample.partyID.1" "factor.sample.partyID.2" "factor.sample.partyID.3"
[13] "factor.sample.partyID.4" "factor.sample.partyID.5" "factor.sample.partyID.6"
[16] "factor.sample.partyID.7" "factor.sample.partyID.8"

I want to remove the prefix factor.sample. from every column name that has it. I tried the code below, but got an error.

> sample %>%
+   rename_(.dots=setNames(names(.), gsub("factor\\.sample\\.", "", names(.))))
Error in select_impl(.data, vars) : 
  found duplicated column name: factor.sample.partyID.1, factor.sample.partyID.2, factor.sample.partyID.3, factor.sample.partyID.4, factor.sample.partyID.5, factor.sample.partyID.6, factor.sample.partyID.7, factor.sample.partyID.8

How can I do this with dplyr?

2 answers:

Answer 0: (score: 2)

You can use dplyr::rename_at():

library(dplyr)
library(stringr)
sample %>%
    rename_at(
          # select all variables with "factor.sample" in the name
          vars(contains("factor.sample"))
          # use stringr::str_replace to remove factor.sample.
          #   you could do the same with base::gsub()
          #   the dots are escaped so they match a literal "."
        , funs(str_replace(., "factor\\.sample\\.", ""))
    )
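In dplyr 1.0.0 and later, rename_at() is superseded and funs() is deprecated, so the same idea can probably be written with rename_with() instead. The following is only a sketch: the toy data frame and its values are made up for illustration, and only the column names mirror the ones in the question.

```r
library(dplyr)

# Hypothetical toy data; only the column names mirror the question
toy <- data.frame(
  gender = c("f", "m"),
  partyID = c(1, 2),
  factor.sample.partyID.1 = c(1, 0),
  factor.sample.partyID.2 = c(0, 1)
)

renamed <- toy %>%
  rename_with(
    # strip the literal prefix; dots are escaped and the match is anchored at the start
    ~ sub("^factor\\.sample\\.", "", .x),
    # only touch columns whose names start with the prefix
    starts_with("factor.sample.")
  )

names(renamed)
```

Restricting the rename to starts_with("factor.sample.") leaves the other columns untouched, and anchoring the pattern with ^ avoids stripping the text if it ever appears in the middle of a name.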

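Answer 1: (score: 1)

Like others, I did not get an error when I ran the code you provided.

However, I think you may be making this more complicated than it needs to be. You should be able to skip the call to rename entirely and just use setNames. Here is an example using the built-in iris data: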

iris %>%
  setNames(gsub("Sepal", "Changed", names(.))) %>%
  head(3)

which gives

  Changed.Length Changed.Width Petal.Length Petal.Width Species
1            5.1           3.5          1.4         0.2  setosa
2            4.9           3.0          1.4         0.2  setosa
3            4.7           3.2          1.3         0.2  setosa

The same approach should work for your data, and it may sidestep whatever is causing the odd error.
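Applied to column names like the ones in the question, that might look like the sketch below. The data frame here is a made-up stand-in (I don't have the real data), and it uses sub() with escaped dots and a ^ anchor so that only a leading, literal factor.sample. is removed.

```r
# Hypothetical stand-in for the asker's data; only the column names matter
toy <- data.frame(
  gender = c("f", "m"),
  age = c(34, 51),
  factor.sample.partyID.1 = c(1, 0),
  factor.sample.partyID.2 = c(0, 1)
)

# setNames() returns a copy of the data frame with the new names attached
renamed <- setNames(toy, sub("^factor\\.sample\\.", "", names(toy)))

names(renamed)
# [1] "gender"    "age"       "partyID.1" "partyID.2"
```

This needs only base R, so it does not depend on which dplyr version is installed.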