Using variables

Date: 2016-03-28 17:14:50

Tags: r dplyr

I have a data frame in which some of the column names are numeric:

> names(spreadResults)
 [1] "PupilMatchingRefAnonymous" "GENDER"                    "URN"                      
 [4] "KS2Eng"                    "KS2Mat"                    "EVERFSM_6"                
 [7] "0001"                      "0003"                      "0009"                     
[10] "0015"                      

I want to run a report for each of the numeric column names:

for(DiscID in colnames(spreadResults[7:length(spreadResults)]))
{ 
  #DiscIDcol <- match(DiscID,names(spreadResults))
  colID <- as.name(DiscID)
  print(colID)
  print(DiscID)

  #get data into format suitable for creating tables
  temp <- spreadResults %>% select(GENDER, EVERFSM_6, colID) %>% 
      filter_(!is.na(colID)) %>%
      group_by_(GENDER, EVERFSM_6, colID) %>%
      summarise(n = n()) %>% 
      ungroup()
}

But I get:

`0001`
[1] "0001"
Error: All select() inputs must resolve to integer column positions.
The following do not:
*  colID

However, if I use backticks (``) and name the column explicitly

temp <- spreadResults %>% select(GENDER, EVERFSM_6, `0001`)

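it works fine. Is there a way to refer to a column name through a variable? I know you can use matches(DiscID) in select(), but matches(...) doesn't work in group_by, spread, etc.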

The first five rows of the data frame I'm working with, from dput():

structure(list(
PupilMatchingRefAnonymous = c(12345L, 12346L, 12347L, 12348L, 12349L), 
GENDER = structure(c(2L, 2L, 1L, 1L, 1L), .Label = c("F", "M"), class = "factor"), 
URN = c(123456L, 123456L, 123456L, 123456L, 123456L), 
KS2Eng = c(4L, 3L, 4L, 5L, 3L), 
KS2Mat = c(4L, 5L, 4L, 4L, 3L), 
EVERFSM_6 = c(1L, 1L, 0L, 0L, 1L), 
`0001` = c(66, 44, NA_real_, 55, 66),
`0003` = c(22, NA_real_, NA_real_, NA_real_, NA_real_), 
`0009` = c(NA_real_, 66, NA_real_, NA_real_, NA_real_), 
`0015` = c(33, NA_real_, 55, NA_real_, NA_real_)), 
.Names = c("PupilMatchingRefAnonymous", "GENDER", "URN", "KS2Eng", "KS2Mat", "EVERFSM_6", 
"0001", "0003", "0009", "0015"), 
row.names = c(NA, 5L), class = "data.frame")

Desired output:

  GENDER EVERFSM_6  0001     n
  (fctr)     (int) (dbl) (int)
1      F         0    55     1
2      F         1    66     1
3      M         1    44     1
4      M         1    66     1

2 answers:

Answer 0: (score: 2)

The help page for select suggests using one_of. It works for the following example:

df <- data.frame("a" = 1:3 , "b"  = 3:5)
names(df)[1] <- "243234" # rename, to a numeric string

var <- names(df)[1] 

library(dplyr)

df %>% select( one_of(var) )

You can also see that the problem is not your numeric names but the way you call select:

var <- names(df)[2] # use the column named "b"
df %>% select( one_of(var) )
  b
1 3
2 4
3 5
df %>% select( var)
Error: All select() inputs must resolve to integer column positions.
The following do not:
*  var
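
In more recent releases, one_of() has been superseded by all_of() and any_of(). The following is only a sketch, assuming tidyselect 1.0.0 or later is installed alongside dplyr; the name "not_a_column" is made up for illustration. all_of() raises an error if a name is missing, while any_of() silently skips it.

```r
library(dplyr)

df <- data.frame(a = 1:3, b = 3:5)
names(df)[1] <- "243234"  # rename to a numeric string, as in the answer above

var <- names(df)[1]

# Strict selection: errors if `var` is not a column of df
picked_strict <- df %>% select(all_of(var))

# Lenient selection: names that are not columns are ignored
# ("not_a_column" is a hypothetical name used only for this sketch)
picked_lenient <- df %>% select(any_of(c(var, "not_a_column")))
```

Both calls return a one-column data frame whose column is still named "243234".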

Answer 1: (score: 2)

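To program with arbitrary column names in dplyr, you need to use the standard evaluation (SE) versions of the functions, whose names end in _, so that your variable is not interpreted as a column name the way it would be by the non-standard evaluation (NSE) versions. (For more on NSE, see Hadley's book.)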

The syntax should look like this:

library(dplyr)

cols <- c('Sepal.Length', 'Sepal.Width')

iris %>% select_(.dots = cols) %>% head()
#   Sepal.Length Sepal.Width
# 1          5.1         3.5
# 2          4.9         3.0
# 3          4.7         3.2
# 4          4.6         3.1
# 5          5.0         3.6
# 6          5.4         3.9

If you also need fixed column names, either insert them into the character vector/list, or quote them with '', "", quote, or ~:

iris %>% select_(~Species, .dots = cols) %>% head()
#   Species Sepal.Length Sepal.Width
# 1  setosa          5.1         3.5
# 2  setosa          4.9         3.0
# 3  setosa          4.7         3.2
# 4  setosa          4.6         3.1
# 5  setosa          5.0         3.6
# 6  setosa          5.4         3.9
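
Note that the underscore verbs (select_(), filter_(), group_by_()) have been deprecated since dplyr 0.7. As a rough sketch only, assuming dplyr 1.0 or later, the loop from the question could be written with all_of(), across(), and the .data pronoun instead. The cut-down data frame and the reports list below are made up for illustration and are not part of the original question.

```r
library(dplyr)

# Cut-down version of the question's data (illustrative subset of columns)
spreadResults <- data.frame(
  GENDER    = factor(c("M", "M", "F", "F", "F")),
  EVERFSM_6 = c(1L, 1L, 0L, 0L, 1L),
  `0001`    = c(66, 44, NA, 55, 66),
  `0003`    = c(22, NA, NA, NA, NA),
  check.names = FALSE  # keep the numeric column names unchanged
)

# Columns whose names consist only of digits
numeric_cols <- grep("^[0-9]+$", names(spreadResults), value = TRUE)

reports <- list()  # hypothetical container, one summary table per column
for (DiscID in numeric_cols) {
  group_cols <- c("GENDER", "EVERFSM_6", DiscID)
  reports[[DiscID]] <- spreadResults %>%
    select(all_of(group_cols)) %>%               # select by character vector
    filter(!is.na(.data[[DiscID]])) %>%          # look the column up by its string name
    group_by(across(all_of(group_cols))) %>%     # group by the same character vector
    summarise(n = n(), .groups = "drop")
}
```

Here all_of() takes the names as strings, and .data[[DiscID]] retrieves the column named by the variable, so no backticks or as.name() are needed.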