I have a data frame in which some of the column names are numeric:
> names(spreadResults)
[1] "PupilMatchingRefAnonymous" "GENDER" "URN"
[4] "KS2Eng" "KS2Mat" "EVERFSM_6"
[7] "0001" "0003" "0009"
[10] "0015"
I want to run a report for each of the numerically named columns:
for(DiscID in colnames(spreadResults[7:length(spreadResults)]))
{
#DiscIDcol <- match(DiscID,names(spreadResults))
colID <- as.name(DiscID)
print(colID)
print(DiscID)
#get data into format suitable for creating tables
temp <- spreadResults %>% select(GENDER, EVERFSM_6, colID) %>%
filter_(!is.na(colID)) %>%
group_by_(GENDER, EVERFSM_6, colID) %>%
summarise(n = n()) %>%
ungroup()
}
But I get:
`0001`
[1] "0001"
Error: All select() inputs must resolve to integer column positions.
The following do not:
* colID
However, if I use backticks and name the column explicitly:
temp <- spreadResults %>% select(GENDER, EVERFSM_6, `0001`)
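it works fine. Is there a way to refer to a column name through a variable? I know you can use matches(DiscID) in select(), but matches(...) does not work in group_by, spread, etc.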
The first five rows of the data frame, from dput():
structure(list(
PupilMatchingRefAnonymous = c(12345L, 12346L, 12347L, 12348L, 12349L),
GENDER = structure(c(2L, 2L, 1L, 1L, 1L), .Label = c("F", "M"), class = "factor"),
URN = c(123456L, 123456L, 123456L, 123456L, 123456L),
KS2Eng = c(4L, 3L, 4L, 5L, 3L),
KS2Mat = c(4L, 5L, 4L, 4L, 3L),
EVERFSM_6 = c(1L, 1L, 0L, 0L, 1L),
`0001` = c(66, 44, NA_real_, 55, 66),
`0003` = c(22, NA_real_, NA_real_, NA_real_, NA_real_),
`0009` = c(NA_real_, 66, NA_real_, NA_real_, NA_real_),
`0015` = c(33, NA_real_, 55, NA_real_, NA_real_)),
.Names = c("PupilMatchingRefAnonymous", "GENDER", "URN", "KS2Eng", "KS2Mat", "EVERFSM_6",
"0001", "0003", "0009", "0015"),
row.names = c(NA, 5L), class = "data.frame")
Desired output:
GENDER EVERFSM_6 0001 n
(fctr) (int) (dbl) (int)
1 F 0 55 1
2 F 1 66 1
3 M 1 44 1
4 M 1 66 1
Answer 0 (score: 2)
The help for select suggests using one_of. It works in the following example:
df <- data.frame("a" = 1:3 , "b" = 3:5)
names(df)[1] <- "243234" # rename, to a numeric string
var <- names(df)[1]
library(dplyr)
df %>% select( one_of(var) )
You can also see that the problem is not your numeric names but the way you call select:
var <- names(df)[2] # use the column named "b"
df %>% select( one_of(var) )
b
1 3
2 4
3 5
df %>% select( var)
Error: All select() inputs must resolve to integer column positions.
The following do not:
* var
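As a hedged aside, not part of the original answer: in later releases (assuming dplyr 1.0.0 or newer), one_of() has been superseded by all_of() and any_of(). A minimal sketch of the same selection with all_of(), reusing the df and var names from the example above:

```r
library(dplyr)

df <- data.frame("a" = 1:3, "b" = 3:5)
names(df)[1] <- "243234"   # rename to a numeric string
var <- names(df)[1]

# all_of() treats `var` as a character vector of column names
# (assumes dplyr >= 1.0.0, where all_of() is available)
res <- df %>% select(all_of(var))
```

Unlike one_of(), all_of() raises an error when a requested column is missing, which tends to surface typos in column names early.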
Answer 1 (score: 2)
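To program with arbitrary column names in dplyr, you need to use the standard evaluation (SE) versions of the functions, which end in _, so that your variable is not interpreted as a column name the way it would be by the NSE versions. (For more on NSE, see Hadley's book.)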
The syntax should look like this:
library(dplyr)
cols <- c('Sepal.Length', 'Sepal.Width')
iris %>% select_(.dots = cols) %>% head()
# Sepal.Length Sepal.Width
# 1 5.1 3.5
# 2 4.9 3.0
# 3 4.7 3.2
# 4 4.6 3.1
# 5 5.0 3.6
# 6 5.4 3.9
If you also need fixed column names, either put them into the character vector/list, or quote them with '', "", quote, or ~:
iris %>% select_(~Species, .dots = cols) %>% head()
# Species Sepal.Length Sepal.Width
# 1 setosa 5.1 3.5
# 2 setosa 4.9 3.0
# 3 setosa 4.7 3.2
# 4 setosa 4.6 3.1
# 5 setosa 5.0 3.6
# 6 setosa 5.4 3.9
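The underscore verbs shown above were deprecated from dplyr 0.7.0 onwards, so the following is a hedged sketch of how the loop from the question might be written today, assuming dplyr 1.0.0 or newer. It is not the answerer's code: the names disc_cols and reports are made up for illustration, and the data is a trimmed copy of the dput() output in the question.

```r
library(dplyr)

# Trimmed copy of the question's data; check.names = FALSE keeps names like "0001"
spreadResults <- data.frame(
  GENDER    = factor(c("M", "M", "F", "F", "F")),
  EVERFSM_6 = c(1L, 1L, 0L, 0L, 1L),
  `0001`    = c(66, 44, NA, 55, 66),
  `0003`    = c(22, NA, NA, NA, NA),
  check.names = FALSE
)

# Columns whose names consist only of digits (hypothetical helper variable)
disc_cols <- grep("^[0-9]+$", names(spreadResults), value = TRUE)

# One summary table per numerically named column
reports <- lapply(disc_cols, function(DiscID) {
  spreadResults %>%
    select(GENDER, EVERFSM_6, all_of(DiscID)) %>%      # column name held in a string
    filter(!is.na(.data[[DiscID]])) %>%                # .data pronoun looks the column up by name
    group_by(across(all_of(c("GENDER", "EVERFSM_6", DiscID)))) %>%
    summarise(n = n(), .groups = "drop")
})
names(reports) <- disc_cols
```

reports[["0001"]] then holds the table for column 0001, with the columns GENDER, EVERFSM_6, 0001 and n. Returning the tables in a named list, rather than overwriting temp on each iteration as the question's loop does, keeps every report available after the loop finishes.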