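I have a data set like the one shown below. The column names combine a time suffix (.t0 and .t1), a scale name (this and that), and an item label (1, 22, 22a). There are also other columns (v2, v3, ignore.t0, ignore.t1, this.t0, this.t1, that.t0, that.t1):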
dat <- data.frame(id = seq(from=1, to=10, by=1),
v2 = rnorm(10),
v3 = rnorm(10),
ignore.t0 = rnorm(10),
this.t0 = rnorm(10),
this1.t0 = rnorm(10),
this22.t0 = rnorm(10),
this22a.t0 = rnorm(10),
that.t0 = rnorm(10),
that1.t0 = rnorm(10),
that22.t0 = rnorm(10),
that22a.t0 = rnorm(10),
ignore.t1 = rnorm(10),
this.t1 = rnorm(10),
this1.t1 = rnorm(10),
this22.t1 = rnorm(10),
this22a.t1 = rnorm(10),
that.t1 = rnorm(10),
that1.t1 = rnorm(10),
that22.t1 = rnorm(10),
that22a.t1 = rnorm(10))
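I want to subset the data frame so that it keeps id and only those columns whose names contain a scale name (this or that) AND, right before the period, either a number (1.) or a number followed by a letter (22a.). So in the end, the data frame should look like this: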
dat2 <- data.frame(
id = seq(from=1, to=10, by=1),
#v2 = rnorm(10),
#v3 = rnorm(10),
#ignore.t0 = rnorm(10),
#this.t0 = rnorm(10),
this1.t0 = rnorm(10),
this22.t0 = rnorm(10),
this22a.t0 = rnorm(10),
#that.t0 = rnorm(10),
that1.t0 = rnorm(10),
that22.t0 = rnorm(10),
that22a.t0 = rnorm(10),
#ignore.t1 = rnorm(10),
#this.t1 = rnorm(10),
this1.t1 = rnorm(10),
this22.t1 = rnorm(10),
this22a.t1 = rnorm(10),
#that.t1 = rnorm(10),
that1.t1 = rnorm(10),
that22.t1 = rnorm(10),
that22a.t1 = rnorm(10))
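The real data frame is much larger than the one shown here, so typing out column indices is not feasible. Matching on the scale names alone does not work either, because this.t0, this.t1, that.t0 and that.t1 would also be captured.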
# not quite right
dat2$id <- dat$id
scales <- c("this", "that")
keep.index <- grep(paste(scales,collapse="|"), names(dat))
temp <- dat[keep.index]
dat2 <- cbind(dat2, temp)
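How can I modify the grep pattern so that it looks for a number OR (a number and a letter) before the period? Or is there a better approach?
Answer 0 (score: 6)
This works for your example: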
dat[c("id", grep("(this|that)\\d+[a-z]?\\.", names(dat), value = TRUE))]
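where:
\\d+ stands for one or more digits
[a-z]? stands for zero or one lowercase letter
\\. stands for a literal dot

If you want to build the pattern dynamically for various scales, you can do the following: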
scales <- c("this", "that")
pattern <- sprintf("(%s)\\d+[a-z]?\\.", paste(scales, collapse = "|"))
dat[c("id", grep(pattern, names(dat), value = TRUE))]
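
As a quick sanity check, the sketch below applies the same pattern to a plain character vector of the column names from the question, so you can see which names are kept and which are dropped without building the data frame. The names `cols`, `kept` and `dropped` are only illustrative and do not appear in the original post.

```r
# Column names copied from the example data frame in the question
cols <- c("id", "v2", "v3",
          "ignore.t0", "this.t0", "this1.t0", "this22.t0", "this22a.t0",
          "that.t0", "that1.t0", "that22.t0", "that22a.t0",
          "ignore.t1", "this.t1", "this1.t1", "this22.t1", "this22a.t1",
          "that.t1", "that1.t1", "that22.t1", "that22a.t1")

# Build the pattern from the scale names, exactly as in the answer
scales  <- c("this", "that")
pattern <- sprintf("(%s)\\d+[a-z]?\\.", paste(scales, collapse = "|"))

# Names that match the pattern (value = TRUE returns names, not indices)
kept <- grep(pattern, cols, value = TRUE)

# Everything else apart from id, which is added back explicitly in the answer
dropped <- setdiff(cols, c("id", kept))

print(kept)
print(dropped)
```

Note that the pattern is not anchored, so it would also match a column such as a hypothetical notthis1.t0. If that matters for your data, prefixing the pattern with ^ is one possible way to require the scale name at the start of the column name.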