Subsetting data: can I use the apply family or other functions instead of a loop?

Posted: 2016-06-03 23:20:09

Tags: r function subset sapply

I need to subset my data like this:

a1 <- data[,grep("a_cool_[1-3]*", names(data))]
a2 <- data[,grep("word_an[1-3]*", names(data))]
a3 <- data[,grep("word_ne[1-3]*", names(data))]
a4 <- data[,grep("word_an[1-3]*", names(data))]
a5 <- data[,grep("word_sam[1-3]*", names(data))]
a6 <- data[,grep("word_snap[1-3]*", names(data))]
a7 <- data[,grep("word_app[1-3]*", names(data))]

I think this could be simplified with other functions (for example the `*apply()` family), but I'm not sure how.
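The assignments above differ only in their pattern, which is the situation `lapply()` handles well. A minimal sketch, assuming a small toy data frame with hypothetical column names in place of the real `data`. In the question `a4` repeats the pattern of `a2`, so it is left out here:

```r
# Toy stand-in for the asker's `data`; the column names are hypothetical
data <- data.frame(matrix(rnorm(40), nrow = 5, ncol = 8))
names(data) <- c("a_cool_1", "a_cool_2", "word_an1", "word_ne3",
                 "word_sam2", "word_snap1", "word_app3", "other")

# One pattern per subset, copied from the question (names become list names)
patterns <- c(a1 = "a_cool_[1-3]*",   a2 = "word_an[1-3]*",
              a3 = "word_ne[1-3]*",   a5 = "word_sam[1-3]*",
              a6 = "word_snap[1-3]*", a7 = "word_app[1-3]*")

# lapply() applies the same grep()-based column subset to every pattern;
# drop = FALSE keeps one-column results as data frames instead of vectors
subsets <- lapply(patterns, function(p) data[, grep(p, names(data)), drop = FALSE])

subsets$a1  # the a_cool_* columns
```

Each subset is then available as `subsets$a1`, `subsets$a2`, and so on. If separate variables are really needed, `list2env(subsets, envir = globalenv())` creates `a1`, `a2`, ... from the list, although keeping them together in one list is usually easier to work with.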

1 Answer:

Answer 0 (score: 0)

Try this:

# Create an example dataset
thedata <- data.frame(matrix(rnorm(220), nrow = 20, ncol = 11))
varnames <- c('a_cool_3', 'word_an1', 'word_an2', 'word_ne3', 'word_an', 'word_sam3',
              'word_snap1', 'word_app3', 'randomcol', 'anotherone', 'yetanother')
names(thedata) <- varnames

# Combine the patterns your column names should match (| means OR in a regex)
patterns <- "a_cool_[1-3]*|word_an[1-3]*|word_ne[1-3]*|word_sam[1-3]*|word_snap[1-3]*|word_app[1-3]*"

# Use grep() to select the columns whose names match any of those patterns
output_df <- thedata[, grep(pattern = patterns, names(thedata), perl = TRUE)]

# Only the matching columns are kept
head(output_df)

    a_cool_3   word_an1   word_an2    word_ne3    word_an  word_sam3    word_snap1  word_app3
1  1.8225436  0.7570277 -0.4114735 -0.87751389  0.2845020  1.2813361  0.5506499685 -1.3622255
2  0.0178158  0.5977225  2.5022158 -0.80579000 -0.2524916  1.0446857 -0.5382501876  0.8778370
3 -0.4222182  0.1785882 -0.9802086  0.71497031  0.2719002 -0.4319695  0.8670455296 -0.8917643
4 -0.1642998  1.7782387  0.6997389  0.06620839 -0.9951579 -0.1363725 -0.5289680333 -0.1564115
5  0.6785524  0.7319884  0.2843869 -2.25325312 -0.4032888  0.3661970  1.4291588013 -0.2203280
6 -0.7548342 -2.1009707 -2.0157028 -0.34596984 -0.6964674  0.2260157 -0.0001932224 -0.2866768
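One detail worth checking: `[1-3]*` means "zero or more of the digits 1 to 3", so each pattern also matches names with no digit at all, and it matches anywhere inside a name. That is why `word_an` appears in the output above. A small sketch comparing the loose pattern with anchored ones (`word_answer` is a hypothetical extra name added only to show the effect):

```r
# Column names from the answer's example, plus a hypothetical "word_answer"
cols <- c("a_cool_3", "word_an1", "word_an2", "word_ne3", "word_an",
          "word_sam3", "word_snap1", "word_app3", "randomcol",
          "anotherone", "yetanother", "word_answer")

# Unanchored with [1-3]*: any name that merely contains "word_an" matches
loose_hits <- cols[grepl("word_an[1-3]*", cols)]
# -> c("word_an1", "word_an2", "word_an", "word_answer")

# Anchored with ^...$ and exactly one digit: only word_an1 and word_an2
strict_hits <- cols[grepl("^word_an[1-3]$", cols)]
# -> c("word_an1", "word_an2")

# The same idea for the combined pattern: one group of alternatives,
# followed by a single digit, anchored at both ends
strict_all <- "^(a_cool_|word_an|word_ne|word_sam|word_snap|word_app)[1-3]$"
combined_hits <- cols[grepl(strict_all, cols)]
```

Whether the loose behaviour is wanted depends on the real column names; writing `[1-3]?` instead of `[1-3]` inside the anchors would still accept a name with no digit, such as `word_an`.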

I hope this helps.