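I have a question about subsetting data when the grouping columns and their classes are only known at run time. For example:
# These come from another source; I don't know their names or the number of classes in advance.
# The two demographics below are only here to help illustrate the problem.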
gender <- c(1,2)
agegroup <- c(1,2,3,4,5,6,7,8)
# more demographics ...
# reproducible data
set.seed(1)
datafile <- data.frame(gender = rep(gender, 100),
                       agegroup = rep(agegroup, 25),
                       col3 = runif(200))
# This subset covers only gender == 1 and agegroup == 3.
# What I need is one subset for every combination of classes across the demographics,
# without hardcoding any names, because the demographic names will not be known in advance.
dat_gender_1_agegroup_3 <- datafile[datafile$gender == 1 & datafile$agegroup == 3, ]
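But there may be more demographics than just gender and age group: income, education, race, and so on, each with its own set of classes. Please help me subset the datafile data set over a varying number of columns. Thanks in advance.
Answer 0 (score: 1)
Use expand.grid to build the combinations, then apply over them to subset: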
# dummy data
set.seed(123)
mydata <- data.frame(gender = sample(1:2, 100, replace = TRUE),
                     agegroup = sample(1:10, 100, replace = TRUE))
# groups
gender <- c(1, 2)
agegroup <- c(1, 2, 3, 4, 5, 6, 7, 8)
# get all combinations
myCombo <- expand.grid(gender, agegroup)
# the result is a list with one data frame per combination
apply(myCombo, 1, function(i) {
  mydata[mydata$gender == i[1] &
           mydata$agegroup == i[2], ]
})
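To avoid hardcoding the two column names, the same idea can be driven by a named list of classes. This is only a sketch: `demo_levels`, `combos`, and `subsets` are hypothetical names, and it assumes the list names match the column names in the data.

```r
set.seed(123)
mydata <- data.frame(gender = sample(1:2, 100, replace = TRUE),
                     agegroup = sample(1:10, 100, replace = TRUE))

# Hypothetical named list: one entry per demographic column, holding its classes
demo_levels <- list(gender = 1:2, agegroup = 1:8)

# expand.grid() accepts a list and keeps its names as column names
combos <- expand.grid(demo_levels)

# One subset per combination; merge() joins on the shared column names
subsets <- lapply(seq_len(nrow(combos)), function(k) {
  merge(mydata, combos[k, , drop = FALSE])
})

# Label each subset by its combination, e.g. "1_3" for gender 1, agegroup 3
names(subsets) <- do.call(paste, c(combos, sep = "_"))
```

Adding another demographic then only means adding another entry to `demo_levels`. Note that `merge()` may reorder rows within each subset.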
Edit: based on the update, I think you just need the split command:
split(datafile, datafile[, 1:2])
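If the demographic columns are not always the first two, the grouping columns can be picked by name instead of by position. A minimal sketch, assuming every column except the value column `col3` is a demographic (`demo_cols` and `subsets` are hypothetical names):

```r
set.seed(1)
datafile <- data.frame(gender = rep(c(1, 2), 100),
                       agegroup = rep(1:8, 25),
                       col3 = runif(200))

# Assumption: every column other than the value column is a demographic
demo_cols <- setdiff(names(datafile), "col3")

# split() crosses all grouping columns; drop = TRUE discards
# combinations that never occur in the data
subsets <- split(datafile, datafile[demo_cols], drop = TRUE)

# Elements are named "<gender>.<agegroup>", so this is gender 1, agegroup 3
dat_gender_1_agegroup_3 <- subsets[["1.3"]]
```

Without `drop = TRUE` the result also contains an empty data frame for each combination that has no rows.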
Answer 1 (score: 0)
How about this (assuming the column names are "gender" and "agegroup"):
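# The value vectors need names that differ from the columns: inside subset(),
# `gender` refers to the column, so gender %in% gender would always be TRUE.
gender_values <- c(1, 2)
agegroup_values <- c(1, 2, 3, 4, 5, 6, 7, 8)
# full_data is your data frame (datafile in the question)
data_subset <- subset(full_data, gender %in% gender_values | agegroup %in% agegroup_values)  # | [AND SO ON]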
You can add as many [column_name] %in% [values] conditions as you need.
Hope that helps a bit! Edit: you could just as well use & instead of |, obviously.
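When the columns are only known at run time, the chain of %in% conditions can be built from a named list rather than typed out. This is a sketch under that assumption; `wanted`, `tests`, and `data_subset` are hypothetical names, and the list names are assumed to match column names in the data.

```r
set.seed(1)
datafile <- data.frame(gender = rep(c(1, 2), 100),
                       agegroup = rep(1:8, 25),
                       col3 = runif(200))

# Hypothetical named list of the classes to keep for each demographic column
wanted <- list(gender = 1, agegroup = c(3, 5))

# One logical vector per column: is the row's value among the wanted classes?
tests <- Map(function(col, values) datafile[[col]] %in% values,
             names(wanted), wanted)

# Combine the tests with & so a row must pass all of them
# (swap in `|` to keep rows that pass any of them)
data_subset <- datafile[Reduce(`&`, tests), ]
```

Indexing with `datafile[[col]]` sidesteps the name clash that `subset()` can run into when a variable shares its name with a column.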