Subsetting a dataset on dynamic columns

Time: 2016-07-22 12:06:45

Tags: r

I have a question about subsetting data based on the classes of dynamic columns. For example:

# These come from another source; their names and number of classes are not known in advance.
# The following two demographics help illustrate the problem
gender <- c(1,2)
agegroup <- c(1,2,3,4,5,6,7,8)
#moredemo.................

# reproducible data
set.seed(1)
datafile <- data.frame(gender = rep(gender, 100),
                       agegroup = rep(agegroup, 25),
                       col3 = runif(200))

# This subset is only for gender = 1 and agegroup = 3
# A subset is needed for every combination of classes across the demographics
# Names must not be hardcoded, because the demographic names will not be known
dat_gender_1_agegroup_3 <- datafile[datafile$gender == 1 & datafile$agegroup == 3, ]

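But there may be more demographics than just gender and age group: income, education, race, and so on. Each demographic has a different set of classes. Please help me subset the dataset datafile on a varying number of columns. Thanks in advance.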

2 answers:

Answer 0 (score: 1)

Use expand.grid to build the combinations, then apply over them to subset:

#dummy data
set.seed(123)
mydata <- data.frame(gender = sample(1:2, 100, replace = TRUE),
                     agegroup = sample(1:10, 100, replace = TRUE))

#groups
gender <- c(1,2)
agegroup <- c(1,2,3,4,5,6,7,8)

#get all combo
myCombo <- expand.grid(gender, agegroup)

#result is a list object
apply(myCombo, 1, function(i){
  mydata[ mydata$gender == i[1] &
            mydata$agegroup == i[2], ]
  })
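
When the demographic columns are not known in advance, the same idea can probably be written without naming any column. A minimal sketch, assuming every column except the value column (col3 in the question) is a demographic; demo_cols, combos and subsets are illustrative names, not part of the original answer:

```r
# Rebuild the question's example data
set.seed(1)
datafile <- data.frame(gender = rep(c(1, 2), 100),
                       agegroup = rep(1:8, 25),
                       col3 = runif(200))

# Assumption: every column except the measurement column is a demographic
demo_cols <- setdiff(names(datafile), "col3")

# All combinations of the classes observed in each demographic column
combos <- expand.grid(lapply(datafile[demo_cols], unique))

# One subset per combination; merge() keeps rows that match on all demo_cols
subsets <- lapply(seq_len(nrow(combos)), function(k) {
  merge(datafile, combos[k, , drop = FALSE], by = demo_cols)
})

# Label each subset by its combination, e.g. "1_3" for gender 1, agegroup 3
names(subsets) <- do.call(paste, c(combos, sep = "_"))
```

subsets[["1_3"]] then holds the rows for gender 1 and agegroup 3. Combinations that never occur in the data come back as empty data frames.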

Edit: based on the update, I think you just need the split command:

split(datafile, datafile[, 1:2])
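
To keep this independent of the number and names of the demographic columns, the grouping columns can be picked by name at run time instead of by position. A sketch, assuming col3 is the only non-demographic column (demo_cols is an illustrative name):

```r
# Rebuild the question's example data
set.seed(1)
datafile <- data.frame(gender = rep(c(1, 2), 100),
                       agegroup = rep(1:8, 25),
                       col3 = runif(200))

# Assumption: everything except the measurement column is a demographic
demo_cols <- setdiff(names(datafile), "col3")

# split() accepts a list of grouping vectors and groups by their interaction;
# drop = TRUE omits combinations that never occur in the data
subsets <- split(datafile, datafile[demo_cols], drop = TRUE)

names(subsets)            # names have the form "<gender>.<agegroup>"
head(subsets[["1.3"]])    # rows with gender 1 and agegroup 3
```

Without drop = TRUE the result also contains an empty data frame for every combination that is absent from the data.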

Answer 1 (score: 0)

How about (assuming the column names are "gender" and "agegroup"):

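# The value vectors get their own names: inside subset(), `gender %in% gender`
# would compare the column with itself and always be TRUE
gender_values <- c(1, 2)
agegroup_values <- c(1, 2, 3, 4, 5, 6, 7, 8)

# full_data stands for your own data frame
data_subset <- subset(full_data, gender %in% gender_values | agegroup %in% agegroup_values)  # | [AND SO ON]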

You can add further [column_name] %in% [values] conditions as needed.

HTH a bit!

Edit: you can just as well use & instead of |, obviously.
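
Since the columns are not known in advance, the chain of %in% conditions could also be assembled from a named list rather than typed out. A minimal sketch on the question's example data; wanted is a hypothetical list mapping each column name to the values to keep, and the conditions are combined with &:

```r
# Rebuild the question's example data
set.seed(1)
datafile <- data.frame(gender = rep(c(1, 2), 100),
                       agegroup = rep(1:8, 25),
                       col3 = runif(200))

# Hypothetical filter specification: column name -> allowed values
wanted <- list(gender = 1, agegroup = c(3, 5))

# One logical vector per column, then combine them element-wise with `&`
conditions <- Map(function(col, values) datafile[[col]] %in% values,
                  names(wanted), wanted)
keep <- Reduce(`&`, conditions)

data_subset <- datafile[keep, ]
```

Replacing `&` with `|` in the Reduce() call gives the "or" variant, and adding another demographic only means adding an entry to wanted.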