Subset a data frame by all combinations of (given) factors

Time: 2017-03-22 12:58:15

Tags: r dataframe subset

Suppose I have a data frame df like this:

col1  col2   col3     col4
"C1"   "A"    "M"    somevalue1
"C1"   "A"    "M"    somevalue2
"C2"   "B"    "N"    somevalue3
"C3"   "B"    "N"    somevalue4
"C1"   "B"    "Y"    somevalue5

I need to get all subsets of the data frame, one for each combination of the factors in the first two columns.

So far, I have the factor levels for both columns:

lapply(lapply(subset(df, select = c("col1", "col2")), factor), levels)

Then I tried to subset the data frame using one of the combinations:

subset(df, c("col1", "col2") == c("C1", "A"))

But this does not work, and neither does any other variant I could think of.

The final output should be a list containing the following data frames:

$1
col1  col2  col3  col4
"C1"   "A"   "M"  somevalue1
"C1"   "A"   "M"  somevalue2

$2
col1  col2  col3  col4
"C2"   "B"   "N"  somevalue3

$3
col1  col2  col3  col4
"C1"   "B"  "Y"   somevalue5

$4
col1  col2  col3  col4
"C3"   "B"  "N"   somevalue4

[Edit] subset(df, all(c("col1","col2") == c("C1", "A"))) does not work either (it returns 0 rows).
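
A note on why both attempts come back empty: c("col1", "col2") == c("C1", "A") compares two character vectors of column names and values, not the columns themselves, so it is FALSE throughout. The sketch below rebuilds the sample data (the col4 strings are placeholders assumed for illustration) and shows the usual form for a single combination, where the column names are used unquoted inside subset().

```r
# Sample data rebuilt from the question; the col4 values are placeholder strings.
df <- data.frame(
  col1 = c("C1", "C1", "C2", "C3", "C1"),
  col2 = c("A", "A", "B", "B", "B"),
  col3 = c("M", "M", "N", "N", "Y"),
  col4 = paste0("somevalue", 1:5),
  stringsAsFactors = FALSE
)

# This compares the strings "col1" with "C1" and "col2" with "A".
# Both comparisons are FALSE, so no row is selected.
failed <- subset(df, c("col1", "col2") == c("C1", "A"))

# Referring to the columns by name compares their values row by row.
one_combo <- subset(df, col1 == "C1" & col2 == "A")
```

Here one_combo holds the two rows with col1 "C1" and col2 "A". This handles one combination at a time; getting every combination at once still needs something like split().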

1 Answer:

Answer 0: (score: 0)

You can merge and then split, i.e.

dd <- merge(expand.grid(unique(df$col1), unique(df$col2)), df,
            by.x = c('Var1', 'Var2'), by.y = c('col1', 'col2'))

#which gives
#  Var1 Var2 col3       col4
#1   C1    A    M somevalue1
#2   C1    A    M somevalue2
#3   C1    B    Y somevalue5
#4   C2    B    N somevalue3
#5   C3    B    N somevalue4


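#merge() sorts rows by the key columns, so each first occurrence of a
#(Var1, Var2) pair starts a new group
split(dd, cumsum(!duplicated(dd[c(1:2)])))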
#$`1`
#  Var1 Var2 col3       col4
#1   C1    A    M somevalue1
#2   C1    A    M somevalue2

#$`2`
#  Var1 Var2 col3       col4
#3   C1    B    Y somevalue5

#$`3`
#  Var1 Var2 col3       col4
#4   C2    B    N somevalue3

#$`4`
#  Var1 Var2 col3       col4
#5   C3    B    N somevalue4