Suppose I have a data frame df like this:
col1 col2 col3 col4
"C1" "A" "M" somevalue1
"C1" "A" "M" somevalue2
"C2" "B" "N" somevalue3
"C3" "B" "N" somevalue4
"C1" "B" "Y" somevalue5
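I need to split the data frame into all the subsets defined by the factor combinations of the first two columns.
So far, I have obtained the levels of both factors with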
lapply(lapply(subset(df, select = c("col1", "col2")), factor), levels)
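Then I tried to subset the data frame using one of the factor combinations:
subset(df, c("col1", "col2") == c("C1", "A"))
but this does not work, and neither does any other combination I could think of.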
The final output should be a list containing the following data frames:
$1
col1 col2 col3 col4
"C1" "A" "M" somevalue1
"C1" "A" "M" somevalue2
$2
col1 col2 col3 col4
"C2" "B" "N" somevalue3
$3
col1 col2 col3 col4
"C1" "B" "Y" somevalue5
$4
col1 col2 col3 col4
"C3" "B" "N" somevalue4
[Edit] subset(df, all(c("col1","col2") == c("C1", "A")))
does not work either (it returns 0 rows).
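For reference, here is a minimal sketch of why both attempts return no rows, and of how a single subset could be written instead. The sample data is reconstructed from the table above, with placeholder strings standing in for the col4 values, and the name one_subset is only illustrative.

```r
# Sample data reconstructed from the question (col4 values are placeholders)
df <- data.frame(
  col1 = c("C1", "C1", "C2", "C3", "C1"),
  col2 = c("A", "A", "B", "B", "B"),
  col3 = c("M", "M", "N", "N", "Y"),
  col4 = paste0("somevalue", 1:5),
  stringsAsFactors = FALSE
)

# This compares the literal strings "col1" with "C1" and "col2" with "A".
# No column of df is involved, so both comparisons are FALSE.
literal_cmp <- c("col1", "col2") == c("C1", "A")
literal_cmp

# A length-2 all-FALSE condition is recycled over the rows, so nothing is kept
nrow(subset(df, c("col1", "col2") == c("C1", "A")))

# Referring to the columns by bare name compares their values row by row
one_subset <- subset(df, col1 == "C1" & col2 == "A")
one_subset
```

Wrapping the comparison in all() collapses it to a single FALSE, which is why that variant also returns 0 rows.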
Answer 0 (score: 0)
You can use merge and split, i.e.
dd <- merge(expand.grid(unique(df$col1), unique(df$col2)), df, by.x = c('Var1', 'Var2'),
by.y = c('col1', 'col2'))
#which gives
# Var1 Var2 col3 col4
#1 C1 A M somevalue1
#2 C1 A M somevalue2
#3 C1 B Y somevalue5
#4 C2 B N somevalue3
#5 C3 B N somevalue4
split(dd, cumsum(!duplicated(dd[c(1:2)])))
#$`1`
# Var1 Var2 col3 col4
#1 C1 A M somevalue1
#2 C1 A M somevalue2
#$`2`
# Var1 Var2 col3 col4
#3 C1 B Y somevalue5
#$`3`
# Var1 Var2 col3 col4
#4 C2 B N somevalue3
#$`4`
# Var1 Var2 col3 col4
#5 C3 B N somevalue4
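As a possible alternative, split() can also take a list of grouping vectors directly, and drop = TRUE discards the combinations that never occur in the data. The sketch below assumes the same reconstructed sample data as in the question (col4 values are placeholders). The name by_pair is only illustrative.

```r
# Sample data reconstructed from the question (col4 values are placeholders)
df <- data.frame(
  col1 = c("C1", "C1", "C2", "C3", "C1"),
  col2 = c("A", "A", "B", "B", "B"),
  col3 = c("M", "M", "N", "N", "Y"),
  col4 = paste0("somevalue", 1:5),
  stringsAsFactors = FALSE
)

# Group by every observed (col1, col2) pair; drop = TRUE removes
# unobserved pairs such as C2/A, which would otherwise give empty data frames
by_pair <- split(df, list(df$col1, df$col2), drop = TRUE)

# One list element per observed pair, named like "C1.A"
names(by_pair)
by_pair[["C1.A"]]
```

Unlike the merge approach, this keeps the original column names col1 and col2 rather than renaming them to Var1 and Var2, and it does not depend on the rows being sorted by the key columns.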