I have a data.frame like this:
data=data.frame(
EyeCol=c("blue","blue","blue","green","green","brown","brown","brown","brown","amber","amber","amber","amber"),
BodyShape=c("skinny","fat","skinny","skinny","skinny","fat","fat","fat","regular","regular","regular","skinny","regular"),
value=rnorm(n = 13)
)
> data
EyeCol BodyShape value
[1,] "blue" "skinny" "-0.151764111069661"
[2,] "blue" "fat" "0.68161499165021"
[3,] "blue" "skinny" "2.45634829248442"
[4,] "green" "skinny" "2.40305139602346"
[5,] "green" "skinny" "-0.490136912577361"
[6,] "brown" "fat" "-0.552026475080878"
[7,] "brown" "fat" "0.466500574627706"
[8,] "brown" "fat" "0.133980090309033"
[9,] "brown" "regular" "-0.801840913223832"
[10,] "amber" "regular" "-0.0484879443196371"
[11,] "amber" "regular" "0.0269352552010763"
[12,] "amber" "skinny" "-1.12761016311858"
[13,] "amber" "regular" "1.857866986502"
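I have to compare all possible combinations of groups, e.g. green eyes with regular body shape versus brown eyes with skinny body shape.
However, I have to skip self-comparisons and redundant ones (Green Regular vs. Brown Skinny is the same comparison as Brown Skinny vs. Green Regular).
At the moment, I build a data frame to guide the selection: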
EyeCol=c("blue","green","brown","amber")
BodyShape=c("skinny","regular","fat")
expand.grid(list(ec.x=EyeCol,bs.x=BodyShape,ec.y=EyeCol,bs.y=BodyShape))
Then I remove the rows that are not needed: self-comparisons and reciprocal rows.
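To make the filtering step concrete, here is a minimal sketch of how it could be done on the grid above. The names `grid`, `key.x`, `key.y`, `keys` and `pairs` are only illustrative, and the sketch assumes the level names contain no underscore, so the pasted keys stay unique.

```r
EyeCol <- c("blue", "green", "brown", "amber")
BodyShape <- c("skinny", "regular", "fat")

# All ordered pairs of (EyeCol, BodyShape) groups: 12 * 12 = 144 rows.
grid <- expand.grid(list(ec.x = EyeCol, bs.x = BodyShape,
                         ec.y = EyeCol, bs.y = BodyShape),
                    stringsAsFactors = FALSE)

# Collapse each side of the comparison into a single key per group.
key.x <- paste(grid$ec.x, grid$bs.x, sep = "_")
key.y <- paste(grid$ec.y, grid$bs.y, sep = "_")

# Give every group an integer id so the two sides can be ordered.
keys <- unique(key.x)
id.x <- match(key.x, keys)
id.y <- match(key.y, keys)

# id.x < id.y drops self-comparisons (equal ids) and keeps exactly one
# row of each reciprocal pair.
pairs <- grid[id.x < id.y, ]
nrow(pairs)  # 66, i.e. choose(12, 2)
```

This works, but it materialises the full ordered grid before throwing more than half of it away, which is what becomes expensive as factors are added.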
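Last but not least, I may have N factors (here there are 2: BodyShape and EyeCol).
Does anyone have an elegant way to do this that is not too time-consuming? My data has about 5k rows, and the number of factors currently goes up to 7.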
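For reference, one possible direction is sketched below. It is an assumption-laden sketch rather than a settled solution: it builds pairs with `combn()` from the group combinations that actually occur in the data, instead of from every theoretical combination of levels. `factor_cols` is a hypothetical vector naming the factor columns, and `set.seed()` is there only to make the illustrative values reproducible.

```r
set.seed(1)  # only so the illustrative values are reproducible
data <- data.frame(
  EyeCol = c("blue", "blue", "blue", "green", "green", "brown", "brown",
             "brown", "brown", "amber", "amber", "amber", "amber"),
  BodyShape = c("skinny", "fat", "skinny", "skinny", "skinny", "fat", "fat",
                "fat", "regular", "regular", "regular", "skinny", "regular"),
  value = rnorm(n = 13)
)

# Hypothetical: list all N factor columns here.
factor_cols <- c("EyeCol", "BodyShape")

# One row per group that actually occurs in the data.
groups <- unique(data[, factor_cols, drop = FALSE])
rownames(groups) <- NULL

# Each column of idx is one unordered pair of distinct group indices,
# so self-comparisons and reciprocal pairs are never generated.
idx <- combn(nrow(groups), 2)

left <- groups[idx[1, ], , drop = FALSE]
right <- groups[idx[2, ], , drop = FALSE]
names(left) <- paste0(factor_cols, ".x")
names(right) <- paste0(factor_cols, ".y")
rownames(left) <- NULL
rownames(right) <- NULL

pairs <- cbind(left, right)
nrow(pairs)  # 21, i.e. choose(7, 2) for the 7 observed groups
```

Because the number of observed groups can never exceed the number of rows, about 5k rows would give at most `choose(5000, 2)`, roughly 12.5 million pairs, however many factors there are. Whether comparing only observed groups is acceptable depends on the analysis.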