Select and compare populations

Time: 2017-04-04 09:16:27

Tags: r

I have a data.frame like this:

data=data.frame(
  EyeCol=c("blue","blue","blue","green","green","brown","brown","brown","brown","amber","amber","amber","amber"),
  BodyShape=c("skinny","fat","skinny","skinny","skinny","fat","fat","fat","regular","regular","regular","skinny","regular"),
  value=rnorm(n = 13)
)
> data
      EyeCol  BodyShape value                
 [1,] "blue"  "skinny"  "-0.151764111069661" 
 [2,] "blue"  "fat"     "0.68161499165021"   
 [3,] "blue"  "skinny"  "2.45634829248442"   
 [4,] "green" "skinny"  "2.40305139602346"   
 [5,] "green" "skinny"  "-0.490136912577361" 
 [6,] "brown" "fat"     "-0.552026475080878" 
 [7,] "brown" "fat"     "0.466500574627706"  
 [8,] "brown" "fat"     "0.133980090309033"  
 [9,] "brown" "regular" "-0.801840913223832" 
[10,] "amber" "regular" "-0.0484879443196371"
[11,] "amber" "regular" "0.0269352552010763" 
[12,] "amber" "skinny"  "-1.12761016311858"  
[13,] "amber" "regular" "1.857866986502" 

I have to compare all possible combinations of groups, e.g. green eyes with regular body shape versus brown eyes with skinny body shape.

But I have to avoid self-comparisons and redundant comparisons (Green Regular versus Brown Skinny is the same as Brown Skinny versus Green Regular).

Currently, I am building a data frame to guide the selection:

EyeCol=c("blue","green","brown","amber")
BodyShape=c("skinny","regular","fat")

expand.grid(list(ec.x=EyeCol,bs.x=BodyShape,ec.y=EyeCol,bs.y=BodyShape))

Then I remove the irrelevant rows: the self-comparisons and the reciprocal rows.
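For illustration, the removal step could be sketched as follows. This is only one possible approach, not necessarily what I am doing now: it numbers the groups, crosses the indices, and keeps only the rows where the first index is smaller than the second. The names `groups`, `idx` and `pairs` are made up for this sketch.

```r
EyeCol <- c("blue", "green", "brown", "amber")
BodyShape <- c("skinny", "regular", "fat")

# One row per group (4 eye colours x 3 body shapes = 12 groups).
groups <- expand.grid(ec = EyeCol, bs = BodyShape, stringsAsFactors = FALSE)

# Cross the group indices and keep only i < j. This drops the
# self-comparisons (i == j) and keeps one row of each reciprocal pair.
n <- nrow(groups)
idx <- expand.grid(i = seq_len(n), j = seq_len(n))
idx <- idx[idx$i < idx$j, ]

# Translate the index pairs back into group labels.
pairs <- data.frame(
  ec.x = groups$ec[idx$i], bs.x = groups$bs[idx$i],
  ec.y = groups$ec[idx$j], bs.y = groups$bs[idx$j],
  stringsAsFactors = FALSE
)

nrow(pairs)  # choose(12, 2) = 66
```

Filtering on indices rather than on the label columns avoids having to compare every factor column pairwise, which matters once there are more than two factors.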

Last but not least, I may have N factors (here there are 2: body shape and eye color).

Does anyone have an elegant way to do this that is not too time-consuming? My data has about 5k rows, and the number of factors currently reaches 7.
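To make the N-factor requirement concrete, here is a minimal sketch of the kind of thing I am after, assuming that only the groups actually present in the data need to be compared. `group_pairs` is a hypothetical helper written for this post, not an existing function; it relies on base R's `unique()` and `combn()`.

```r
# Hypothetical helper: all unordered pairs of groups defined by any number
# of factor columns, restricted to the groups that occur in the data.
group_pairs <- function(df, factors) {
  # Unique combinations of the factor columns present in df
  groups <- unique(df[, factors, drop = FALSE])
  rownames(groups) <- NULL
  if (nrow(groups) < 2) stop("need at least two groups to compare")

  # combn() gives a 2 x choose(n, 2) matrix of index pairs with i < j,
  # so there are no self-comparisons and no reciprocal duplicates.
  idx <- combn(nrow(groups), 2)

  x <- groups[idx[1, ], , drop = FALSE]
  y <- groups[idx[2, ], , drop = FALSE]
  names(x) <- paste0(factors, ".x")
  names(y) <- paste0(factors, ".y")
  rownames(x) <- NULL
  rownames(y) <- NULL
  cbind(x, y)
}

data <- data.frame(
  EyeCol = c("blue", "blue", "blue", "green", "green", "brown", "brown",
             "brown", "brown", "amber", "amber", "amber", "amber"),
  BodyShape = c("skinny", "fat", "skinny", "skinny", "skinny", "fat", "fat",
                "fat", "regular", "regular", "regular", "skinny", "regular"),
  value = rnorm(n = 13)
)

pairs <- group_pairs(data, c("EyeCol", "BodyShape"))
nrow(pairs)  # 7 groups occur in the data, so choose(7, 2) = 21 pairs
```

With 7 factors the same call would take a vector of 7 column names. The number of pairs grows quadratically with the number of groups, so with about 5k rows there can be at most choose(5000, 2) = 12,497,500 pairs.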

0 answers:

No answers