I have a data frame df. I need to find the correlation between ColE and ColF within each group.
df = structure(list(ColA = c("A", "A", "A", "B", "B"), ColB = c("L",
"L", "L", "L", "K"), ColC = c("Sup1", "Sup1", "Sup2", "Sup1",
"Sup1"), ColD = c("Jan", "Feb", "Mar", "Apr", "May"), ColE = c(56,
59, 68, 45, 45), ColF = c(58, 60, 90, 65, 59)), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
ColA ColB ColC ColD ColE ColF
A L Sup1 Jan 56 58
A L Sup1 Feb 59 60
A L Sup2 Mar 68 90
B L Sup1 Apr 45 65
B K Sup1 May 45 59
Here the groups are defined by ColA and ColB, and I need the correlation within each group, so the output should look like
New ColA   New ColB   Correlation coeff
A          L          ---
B          L          ---
B          K          ---
Similarly, if I need the correlation coefficient for a different grouping (here ColA, ColB and ColC)
New ColA   New ColB   New ColC   Correlation coeff
A          L          Sup1       ---
A          L          Sup2       ---
B          L          Sup1       ---
B          K          Sup1       ---
Is there a way to do this?
Answer 0 (score: 1)
Using the data.table package:
> library(data.table)
> data.table(df)[,j=list(kor=cor(ColE,ColF)),by=list(ColA,ColB)]
ColA ColB kor
1: A L 0.982613
2: B L NA
3: B K NA
Answer 1 (score: 0)
Using dplyr, you can do the following:
library(dplyr)
df %>%
group_by(ColA, ColB) %>%
summarise(corr_coeff = cor(ColE, ColF))
ColA ColB corr_coeff
<chr> <chr> <dbl>
1 A L 0.983
2 B K NA
3 B L NA
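Note that for two of the groups the coefficient is NA: each of them contains only one observation, so a correlation cannot be computed.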
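The question also asks about switching to other groupings, such as ColA, ColB and ColC. One possible way to avoid rewriting the pipeline each time is to pass the grouping columns as a character vector. This is only a sketch: it assumes dplyr 1.0 or later (for across()), and cor_by_group and the n_obs column are names made up here for illustration, not part of dplyr.

```r
library(dplyr)

# Same data as in the question (ColD omitted, it is not used here)
df <- tibble(
  ColA = c("A", "A", "A", "B", "B"),
  ColB = c("L", "L", "L", "L", "K"),
  ColC = c("Sup1", "Sup1", "Sup2", "Sup1", "Sup1"),
  ColE = c(56, 59, 68, 45, 45),
  ColF = c(58, 60, 90, 65, 59)
)

# Hypothetical helper: correlation of ColE and ColF within the groups
# defined by the column names given in group_cols.
cor_by_group <- function(data, group_cols) {
  data %>%
    group_by(across(all_of(group_cols))) %>%
    summarise(
      n_obs = n(),  # group size, to make it obvious why a result is NA
      # return NA explicitly for groups with a single observation
      corr_coeff = if (n() > 1) cor(ColE, ColF) else NA_real_,
      .groups = "drop"
    )
}

res2 <- cor_by_group(df, c("ColA", "ColB"))
res3 <- cor_by_group(df, c("ColA", "ColB", "ColC"))
print(res2)
print(res3)
```

With the sample data, only groups with at least two rows get a coefficient. Keep in mind that a group with exactly two rows will always give a correlation of 1 or -1 (or NA if one column is constant), so the n_obs column is worth checking before reading much into the result.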