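I have a simple question. How do I convert a data frame into contingency tables for Fisher's exact test?
My data frame, data, has about 19000 rows: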
head(data)
R_T1 R_T2 NR_T1 NR_T2
GMNN 14 60 70 157
GORASP2 7 67 39 188
TTC34 5 69 41 186
ZXDC 8 66 37 190
ASAH2 9 65 46 181
I want to convert each row into a 2x2 contingency table so that I can run Fisher's exact test on it. For example, for GMNN:
R NR
T1 14 70
T2 60 157
fisher.test(GMNN, alternative="two.sided")
Fisher's Exact Test for Count Data
data: GMNN
p-value = 0.05273
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.2531445 1.0280271
sample estimates:
odds ratio
0.5243787
Since I have 19000 rows, I would prefer the output to look like this:
R_T1 R_T2 NR_T1 NR_T2 p-value odds_ratio
GMNN 14 60 70 157 0.05273 0.5243787
GORASP2 7 67 39 188 0.1367 0.504643
TTC34 5 69 41 186 0.02422 0.3297116
ZXDC 8 66 37 190 0.3474 0.6233377
ASAH2 9 65 46 181 0.1648 0.5458072
I'm lost as to how to do this. Can anyone help? Thanks!
Answer 0 (score: 5)
You can use matrix to convert each row into a contingency table:
ft.res <- apply(data, 1, function(x){
t1 <- fisher.test(matrix(x, nrow = 2))
data.frame(p_value = t1$p.value, odds_ratio = t1$estimate)
})
cbind(data, do.call(rbind, ft.res))
# R_T1 R_T2 NR_T1 NR_T2 p_value odds_ratio
# GMNN 14 60 70 157 0.05273179 0.5243787
# GORASP2 7 67 39 188 0.13671487 0.5046430
# TTC34 5 69 41 186 0.02421765 0.3297116
# ZXDC 8 66 37 190 0.34744964 0.6233377
# ASAH2 9 65 46 181 0.16478480 0.5458072
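To see why matrix(x, nrow = 2) produces the layout asked for in the question, it may help to build the table for a single row by hand. The following is a minimal sketch for the GMNN row only; the dimnames are added purely for readability and are not needed by fisher.test. Because matrix fills column by column, the order R_T1, R_T2, NR_T1, NR_T2 puts the R counts in the first column and the NR counts in the second.

```r
# Counts for GMNN, in the column order of the data frame:
# R_T1, R_T2, NR_T1, NR_T2
x <- c(R_T1 = 14, R_T2 = 60, NR_T1 = 70, NR_T2 = 157)

# matrix() fills column-wise, so R lands in column 1 and NR in column 2
gmnn <- matrix(x, nrow = 2,
               dimnames = list(c("T1", "T2"), c("R", "NR")))
print(gmnn)

# Two-sided Fisher's exact test on the 2x2 table
res <- fisher.test(gmnn, alternative = "two.sided")
p_value    <- res$p.value
odds_ratio <- unname(res$estimate)
print(c(p_value = p_value, odds_ratio = odds_ratio))
```

The p-value and odds ratio printed here should agree with the values quoted for GMNN in the question (0.05273 and 0.5243787).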
Answer 1 (score: 3)
You can use apply to loop over the rows of the data frame:
## Replicating the data
d = data.frame(R_T1=c(14,7,5,8,9),R_T2=c(60,67,69,66,65),NR_T1=c(70,39,41,37,46),NR_T2=c(157,188,186,190,181))
row.names(d) = c("GMNN","GORASP2","TTC34","ZXDC","ASAH2")
## Computing the fisher test and getting the values for each row
d[,c("p_value","odds_ratio")] = t(apply(d,1,function(x) {
  f = fisher.test(matrix(x,2,2))
  c(f$p.value,f$estimate)
}))
d
R_T1 R_T2 NR_T1 NR_T2 p_value odds_ratio
GMNN 14 60 70 157 0.05273179 0.5243787
GORASP2 7 67 39 188 0.13671487 0.5046430
TTC34 5 69 41 186 0.02421765 0.3297116
ZXDC 8 66 37 190 0.34744964 0.6233377
ASAH2 9 65 46 181 0.16478480 0.5458072
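With about 19000 rows, it can be worth restricting the loop to the four count columns and checking the shape of each result. The sketch below is one possible variant using vapply, under the assumption that the counts sit in columns named as in the question; fisher_row and count_cols are hypothetical names introduced only for this illustration.

```r
# Same sample data as above
d <- data.frame(
  R_T1  = c(14, 7, 5, 8, 9),
  R_T2  = c(60, 67, 69, 66, 65),
  NR_T1 = c(70, 39, 41, 37, 46),
  NR_T2 = c(157, 188, 186, 190, 181),
  row.names = c("GMNN", "GORASP2", "TTC34", "ZXDC", "ASAH2")
)

# Assumed names of the count columns (hypothetical helper variable)
count_cols <- c("R_T1", "R_T2", "NR_T1", "NR_T2")

# Hypothetical helper: one row of counts in, p-value and odds ratio out
fisher_row <- function(x) {
  f <- fisher.test(matrix(x, nrow = 2))
  c(f$p.value, unname(f$estimate))
}

# Work on a plain numeric matrix so each row is a numeric vector
m <- as.matrix(d[count_cols])

# vapply checks that every call returns exactly two numbers
stats <- t(vapply(seq_len(nrow(m)),
                  function(i) fisher_row(m[i, ]),
                  numeric(2)))
colnames(stats) <- c("p_value", "odds_ratio")

res <- cbind(d, stats)
print(res)
```

Selecting the columns by name means the call can be re-run after p_value and odds_ratio have been added, without the new columns being fed into matrix by accident.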
Answer 2 (score: 2)
Here is how to do it using dplyr's mutate and rowwise:
df <- read.table(text="rowname R_T1 R_T2 NR_T1 NR_T2
GMNN 14 60 70 157
GORASP2 7 67 39 188
TTC34 5 69 41 186
ZXDC 8 66 37 190
ASAH2 9 65 46 181",
header=TRUE,stringsAsFactors = FALSE)
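library(dplyr)

# rowwise() makes mutate() evaluate the expressions once per row
df %>%
  rowwise() %>%
  mutate(p.value = fisher.test(matrix(c(R_T1, R_T2, NR_T1, NR_T2), nrow = 2))$p.value,
         odds_ratio = fisher.test(matrix(c(R_T1, R_T2, NR_T1, NR_T2), nrow = 2))$estimate)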
rowname R_T1 R_T2 NR_T1 NR_T2 p.value odds_ratio
<chr> <int> <int> <int> <int> <dbl> <dbl>
1 GMNN 14 60 70 157 0.05273179 0.5243787
2 GORASP2 7 67 39 188 0.13671487 0.5046430
3 TTC34 5 69 41 186 0.02421765 0.3297116
4 ZXDC 8 66 37 190 0.34744964 0.6233377
5 ASAH2 9 65 46 181 0.16478480 0.5458072