I have a data frame with 63 columns and 50 rows. A toy dataset is given below.
>df
rs_1 rs_2 rs_3 rs_4 ... rs_60 A.Ag B.Ag C.Ag
0 0 1 2 ... 1 02:/01 02:/07 03:07/04:01
1 2 1 2 ... 0 02:/01 02:/07 03:07/04:01
2 1 1 2 ... 2 02:/01 02:/07 03:07/04:01
0 0 1 0 ... 2 02:/01 02:/07 03:07/04:01
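Now, separately for each rs_* value of 0, 1 and 2, I need to find the most frequent value in the columns A.Ag, B.Ag and C.Ag. The desired result, for example for rs_* = 0, would be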
rs_id Code A.Ag Code B.Ag Code C.Ag
rs_1 02:/01 2 02:/07 5 03:07 5
rs_2 02:/01 3 01:/05 2 05:00 4
Can you help me solve this? I tried the following loop
for (i in 1:60){
if (file[,i]==0)
{
temp1 = data.frame(sort(table(file[,61]), decreasing = TRUE)) # only for the A.Ag column
temp1$Var1 = names(file)[i]
res_types = rbind(res_types, temp1)
}
}
I get the frequency and the rs_id, but I cannot get the code. Can anyone help me with this?
The desired result would be
rs_id Code Combination A.Ag Combination B.Ag Combination C.Ag
rs_1 0 1:01/1:01 7 13:02/13:02 2 03:04/03:04 3
rs_1 0 1:01/11:01 5 13:02/49:01 2 03:04/15:02 3
rs_1 0 1:01/2:01 4 13:02/57:01 2 03:04/7:01 3
rs_1 1 1:01/2:05 3 13:02/8:01 4 06:02/06:02 3
rs_1 1 1:01/24:02 3 14:01/14:02 3 06:02/15:02 3
rs_1 1 1:01/24:02 3 14:01/14:02 2 06:02/15:02 3
rs_2 0 1:01/31:01 3 15:01/15:01 1 06:02/3:03 4
rs_2 0 11:01/2:01 4 15:01/18:01 1 06:02/3:04 1
Answer 0 (score: 0)
This is probably easier with the data.table package. Explanations are inline.
library(data.table)
# convert into a long format
longDat <- melt(dat, measure.vars=patterns("^rs"), variable.name="rs_id",
    value.name="val_id")
# for each group of rs_id (rs_1, ..., rs_60) and val_id in (0,1,2),
# count the frequency of each code
longDat[,
    unlist(
        lapply(c("A.Ag","B.Ag","C.Ag"),
            function(x) setNames(aggregate(get(x), list(get(x)), length), c("Code", x))
        ),
        recursive=FALSE),
    by=c("rs_id", "val_id")]
Is this what you are looking for? Does it help?
Data:
library(data.table)
dat <- fread("rs_1,rs_2,rs_3,rs_4,rs_60,A.Ag,B.Ag,C.Ag
0,0,1,2,1,02:/01,02:/07,03:07/04:01
1,2,1,2,0,02:/01,02:/07,03:07/04:01
2,1,1,2,2,02:/01,02:/07,03:07/04:01
0,0,1,0,2,02:/01,02:/07,03:07/04:01")
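To make the reshaping step easier to follow, here is a minimal sketch that rebuilds the toy data from this answer, melts it into the long format, and counts the codes of a single column (A.Ag) per rs_id and val_id. The name countsA is only illustrative, and sep = "," is passed explicitly as a precaution because the codes contain colons.

```r
library(data.table)

# Toy data copied from this answer; sep is set explicitly because the codes contain ':'
dat <- fread(text = "rs_1,rs_2,rs_3,rs_4,rs_60,A.Ag,B.Ag,C.Ag
0,0,1,2,1,02:/01,02:/07,03:07/04:01
1,2,1,2,0,02:/01,02:/07,03:07/04:01
2,1,1,2,2,02:/01,02:/07,03:07/04:01
0,0,1,0,2,02:/01,02:/07,03:07/04:01", sep = ",")

# One row per (original row, rs column); val_id holds the 0/1/2 value
longDat <- melt(dat, measure.vars = patterns("^rs"),
                variable.name = "rs_id", value.name = "val_id")

# Frequency of each A.Ag code within every (rs_id, val_id) group
countsA <- longDat[, .N, by = .(rs_id, val_id, A.Ag)]
print(countsA[rs_id == "rs_1"])
```

Because every row of the toy data carries the same codes, each group shows a single code here; with the real 50-row data, several codes per group would be expected.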
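Edit: the OP asked to retrieve the top 3 for each rs_id, val_id and *.Ag column. It is more readable to handle one *.Ag column at a time: count, take the top 3, and finally merge all the results, as follows: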
library(data.table)
#convert into a long format
longDat <- melt(dat, measure.vars=patterns("^rs"), variable.name="rs_id",
value.name="val_id")
ids <- c("rs_id", "val_id")
Reduce(function(dt1,dt2) merge(dt1,dt2,by=ids,all=TRUE),
    lapply(c("A.Ag","B.Ag","C.Ag"), function(x) {
        res <- longDat[, list(.N), by=c(ids, x)][order(-N)]
        setnames(res[, head(.SD ,3L), by=ids], c(x, "N"), c(paste0(x,"_Code"), x))
    }))
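One possible refinement, offered as a sketch rather than as part of the original answer: when a group has more than one top code per column, merging only on rs_id and val_id may pair every top code of one column with every top code of another. Adding a rank column within each group and merging on it as well keeps the first, second and third codes side by side, which seems closer to the layout in the question. The column name rnk and the variable names top3 and aligned are assumptions made for illustration.

```r
library(data.table)

# Toy data copied from this answer; sep is set explicitly because the codes contain ':'
dat <- fread(text = "rs_1,rs_2,rs_3,rs_4,rs_60,A.Ag,B.Ag,C.Ag
0,0,1,2,1,02:/01,02:/07,03:07/04:01
1,2,1,2,0,02:/01,02:/07,03:07/04:01
2,1,1,2,2,02:/01,02:/07,03:07/04:01
0,0,1,0,2,02:/01,02:/07,03:07/04:01", sep = ",")

longDat <- melt(dat, measure.vars = patterns("^rs"),
                variable.name = "rs_id", value.name = "val_id")
ids <- c("rs_id", "val_id")

top3 <- lapply(c("A.Ag", "B.Ag", "C.Ag"), function(x) {
  # count each code per group, most frequent first
  res <- longDat[, .N, by = c(ids, x)][order(-N)]
  # keep at most three codes per group
  res <- res[, head(.SD, 3L), by = ids]
  # hypothetical helper column: position of the code within its group
  res[, rnk := seq_len(.N), by = ids]
  setnames(res, c(x, "N"), c(paste0(x, "_Code"), x))
  res
})

# merge on the group ids plus the rank so that rows line up by position
aligned <- Reduce(function(a, b) merge(a, b, by = c(ids, "rnk"), all = TRUE), top3)
print(aligned)
```

Ties in the counts are broken by whatever order the rows happen to have after sorting, so the choice among equally frequent codes is arbitrary in this sketch.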