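I have a large table like this (this is only an excerpt of the original table, which has thousands of functions (rows) and many samples (every column except the first)):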
function M123Q OO987 LKJY11
phi 9 2 0
3R 74 71 65
GlcNAc 1 0 1
I need to reshape it like this, adding two extra columns ("total_hits" is the sum of all values in the "hits" column that share the same "ID", and "percentage" is "hits" divided by "total_hits"):
ID function hits total_hits percentage
M123Q phi 9 84 0.107142857
M123Q 3R 74 84 0.880952381
M123Q GlcNAc 1 84 0.011904762
OO987 phi 2 73 0.02739726
OO987 3R 71 73 0.97260274
OO987 GlcNAc 0 73 0
LKJY11 phi 0 66 0
LKJY11 3R 65 66 0.984848485
LKJY11 GlcNAc 1 66 0.015151515
I am currently working in R, so I would really appreciate an R solution if possible.
Thanks a lot.
Answer 0: (score: 3)
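Here is one approach: reshape the data from "wide" to "long" with pivot_longer, group by "ID", then compute total_hits as the sum of "hits" and percentage as hits / total_hits.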
library(dplyr)
library(tidyr)
df1 %>%
  pivot_longer(cols = -function., names_to = "ID", values_to = "hits") %>%
  arrange(ID) %>%
  group_by(ID) %>%
  mutate(total_hits = sum(hits), percentage = hits / total_hits)
# A tibble: 9 x 5
# Groups: ID [3]
# function. ID hits total_hits percentage
# <chr> <chr> <int> <int> <dbl>
#1 phi LKJY11 0 66 0
#2 3R LKJY11 65 66 0.985
#3 GlcNAc LKJY11 1 66 0.0152
#4 phi M123Q 9 84 0.107
#5 3R M123Q 74 84 0.881
#6 GlcNAc M123Q 1 84 0.0119
#7 phi OO987 2 73 0.0274
#8 3R OO987 71 73 0.973
#9 GlcNAc OO987 0 73 0
# Example data used above
df1 <- structure(list(`function.` = c("phi", "3R", "GlcNAc"),
                      M123Q = c(9L, 74L, 1L),
                      OO987 = c(2L, 71L, 0L),
                      LKJY11 = c(0L, 65L, 1L)),
                 class = "data.frame", row.names = c(NA, -3L))
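The output above sorts the IDs alphabetically and keeps "function." as the first column, whereas the question lists "ID" first and keeps the IDs in their original column order. If that layout matters, a minimal sketch along the same lines could look like this. It rebuilds the example data so it runs on its own; `id_order` and `res` are names introduced here purely for illustration.

```r
library(dplyr)
library(tidyr)

# Same example data as above, repeated so the snippet is self-contained
df1 <- data.frame(`function.` = c("phi", "3R", "GlcNAc"),
                  M123Q = c(9L, 74L, 1L),
                  OO987 = c(2L, 71L, 0L),
                  LKJY11 = c(0L, 65L, 1L))

# Illustrative helper: sample IDs in their original column order
id_order <- setdiff(names(df1), "function.")

res <- df1 %>%
  pivot_longer(cols = -function., names_to = "ID", values_to = "hits") %>%
  # order rows by each ID's position in id_order instead of alphabetically
  arrange(match(ID, id_order)) %>%
  group_by(ID) %>%
  mutate(total_hits = sum(hits), percentage = hits / total_hits) %>%
  ungroup() %>%
  # move ID to the front, keep the remaining columns in place
  select(ID, everything())
```

Note that a sample whose hits are all zero would get NaN in percentage, since 0 / 0 is NaN in R; whether that should become 0 or NA is a choice the question does not settle.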
Answer 1: (score: 0)
Base R solution:
# Reshape the dataframe long-ways:
df1 <- data.frame(reshape(df1,
idvar = "function.",
ids = unique(df1$function.),
direction = "long",
varying = names(df1)[names(df1) != "function."],
v.names = "hits",
times = names(df1)[names(df1) != "function."],
timevar = "ID"), row.names = NULL)
# Groupwise summation of hits (by ID):
df1$total_hits <- with(df1, ave(hits, ID, FUN = sum))
# Calculation of percentage:
df1$percentage <- df1$hits/df1$total_hits
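As a quick sanity check on a result like this, the percentages within each ID should sum to 1. The following is only a sketch: it rebuilds the example data and stacks the columns by hand with rep() and unlist() rather than reshape(), so it runs on its own, and `samples`, `long` and `check` are illustrative names that do not appear in the answer above.

```r
# Example data from the question, repeated so the snippet is self-contained
df1 <- data.frame(`function.` = c("phi", "3R", "GlcNAc"),
                  M123Q = c(9L, 74L, 1L),
                  OO987 = c(2L, 71L, 0L),
                  LKJY11 = c(0L, 65L, 1L))

# Stack the sample columns into long form, one block of rows per sample
samples <- setdiff(names(df1), "function.")
long <- data.frame(ID = rep(samples, each = nrow(df1)),
                   `function.` = rep(df1$function., times = length(samples)),
                   hits = unlist(df1[samples], use.names = FALSE))

# Group-wise total and share, as in the answer above
long$total_hits <- ave(long$hits, long$ID, FUN = sum)
long$percentage <- long$hits / long$total_hits

# Per-ID sum of percentages; each should be 1 up to floating-point error
check <- tapply(long$percentage, long$ID, sum)
stopifnot(all(abs(check - 1) < 1e-8))
```

The check would fail for any ID whose total is zero, because the percentages there are NaN, so such samples need to be handled before running it on real data.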