Reordering and reformatting a table in R

Time: 2019-11-16 22:44:35

Tags: r formatting

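I have a large table like this (this is only an excerpt; the original table has thousands of functions (rows) and many samples (every column except the first)):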

function    M123Q   OO987   LKJY11
phi            9       2     0
3R            74      71    65
GlcNAc         1       0     1

I need to reorder it like this, adding two extra columns ("total_hits" is the sum of all values in the "hits" column that share the same "ID", and "percentage" is the quotient "hits" / "total_hits"):

ID    function  hits    total_hits  percentage
M123Q   phi      9         84       0.107142857
M123Q   3R       74        84       0.880952381
M123Q   GlcNAc   1         84       0.011904762
OO987   phi      2         73       0.02739726
OO987   3R       71        73       0.97260274
OO987   GlcNAc    0        73       0
LKJY11  phi       0        66       0
LKJY11  3R       65        66       0.984848485
LKJY11  GlcNAc    1        66       0.015151515 

I am currently working in R, so I would really appreciate an R solution if possible.

Thank you very much.

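2 answers:

Answer 0: (score: 3)

Here is one approach: reshape the data from "wide" to "long" with pivot_longer, then group by "ID" and compute "total_hits" as the sum of "hits" and "percentage" as hits / total_hits.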

library(dplyr)
library(tidyr)
df1 %>% 
  pivot_longer(cols = -function., names_to = "ID", values_to = "hits") %>%
  arrange(ID) %>%
  group_by(ID) %>%
  mutate(total_hits = sum(hits), percentage = hits/total_hits)
# A tibble: 9 x 5
# Groups:   ID [3]
#  function. ID      hits total_hits percentage
#  <chr>     <chr>  <int>      <int>      <dbl>
#1 phi       LKJY11     0         66     0     
#2 3R        LKJY11    65         66     0.985 
#3 GlcNAc    LKJY11     1         66     0.0152
#4 phi       M123Q      9         84     0.107 
#5 3R        M123Q     74         84     0.881 
#6 GlcNAc    M123Q      1         84     0.0119
#7 phi       OO987      2         73     0.0274
#8 3R        OO987     71         73     0.973 
#9 GlcNAc    OO987      0         73     0     

Data

df1 <- structure(list(`function.` = c("phi", "3R", "GlcNAc"), M123Q = c(9L, 
74L, 1L), OO987 = c(2L, 71L, 0L), LKJY11 = c(0L, 65L, 1L)),
 class = "data.frame", row.names = c(NA, 
-3L))
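
Note that arrange(ID) sorts the IDs alphabetically, so the output above starts with LKJY11, whereas the table requested in the question follows the original column order (M123Q, OO987, LKJY11) and puts ID first. The following is a minimal sketch of one way to keep that order, assuming the same df1 as above; the names id_order and res are illustrative and do not come from the original answer.

```r
library(dplyr)
library(tidyr)

# Same example data as in the answer above
df1 <- data.frame(
  `function.` = c("phi", "3R", "GlcNAc"),
  M123Q  = c(9L, 74L, 1L),
  OO987  = c(2L, 71L, 0L),
  LKJY11 = c(0L, 65L, 1L),
  stringsAsFactors = FALSE
)

# Sample columns in their original order (illustrative helper name)
id_order <- setdiff(names(df1), "function.")

res <- df1 %>%
  pivot_longer(cols = -all_of("function."), names_to = "ID", values_to = "hits") %>%
  # Turn ID into a factor so that sorting follows the column order, not the alphabet
  mutate(ID = factor(ID, levels = id_order)) %>%
  arrange(ID) %>%
  group_by(ID) %>%
  mutate(total_hits = sum(hits), percentage = hits / total_hits) %>%
  ungroup() %>%
  # Put the columns in the order requested in the question
  select(ID, all_of("function."), hits, total_hits, percentage)

print(res)
```

Converting ID to a factor is only one option; dropping the arrange() call and sorting later would also work, depending on how the result is used downstream.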

Answer 1: (score: 0)

Base R solution:

# Reshape the data frame from wide to long:
df1 <- data.frame(reshape(df1,
        idvar = "function.",
        ids = unique(df1$function.),
        direction = "long",
        varying = names(df1)[names(df1) != "function."],
        v.names = "hits",
        times = names(df1)[names(df1) != "function."],
        timevar = "ID"), row.names = NULL)

# Group-wise sum of hits (by ID):
df1$total_hits <- with(df1, ave(hits, ID, FUN = sum))

# Calculation of percentage:
df1$percentage <- df1$hits / df1$total_hits
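
The base R code above leaves the columns in the order function., ID, hits, total_hits, percentage. As a rough sketch, assuming the same example data as in the first answer, the steps can be run end to end and the columns rearranged to match the layout requested in the question. The names sample_cols and long are illustrative, and the result is stored in a new object instead of overwriting df1.

```r
# Same example data as in the first answer
df1 <- data.frame(
  `function.` = c("phi", "3R", "GlcNAc"),
  M123Q  = c(9L, 74L, 1L),
  OO987  = c(2L, 71L, 0L),
  LKJY11 = c(0L, 65L, 1L),
  stringsAsFactors = FALSE
)

# Every column except the first holds one sample
sample_cols <- setdiff(names(df1), "function.")

# Wide to long: one row per (function, sample) pair
long <- reshape(df1,
                idvar     = "function.",
                direction = "long",
                varying   = sample_cols,
                v.names   = "hits",
                times     = sample_cols,
                timevar   = "ID")
rownames(long) <- NULL

# Per-ID total and share of hits
long$total_hits <- ave(long$hits, long$ID, FUN = sum)
long$percentage <- long$hits / long$total_hits

# Reorder the columns as in the question
long <- long[, c("ID", "function.", "hits", "total_hits", "percentage")]

print(long)
```

Because reshape() stacks the samples in the order given by times, the rows come out in the original column order without an explicit sort.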