How do I get the difference between columns within groups in R?

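Time: 2018-08-19 15:09:43

Tags: r

I have a data frame called mydf. Its rows are split into groups, and each group starts with a row whose first column begins with myid. For every CDS row I want the absolute difference between the values in the first two columns, giving the result shown below.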

mydf<- structure(list(c("myid:AHY03257.1", "176", "myid:YP_009182164.1", 
"308", "myid:YP_717161.1", "9801", "8391", "8060"), c(NA, 2605L, 
NA, 2443L, NA, 9659L, 8029L, 8407L), c("", "CDS", "", "CDS", 
"", "CDS", "CDS", "CDS")), row.names = c(NA, -8L), class = "data.frame")

Result:

myid:AHY03257.1               
                 176 2605 CDS   2429
myid:YP_009182164.1      
                 308 2443 CDS   2135
myid:YP_717161.1       
                9801 9659 CDS   142
                8391 8029 CDS   362
                8060 8407 CDS   347

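2 answers:

Answer 0 (score: 2)

We can do this with tidyverse. After setting column names on the dataset, group by the cumulative sum of the logical vector that str_detect returns for the occurrence of ":" in the first column. Within each group, drop the first observation of 'V1' (the myid row), convert the rest to numeric, and take the absolute difference with column 'V2'.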

library(tidyverse)
mydf %>% 
   set_names(paste0('V', seq_along(.))) %>%
   group_by(grp = cumsum(str_detect(V1, ":"))) %>%
   mutate(V4 = abs(V2 - c(NA, as.numeric(V1[-1])))) %>%
   ungroup %>%
   select(-grp) %>%
   set_names(rep("", 4)) # better to have column name, removed to match input data
# A tibble: 8 x 4
#  ``                     `` ``       ``
#  <chr>               <int> <chr> <dbl>
#1 myid:AHY03257.1        NA ""       NA
#2 176                  2605 CDS    2429
#3 myid:YP_009182164.1    NA ""       NA
#4 308                  2443 CDS    2135
#5 myid:YP_717161.1       NA ""       NA
#6 9801                 9659 CDS     142
#7 8391                 8029 CDS     362
#8 8060                 8407 CDS     347
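To make the grouping key in the pipeline above concrete, here is a minimal sketch that computes it on its own. The names is_header and grp are illustrative and do not appear in the answer; only the first column of the question's data is reproduced.

```r
library(stringr)

# First column of the question's data frame.
V1 <- c("myid:AHY03257.1", "176", "myid:YP_009182164.1", "308",
        "myid:YP_717161.1", "9801", "8391", "8060")

# TRUE on the rows that start a new group (they contain ":").
is_header <- str_detect(V1, ":")

# The running count of header rows gives every row its group id.
grp <- cumsum(is_header)
grp
# 1 1 2 2 3 3 3 3
```

Each header row increments the counter, so the value rows that follow it inherit the same id until the next header appears.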

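However, if a warning message is not a problem, we can convert the character column 'V1' to numeric directly (a warning is issued because there are non-numeric elements, which are converted to NA) and take the absolute difference with column 'V2'.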
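A minimal sketch of that direct conversion, assuming the columns are named V1 to V3 as in the pipeline above. It is not necessarily the answerer's exact code, and the suppressWarnings() call is an optional addition that silences the coercion warning the answer mentions.

```r
library(dplyr)

# The question's data, with column names assumed for illustration.
mydf <- data.frame(
  V1 = c("myid:AHY03257.1", "176", "myid:YP_009182164.1", "308",
         "myid:YP_717161.1", "9801", "8391", "8060"),
  V2 = c(NA, 2605L, NA, 2443L, NA, 9659L, 8029L, 8407L),
  V3 = c("", "CDS", "", "CDS", "", "CDS", "CDS", "CDS"),
  stringsAsFactors = FALSE
)

# The myid rows cannot be parsed as numbers, so they become NA
# and the difference on those rows is NA as well.
res <- mydf %>%
  mutate(V4 = abs(suppressWarnings(as.numeric(V1)) - V2))

res$V4
# NA 2429 NA 2135 NA 142 362 347
```

No grouping is needed here because each difference only uses values from its own row.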

Answer 1 (score: 0)

First, let me rearrange the kind of dataset you created:

mydf <- data.frame(
  mydf    = c("AHY03257.1", "YP_009182164.1", "YP_717161.1", "YP_717161.1", "YP_717161.1"),
  value_1 = c(176, 308, 9801, 8391, 8060),
  value_2 = c(2605, 2443, 9659, 8029, 8407),
  CDS     = rep("CDS", 5)
)

Then you just have to create a new column:

mydf$abs_diff <- abs(mydf$value_2 - mydf$value_1)
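The answer retypes the data by hand. As a rough sketch, the same one-row-per-measurement layout could be derived from the question's original layout in base R. The names raw, tidy, is_header, grp and ids, as well as the column names V1 to V3 and myid, are assumptions made for this illustration.

```r
# The question's data, with column names assumed for illustration.
raw <- data.frame(
  V1 = c("myid:AHY03257.1", "176", "myid:YP_009182164.1", "308",
         "myid:YP_717161.1", "9801", "8391", "8060"),
  V2 = c(NA, 2605L, NA, 2443L, NA, 9659L, 8029L, 8407L),
  V3 = c("", "CDS", "", "CDS", "", "CDS", "CDS", "CDS"),
  stringsAsFactors = FALSE
)

# Header rows carry the id; every other row is a measurement.
is_header <- grepl("^myid:", raw$V1)
grp <- cumsum(is_header)                      # group id per row
ids <- sub("^myid:", "", raw$V1[is_header])   # one id per group

# Attach each measurement row to the id of its group.
tidy <- data.frame(
  myid    = ids[grp[!is_header]],
  value_1 = as.numeric(raw$V1[!is_header]),
  value_2 = raw$V2[!is_header],
  CDS     = raw$V3[!is_header],
  stringsAsFactors = FALSE
)

# Same step as in the answer: absolute difference of the two value columns.
tidy$abs_diff <- abs(tidy$value_2 - tidy$value_1)
tidy$abs_diff
# 2429 2135 142 362 347
```

With the id stored in its own column, the difference becomes a plain vectorised operation and no NA rows are left in the result.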