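I have a data frame named mydf. In it, rows beginning with myid: mark the start of each group. For every CDS row, I want the absolute difference (modulus) between the values in the first two columns, giving a result like the one shown below.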
mydf <- structure(list(c("myid:AHY03257.1", "176", "myid:YP_009182164.1",
"308", "myid:YP_717161.1", "9801", "8391", "8060"), c(NA, 2605L,
NA, 2443L, NA, 9659L, 8029L, 8407L), c("", "CDS", "", "CDS",
"", "CDS", "CDS", "CDS")), row.names = c(NA, -8L), class = "data.frame")
Result:
myid:AHY03257.1
176 2605 CDS 2429
myid:YP_009182164.1
308 2443 CDS 2135
myid:YP_717161.1
9801 9659 CDS 142
8391 8029 CDS 362
8060 8407 CDS 347
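Answer 0 (score: 2)
We can do this with tidyverse. After setting column names on the dataset, group by the cumulative sum of a logical vector (str_detect flags the rows of the first column that contain ":"). Within each group, drop the first observation of 'V1' (the myid header), convert the rest to numeric, pad with NA to keep the length, and take the absolute difference with column 'V2'.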
library(tidyverse)
mydf %>%
set_names(paste0('V', seq_along(.))) %>%
group_by(grp = cumsum(str_detect(V1, ":"))) %>%
mutate(V4 = abs(V2 - c(NA, as.numeric(V1[-1])))) %>%
ungroup %>%
select(-grp) %>%
set_names(rep("", 4)) # better to have column name, removed to match input data
# A tibble: 8 x 4
# `` `` `` ``
# <chr> <int> <chr> <dbl>
#1 myid:AHY03257.1 NA "" NA
#2 176 2605 CDS 2429
#3 myid:YP_009182164.1 NA "" NA
#4 308 2443 CDS 2135
#5 myid:YP_717161.1 NA "" NA
#6 9801 9659 CDS 142
#7 8391 8029 CDS 362
#8 8060 8407 CDS 347
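To make the grouping step concrete, here is a minimal base R sketch of the same idea. It uses grepl() in place of str_detect() so that it needs no packages; the variable names is_header and grp are illustrative and not part of the answer.

```r
# First column of the question's data, copied as a plain character vector
V1 <- c("myid:AHY03257.1", "176", "myid:YP_009182164.1",
        "308", "myid:YP_717161.1", "9801", "8391", "8060")

# TRUE for the header rows (the ones containing ":")
is_header <- grepl(":", V1, fixed = TRUE)

# Each TRUE bumps the running total, so every header starts a new group
grp <- cumsum(is_header)
print(grp)
# [1] 1 1 2 2 3 3 3 3

# Inside one group: drop the header, convert the rest, pad with NA
g3 <- V1[grp == 3]
g3_num <- c(NA, as.numeric(g3[-1]))
print(g3_num)
# [1]   NA 9801 8391 8060
```

Because the header is always the first element of its group, V1[-1] never contains a non-numeric string, which is why the grouped version runs without coercion warnings.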
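However, if the warning message is not a concern, we can convert the character column 'V1' to numeric directly (this emits a warning because the non-numeric elements are coerced to NA) and take the absolute difference with the 'V2' column.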
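The code for this shortcut is not shown in the text above, so the following is only a sketch of what it might look like, assuming dplyr is installed. The suppressWarnings() wrapper is an addition for a quiet run; without it, as.numeric() warns "NAs introduced by coercion" for the three myid rows.

```r
library(dplyr)

# Data from the question
mydf <- structure(list(c("myid:AHY03257.1", "176", "myid:YP_009182164.1",
"308", "myid:YP_717161.1", "9801", "8391", "8060"), c(NA, 2605L,
NA, 2443L, NA, 9659L, 8029L, 8407L), c("", "CDS", "", "CDS",
"", "CDS", "CDS", "CDS")), row.names = c(NA, -8L), class = "data.frame")

# Give the unnamed columns names V1, V2, V3
names(mydf) <- paste0("V", seq_along(mydf))

# No grouping needed: header rows become NA in V1 and are already NA in V2,
# so their difference is NA as well
res <- mydf %>%
  mutate(V4 = abs(suppressWarnings(as.numeric(V1)) - V2))

print(res)
```

The CDS rows get the same V4 values as in the grouped version (2429, 2135, 142, 362, 347), and the myid rows stay NA.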
Answer 1 (score: 0)
First, let me restructure the kind of dataset you created:
mydf <- data.frame(
  mydf = c("AHY03257.1", "YP_009182164.1", "YP_717161.1", "YP_717161.1", "YP_717161.1"),
  value_1 = c(176, 308, 9801, 8391, 8060),
  value_2 = c(2605, 2443, 9659, 8029, 8407),
  CDS = rep("CDS", 5)
)
Then you have to create a new column:
mydf$abs_diff <- abs(mydf$value_2 - mydf$value_1)
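If retyping the data by hand is not practical, the same one-row-per-CDS layout can probably be derived from the question's original data frame. The following base R sketch is one way to do it; the names raw, is_header, grp, ids, tidy and the column name myid are illustrative choices, not part of either answer.

```r
# Data from the question, kept under a separate name
raw <- structure(list(c("myid:AHY03257.1", "176", "myid:YP_009182164.1",
"308", "myid:YP_717161.1", "9801", "8391", "8060"), c(NA, 2605L,
NA, 2443L, NA, 9659L, 8029L, 8407L), c("", "CDS", "", "CDS",
"", "CDS", "CDS", "CDS")), row.names = c(NA, -8L), class = "data.frame")
names(raw) <- c("V1", "V2", "V3")

# Header rows start with "myid:"; the running count gives a group index
is_header <- grepl("^myid:", raw$V1)
grp <- cumsum(is_header)

# One ID per group, with the "myid:" prefix stripped
ids <- sub("^myid:", "", raw$V1[is_header])

# Keep only the CDS rows and attach the ID of the group each belongs to
tidy <- data.frame(
  myid    = ids[grp[!is_header]],
  value_1 = as.numeric(raw$V1[!is_header]),
  value_2 = raw$V2[!is_header],
  CDS     = raw$V3[!is_header]
)

# Same new column as above
tidy$abs_diff <- abs(tidy$value_2 - tidy$value_1)
print(tidy)
```

Only the non-header values of V1 are passed to as.numeric(), so no coercion warning is raised.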