I have the following data frame:
name Chr position quantity
AA chr7 151970856 3
AA chr17 59763465 3
AA chr4 55152040 3
AA chr4 55141055 3
AA chr7 151970856 3
BB chr17 59763465 4
BB chr4 55141055 4
CC chr13 32906729 4.5
DD chr5 170837513 5.5
DD chr5 170837513 5.5
DD chr13 32893197 5.5
DD chr3 10088404 5.5
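I want to create a new data frame that counts how many times each patient (name) appears in the original data frame and pairs that count with the corresponding quantity, like this: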
name count name quantity
AA 5 3
BB 2 4
CC 1 4.5
DD 4 5.5
Does anyone know how to do this?
Answer 0 (score: 2)
Here is a solution using dplyr.
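library(dplyr)  # dplyr re-exports the %>% pipe, so loading magrittr separately is not required
df %>%
  group_by(name) %>%
  mutate(count_by_name = 1:n()) %>%              # running index within each group
  filter(row_number(count_by_name) == n()) %>%   # keep the row whose index equals the group size
  select(-Chr, -position)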
## A tibble: 4 x 3
## Groups: name [4]
# name quantity count_by_name
# <fct> <dbl> <int>
#1 AA 3.00 5
#2 BB 4.00 2
#3 CC 4.50 1
#4 DD 5.50 4
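Explanation: group the rows by name, number the entries within each group, keep only the last entry of each group (its number equals the group size), and select the relevant output columns.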
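The same grouping idea can also be written with summarise(), which collapses each group to one row directly. This is only a sketch, not the answer's original code: it uses a reduced copy of the sample data (name and quantity only) and assumes quantity is constant within each name, as it is in the question.

```r
library(dplyr)

# Reduced copy of the question's data: only the two columns the summary needs.
df <- data.frame(
  name     = rep(c("AA", "BB", "CC", "DD"), times = c(5, 2, 1, 4)),
  quantity = rep(c(3, 4, 4.5, 5.5),         times = c(5, 2, 1, 4))
)

# One row per name: the group size and the first quantity seen in that group.
# Assumption: quantity does not vary within a name.
result <- df %>%
  group_by(name) %>%
  summarise(count_by_name = n(), quantity = first(quantity))

result
```

Compared with the mutate/filter version, this avoids creating a per-row counter that is thrown away again.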
Or, better and cleaner, as a one-liner (thanks to @Hugh):
df %>% count(name, quantity)
df <- read.table(text =
"name Chr position quantity
AA chr7 151970856 3
AA chr17 59763465 3
AA chr4 55152040 3
AA chr4 55141055 3
AA chr7 151970856 3
BB chr17 59763465 4
BB chr4 55141055 4
CC chr13 32906729 4.5
DD chr5 170837513 5.5
DD chr5 170837513 5.5
DD chr13 32893197 5.5
DD chr3 10088404 5.5", header = TRUE)
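One thing to keep in mind with the count(name, quantity) one-liner: it returns one row per distinct (name, quantity) pair and calls the count column n. The sketch below, on a reduced copy of the sample data, first checks that every name has a single quantity and then renames n; the column name count_name is my own choice, not something from the question.

```r
library(dplyr)

# Reduced copy of the question's data (name and quantity only).
df <- data.frame(
  name     = rep(c("AA", "BB", "CC", "DD"), times = c(5, 2, 1, 4)),
  quantity = rep(c(3, 4, 4.5, 5.5),         times = c(5, 2, 1, 4))
)

# Guard: if a name had two different quantities, count() would return
# two rows for that name. Stop early if that is the case.
stopifnot(nrow(distinct(df, name, quantity)) == n_distinct(df$name))

# count() names its result column "n"; rename it for readability.
result <- df %>%
  count(name, quantity) %>%
  rename(count_name = n)

result
```

If the guard fails on real data, the summarise-style approach with an explicit rule for picking the quantity is probably the safer choice.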
Answer 1 (score: 1)
I believe the following will do it.
result <- as.data.frame(table(dat$name))
names(result) <- c("name", "count_name")
result <- merge(result, dat[, c("name", "quantity")])
result <- result[!duplicated(result), ]
result
# name count_name quantity
#1 AA 5 3.0
#6 BB 2 4.0
#8 CC 1 4.5
#9 DD 4 5.5
Data.
dat <-
structure(list(name = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
3L, 4L, 4L, 4L, 4L), .Label = c("AA", "BB", "CC", "DD"), class = "factor"),
Chr = structure(c(6L, 2L, 4L, 4L, 6L, 2L, 4L, 1L, 5L, 5L,
1L, 3L), .Label = c("chr13", "chr17", "chr3", "chr4", "chr5",
"chr7"), class = "factor"), position = c(151970856L, 59763465L,
55152040L, 55141055L, 151970856L, 59763465L, 55141055L, 32906729L,
170837513L, 170837513L, 32893197L, 10088404L), quantity = c(3,
3, 3, 3, 3, 4, 4, 4.5, 5.5, 5.5, 5.5, 5.5)), .Names = c("name",
"Chr", "position", "quantity"), class = "data.frame", row.names = c(NA,
-12L))
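For larger data, the merge() above produces one row per original row before the duplicates are dropped. A possible base R variant, sketched here on a reduced copy of dat and not taken from the answer, deduplicates first and then looks the counts up by name:

```r
# Reduced copy of the data above (name and quantity only).
dat <- data.frame(
  name     = rep(c("AA", "BB", "CC", "DD"), times = c(5, 2, 1, 4)),
  quantity = rep(c(3, 4, 4.5, 5.5),         times = c(5, 2, 1, 4))
)

# One row per distinct (name, quantity) pair.
result <- unique(dat[, c("name", "quantity")])

# table() counts rows per name; index it by name to align with result.
counts <- table(dat$name)
result$count_name <- as.vector(counts[as.character(result$name)])

result
```

As with the original answer, the row names of result are the positions of the first occurrence of each name in dat.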
Answer 2 (score: 1)
A solution using .N from data.table.
library(data.table)
setDT(df)
df[, .(`count name` = .N, quantity = quantity[1]), name]
   name count name quantity
1:   AA          5      3.0
2:   BB          2      4.0
3:   CC          1      4.5
4:   DD          4      5.5
Or as a one-liner:
data.table::setDT(df)[, .(`count name` = .N, quantity = quantity[1]), name]
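Note that setDT() converts df to a data.table by reference, so df itself is changed. If the original data frame should stay untouched, a sketch along these lines (reduced sample data, my own variable names dt and result) uses as.data.table(), which makes a copy:

```r
library(data.table)

# Reduced copy of the question's data (name and quantity only).
df <- data.frame(
  name     = rep(c("AA", "BB", "CC", "DD"), times = c(5, 2, 1, 4)),
  quantity = rep(c(3, 4, 4.5, 5.5),         times = c(5, 2, 1, 4))
)

# as.data.table() returns a copy, so df remains a plain data.frame.
dt <- as.data.table(df)

# .N is the number of rows in each group; quantity[1] takes the first value,
# which assumes quantity is constant within a name.
result <- dt[, .(`count name` = .N, quantity = quantity[1]), by = name]

result
```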