Changing an existing data frame under certain conditions

Time: 2018-03-24 11:32:58

Tags: r dataframe

I have the following data frame:

name    Chr position    quantity
AA    chr7   151970856  3
AA    chr17  59763465   3
AA    chr4   55152040   3
AA    chr4   55141055   3
AA    chr7   151970856  3
BB    chr17  59763465   4
BB    chr4   55141055   4
CC    chr13  32906729   4.5
DD    chr5   170837513  5.5
DD    chr5   170837513  5.5
DD    chr13  32893197   5.5
DD    chr3   10088404   5.5

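I would like to create a new data frame that counts how many times each patient appears in the original data frame and pairs that count with the patient's quantity, like this: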

name    count name  quantity
 AA          5          3
 BB          2          4
 CC          1        4.5
 DD          4        5.5

Does anyone know how to do this?

3 answers:

Answer 0 (score: 2)

Here is a solution using dplyr.

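library(dplyr)  # dplyr re-exports the %>% pipe, so magrittr need not be loaded separately

df %>%
    group_by(name) %>%                              # one group per patient
    mutate(count_by_name = 1:n()) %>%               # running row number within each group
    filter(row_number(count_by_name) == n()) %>%    # keep only the last row of each group
    select(-Chr, -position)                         # drop the columns that are not needed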
## A tibble: 4 x 3
## Groups:   name [4]
#   name  quantity count_by_name
#  <fct>     <dbl>         <int>
#1 AA         3.00             5
#2 BB         4.00             2
#3 CC         4.50             1
#4 DD         5.50             4

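Explanation: group the rows by name, number the entries within each group, keep only the last entry of each group (its running number equals the group size), and select the relevant output columns.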
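The same idea can be sketched more directly with summarise(), which collapses each group to a single row. This is only a sketch: the column name count_name is chosen here for illustration, and taking first(quantity) assumes that quantity is constant within each name, as it is in the sample data.

```r
library(dplyr)

# Sample data from the question
df <- read.table(text =
    "name    Chr position    quantity
AA    chr7   151970856  3
AA    chr17  59763465   3
AA    chr4   55152040   3
AA    chr4   55141055   3
AA    chr7   151970856  3
BB    chr17  59763465   4
BB    chr4   55141055   4
CC    chr13  32906729   4.5
DD    chr5   170837513  5.5
DD    chr5   170837513  5.5
DD    chr13  32893197   5.5
DD    chr3   10088404   5.5", header = TRUE)

result <- df %>%
    group_by(name) %>%
    summarise(
        count_name = n(),             # number of rows in the group
        quantity   = first(quantity)  # assumes one quantity per name
    )

print(result)
```

Compared with the mutate/filter approach, this avoids creating and then discarding the intermediate row numbers.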

Update

Or, better and cleaner, in a single line (thanks to @Hugh):

df %>% count(name, quantity)
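Two details of count() are worth keeping in mind: the count column is called n by default, and the result has one row per distinct (name, quantity) pair, so a name with more than one quantity would appear more than once. The sketch below illustrates both points on made-up toy data (not the data from the question); the column name count_name is likewise just an illustrative choice.

```r
library(dplyr)

# Toy data, invented for illustration: BB has two different quantities
toy <- data.frame(
    name     = c("AA", "AA", "BB", "BB"),
    quantity = c(3, 3, 4, 4.5)
)

# count() groups by every column it is given and stores the group size in `n`;
# rename() then gives that column a more descriptive name
counted <- toy %>%
    count(name, quantity) %>%
    rename(count_name = n)

print(counted)  # BB appears on two rows, one per distinct quantity
```

For the data in the question this does not matter, because every patient has a single quantity.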

Sample data

df <- read.table(text =
    "name    Chr position    quantity
AA    chr7   151970856  3
AA    chr17  59763465   3
AA    chr4   55152040   3
AA    chr4   55141055   3
AA    chr7   151970856  3
BB    chr17  59763465   4
BB    chr4   55141055   4
CC    chr13  32906729   4.5
DD    chr5   170837513  5.5
DD    chr5   170837513  5.5
DD    chr13  32893197   5.5
DD    chr3   10088404   5.5", header = TRUE)

Answer 1 (score: 1)

I believe the following will do it.

result <- as.data.frame(table(dat$name))
names(result) <- c("name", "count_name")
result <- merge(result, dat[, c("name", "quantity")])
result <- result[!duplicated(result), ]
result 
#  name count_name quantity
#1   AA          5      3.0
#6   BB          2      4.0
#8   CC          1      4.5
#9   DD          4      5.5
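In the code above, table() counts the rows per name, merge() joins those counts back onto the name/quantity columns via the shared name column, and duplicated() removes the repeated rows. A more compact base R variant is sketched below; it rebuilds only the two relevant columns of the data and assumes that quantity is constant within each name.

```r
# The name and quantity columns of the sample data
dat <- data.frame(
    name     = c(rep("AA", 5), rep("BB", 2), "CC", rep("DD", 4)),
    quantity = c(rep(3, 5), rep(4, 2), 4.5, rep(5.5, 4))
)

# One row per distinct (name, quantity) pair
result <- unique(dat[, c("name", "quantity")])

# Look up how often each name occurs in the original data
counts <- table(dat$name)
result$count_name <- as.integer(counts[as.character(result$name)])

# Reset the row names left over from the original row positions
rownames(result) <- NULL
print(result)
```

This skips the merge step, so the intermediate result never grows beyond one row per patient.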

Data.

dat <-
structure(list(name = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
3L, 4L, 4L, 4L, 4L), .Label = c("AA", "BB", "CC", "DD"), class = "factor"), 
    Chr = structure(c(6L, 2L, 4L, 4L, 6L, 2L, 4L, 1L, 5L, 5L, 
    1L, 3L), .Label = c("chr13", "chr17", "chr3", "chr4", "chr5", 
    "chr7"), class = "factor"), position = c(151970856L, 59763465L, 
    55152040L, 55141055L, 151970856L, 59763465L, 55141055L, 32906729L, 
    170837513L, 170837513L, 32893197L, 10088404L), quantity = c(3, 
    3, 3, 3, 3, 4, 4, 4.5, 5.5, 5.5, 5.5, 5.5)), .Names = c("name", 
"Chr", "position", "quantity"), class = "data.frame", row.names = c(NA, 
-12L))

Answer 2 (score: 1)

A solution using .N from data.table.

library(data.table)
setDT(df)
df[, .(`count name` = .N, quantity = quantity[1]), name]
#    name count name quantity
# 1:   AA          5      3.0
# 2:   BB          2      4.0
# 3:   CC          1      4.5
# 4:   DD          4      5.5

Or as a one-liner:

data.table::setDT(df)[, .(`count name` = .N, quantity = quantity[1]), name]
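Note that setDT() converts df to a data.table by reference, so the original object is changed. If the original data frame should stay untouched, a copy can be made with as.data.table() instead, as in the sketch below. The names dt and count_name are chosen here for illustration, and quantity[1] again assumes one quantity per name.

```r
library(data.table)

# The name and quantity columns of the sample data
df <- data.frame(
    name     = c(rep("AA", 5), rep("BB", 2), "CC", rep("DD", 4)),
    quantity = c(rep(3, 5), rep(4, 2), 4.5, rep(5.5, 4))
)

# as.data.table() returns a copy, so df itself remains a plain data.frame
dt <- as.data.table(df)

# .N is the number of rows in each group defined by `by`
result <- dt[, .(count_name = .N, quantity = quantity[1]), by = name]
print(result)
```

Using a syntactic column name such as count_name also avoids having to quote it with backticks later.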