How can I compute a variable based on where another value appears in the same column?

Time: 2016-10-28 08:58:38

Tags: r

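I want to count the number of deaths before the first breast cancer, after the first breast cancer, after the second breast cancer, and so on.

My data looks like this. In the EVENT column you can see that some people have a death event before BC1 (first breast cancer), after BC1, or after BC2. I would like to know how to count the number of people in each sequence: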

Death before BC1: #
Death after  BC1: #
Death after  BC2: #

I would like a table like this, but I am not worried about building the table right now. I just want the corresponding counts.

Sorry for the poor formatting in this post. Any help would be greatly appreciated!

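3 answers:

Answer 0: (score: 1)

I think we can assume that nobody gets breast cancer after dying, so you can check whether a person has BC2 as an event; if they do, they died after their second cancer.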

    library("dplyr")

    df <- data.frame(
      PERSON_ID = c(10000000002, 10000000002, 10000000002, 10000000002,
                    10000000002, 10000000007, 10000000007, 10000000007,
                    10000000010, 10000000827, 10000000830, 10000000830),
      EVENT = c("BC1", "R_B", "BC2", "DEATH",
                "EPI", "BC1", "BC2", "DEATH",
                "DEATH", "DEATH", "BC1", "DEATH")
    )

    # Label each person by the highest-numbered breast cancer event on record
    # ("BC0" = no BC event), then count the people under each label.
    group_by(df, PERSON_ID) %>%
      summarise(Type = ifelse("BC2" %in% EVENT, "BC2",
                       ifelse("BC1" %in% EVENT, "BC1", "BC0"))) %>%
      ungroup() %>%
      group_by(Type) %>%
      summarise(Count = n())
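
If the rows are in chronological order within each person (an assumption on my part; the question does not say so), you could also count only the BC events that come before the DEATH row instead of relying on the assumption above. A minimal sketch, where `bc_before_death` and `n_bc_before` are names made up for illustration:

```r
library(dplyr)

df <- data.frame(
  PERSON_ID = c(10000000002, 10000000002, 10000000002, 10000000002,
                10000000002, 10000000007, 10000000007, 10000000007,
                10000000010, 10000000827, 10000000830, 10000000830),
  EVENT = c("BC1", "R_B", "BC2", "DEATH",
            "EPI", "BC1", "BC2", "DEATH",
            "DEATH", "DEATH", "BC1", "DEATH")
)

# Hypothetical helper: number of BC events listed before the first DEATH row.
# Assumes the events of one person are given in chronological order.
bc_before_death <- function(events) {
  death_pos <- match("DEATH", events)
  if (is.na(death_pos)) return(NA_integer_)  # person has no DEATH event
  sum(head(events, death_pos - 1) %in% c("BC1", "BC2"))
}

counts <- df %>%
  group_by(PERSON_ID) %>%
  summarise(n_bc_before = bc_before_death(EVENT)) %>%
  filter(!is.na(n_bc_before)) %>%   # keep only people who died
  count(n_bc_before)

counts
```

On the sample data this gives 2 people with no BC event before death, 1 person with one, and 2 people with two.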

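Cheers

Answer 1: (score: 0)

You can reshape the data.frame to help you. One approach, reshaping with tidyr and then using dplyr, is shown below. There are certainly other solutions.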

library(dplyr)
df <- readr::read_delim("PERSON_ID EVENT
10000000002 BC1
10000000002 R_B
10000000002 BC2
10000000002 DEATH
10000000002 EPI
10000000007 BC1
10000000007 BC2
10000000007 DEATH
10000000010 DEATH
10000000827 DEATH
10000000830 BC1
10000000830 DEATH", delim = " ")

# Reshape to one row per person with one logical column per event type,
# then derive a new categorical column from the BC1/BC2 flags.
# `&` is the vectorised AND, so no per-person grouping is needed.

new_df <- df %>% 
  mutate(value = TRUE) %>% 
  tidyr::spread(EVENT, value, fill = FALSE) %>%
  mutate(cat = if_else(BC1 & BC2, "after BC2", if_else(BC1, "after BC1", "before BC1")))

new_df
#> # A tibble: 5 × 7
#>   PERSON_ID   BC1   BC2 DEATH   EPI   R_B        cat
#>       <dbl> <lgl> <lgl> <lgl> <lgl> <lgl>      <chr>
#> 1     1e+10  TRUE  TRUE  TRUE  TRUE  TRUE  after BC2
#> 2     1e+10  TRUE  TRUE  TRUE FALSE FALSE  after BC2
#> 3     1e+10 FALSE FALSE  TRUE FALSE FALSE before BC1
#> 4     1e+10 FALSE FALSE  TRUE FALSE FALSE before BC1
#> 5     1e+10  TRUE FALSE  TRUE FALSE FALSE  after BC1

# count the variable

new_df %>% count(cat)
#> # A tibble: 3 × 2
#>          cat     n
#>        <chr> <int>
#> 1  after BC1     1
#> 2  after BC2     2
#> 3 before BC1     2
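
In current tidyr releases `spread()` is superseded by `pivot_wider()`. Here is a sketch of the same reshape with `pivot_wider()` and `case_when()`, assuming tidyr 1.0 or later. Storing PERSON_ID as character is my own choice here, so that the IDs do not print as 1e+10:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  PERSON_ID = c("10000000002", "10000000002", "10000000002", "10000000002",
                "10000000002", "10000000007", "10000000007", "10000000007",
                "10000000010", "10000000827", "10000000830", "10000000830"),
  EVENT = c("BC1", "R_B", "BC2", "DEATH",
            "EPI", "BC1", "BC2", "DEATH",
            "DEATH", "DEATH", "BC1", "DEATH"),
  stringsAsFactors = FALSE
)

# One row per person, one logical column per event type.
wide <- df %>%
  mutate(value = TRUE) %>%
  pivot_wider(names_from = EVENT, values_from = value,
              values_fill = list(value = FALSE))

# Classify each person, then count the people in each category.
result <- wide %>%
  mutate(cat = case_when(
    BC1 & BC2 ~ "after BC2",
    BC1       ~ "after BC1",
    TRUE      ~ "before BC1"
  )) %>%
  count(cat)

result
```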

Answer 2: (score: 0)

Here is a very simple solution. The result is stored in the `results` variable.

my_data <- data.frame(
    PERSON_ID = as.character(c(10000000002, 10000000002, 10000000002, 10000000002,
                               10000000002, 10000000007, 10000000007, 10000000007,
                               10000000010, 10000000827, 10000000830, 10000000830)),
    EVENT = c("BC1", "R_B", "BC2", "DEATH", "EPI", "BC1", "BC2", "DEATH",
              "DEATH", "DEATH", "BC1", "DEATH")
)

# Classify one person by which breast cancer events appear in their records.
my_function <- function(ID){
    person <- subset(my_data, PERSON_ID == ID)
    has_bc1 <- any(person$EVENT == "BC1")
    has_bc2 <- any(person$EVENT == "BC2")
    if (!has_bc1) {
        return("Death_before_BC1")
    } else if (!has_bc2) {
        return("Death_after_BC1")
    } else {
        return("Death_after_BC2")
    }
}

results_tmp <- sapply(as.character(unique(my_data$PERSON_ID)), my_function)

# Tally the labels into a one-row summary table.
results <- data.frame(Death_before_BC1 = sum(results_tmp == "Death_before_BC1"),
    Death_after_BC1 = sum(results_tmp == "Death_after_BC1"),
    Death_after_BC2 = sum(results_tmp == "Death_after_BC2"))
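
The same classification can probably be written without the per-ID `subset()` call, by letting `tapply()` split EVENT by person and `table()` do the tally. A base R sketch; `classify_person` is a hypothetical helper name, and like the code above it only checks which events are present, not their order:

```r
my_data <- data.frame(
  PERSON_ID = c("10000000002", "10000000002", "10000000002", "10000000002",
                "10000000002", "10000000007", "10000000007", "10000000007",
                "10000000010", "10000000827", "10000000830", "10000000830"),
  EVENT = c("BC1", "R_B", "BC2", "DEATH", "EPI", "BC1", "BC2", "DEATH",
            "DEATH", "DEATH", "BC1", "DEATH"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label one person from the vector of their events.
classify_person <- function(events) {
  if ("BC2" %in% events) {
    "Death_after_BC2"
  } else if ("BC1" %in% events) {
    "Death_after_BC1"
  } else {
    "Death_before_BC1"
  }
}

# One label per person, then the number of people per label.
labels <- tapply(my_data$EVENT, my_data$PERSON_ID, classify_person)
counts <- table(labels)

counts
```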