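I want to count the number of deaths before the first breast cancer, after the first breast cancer, after the second breast cancer, and so on.
My data looks like the example below. In the EVENT column you can see that some people have a death event before BC1 (first breast cancer), after BC1, or after BC2. I would like to know how to count the number of people in each sequence: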
Death before BC1: #
Death after BC1: #
Death after BC2: #
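I would like a table along these lines, but I am not worried about building the table right now. I just want the corresponding counts.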
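Sorry for the poor formatting in this post; any help would be greatly appreciated!
Answer 0 (score: 1)
I think we can assume that nobody develops breast cancer after dying, so you can check whether a person has BC2 as an event; if they do, they died after the second cancer. Otherwise, check for BC1 in the same way.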
library("dplyr")
df <- data.frame(PERSON_ID = c(10000000002, 10000000002, 10000000002,
                               10000000002, 10000000002, 10000000007,
                               10000000007, 10000000007, 10000000010,
                               10000000827, 10000000830, 10000000830),
                 EVENT = c("BC1", "R_B", "BC2", "DEATH",
                           "EPI", "BC1", "BC2", "DEATH",
                           "DEATH", "DEATH", "BC1", "DEATH"))
# Label each person by the latest breast cancer event on record
# ("BC0" means no breast cancer event), then count people per label
df %>%
  group_by(PERSON_ID) %>%
  summarise(Type = ifelse("BC2" %in% EVENT, "BC2",
                          ifelse("BC1" %in% EVENT, "BC1", "BC0"))) %>%
  ungroup() %>%
  group_by(Type) %>%
  summarise(Count = n())
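The code above only checks which events a person has, not the order in which they occur. If the rows for each person are in chronological order (an assumption; the question does not state this), a minimal base R sketch that looks only at the events recorded before DEATH might look like the following. The names `events`, `classify_death`, `per_person`, and `counts` are made up for illustration.

```r
# Assumption: within each person, rows are already in chronological order.
events <- data.frame(
  PERSON_ID = c("10000000002", "10000000002", "10000000002", "10000000002",
                "10000000002", "10000000007", "10000000007", "10000000007",
                "10000000010", "10000000827", "10000000830", "10000000830"),
  EVENT = c("BC1", "R_B", "BC2", "DEATH", "EPI", "BC1", "BC2", "DEATH",
            "DEATH", "DEATH", "BC1", "DEATH"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: classify one person's event sequence
classify_death <- function(ev) {
  death_pos <- match("DEATH", ev)            # position of the first DEATH, NA if none
  if (is.na(death_pos)) return(NA_character_)
  before <- ev[seq_len(death_pos - 1)]       # events recorded before the death
  if ("BC2" %in% before) {
    "Death after BC2"
  } else if ("BC1" %in% before) {
    "Death after BC1"
  } else {
    "Death before BC1"
  }
}

# One label per person, then a count per label in a fixed order
per_person <- tapply(events$EVENT, events$PERSON_ID, classify_death)
counts <- table(factor(per_person,
                       levels = c("Death before BC1", "Death after BC1",
                                  "Death after BC2")))
counts
```

On the sample data this gives the same split as the order-free check, since no one in the sample has a BC event listed after DEATH.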
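Cheers
Answer 1 (score: 0)
You can reshape your data.frame to help you. Below is one way of doing it: reshape with tidyr, then categorise and count with dplyr.
There are surely other solutions.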
library(dplyr)
df <- readr::read_delim("PERSON_ID EVENT
10000000002 BC1
10000000002 R_B
10000000002 BC2
10000000002 DEATH
10000000002 EPI
10000000007 BC1
10000000007 BC2
10000000007 DEATH
10000000010 DEATH
10000000827 DEATH
10000000830 BC1
10000000830 DEATH", delim = " ")
# Reshape to one row per person with one logical column per event type,
# then create a new categorical column from the BC1/BC2 flags
new_df <- df %>%
  mutate(value = TRUE) %>%
  tidyr::spread(EVENT, value, fill = FALSE) %>%
  group_by(PERSON_ID) %>%
  mutate(cat = if_else(BC1 & BC2, "after BC2",
                       if_else(BC1, "after BC1", "before BC1"))) %>%
  ungroup()
new_df
#> # A tibble: 5 × 7
#> PERSON_ID BC1 BC2 DEATH EPI R_B cat
#> <dbl> <lgl> <lgl> <lgl> <lgl> <lgl> <chr>
#> 1 1e+10 TRUE TRUE TRUE TRUE TRUE after BC2
#> 2 1e+10 TRUE TRUE TRUE FALSE FALSE after BC2
#> 3 1e+10 FALSE FALSE TRUE FALSE FALSE before BC1
#> 4 1e+10 FALSE FALSE TRUE FALSE FALSE before BC1
#> 5 1e+10 TRUE FALSE TRUE FALSE FALSE after BC1
# count the variable
new_df %>% count(cat)
#> # A tibble: 3 × 2
#> cat n
#> <chr> <int>
#> 1 after BC1 1
#> 2 after BC2 2
#> 3 before BC1 2
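`tidyr::spread()` is marked as superseded in current tidyr releases in favour of `pivot_wider()`. A rough equivalent of the reshaping above, assuming recent versions of tidyr and dplyr are installed, could look like this. The names `events`, `wide`, and `counts` are illustrative, and the IDs are stored as character here so they do not print as `1e+10`.

```r
library(dplyr)
library(tidyr)

# Same sample data as above, with IDs kept as character strings
events <- tibble(
  PERSON_ID = c("10000000002", "10000000002", "10000000002", "10000000002",
                "10000000002", "10000000007", "10000000007", "10000000007",
                "10000000010", "10000000827", "10000000830", "10000000830"),
  EVENT = c("BC1", "R_B", "BC2", "DEATH", "EPI", "BC1", "BC2", "DEATH",
            "DEATH", "DEATH", "BC1", "DEATH")
)

# One row per person, one logical column per event type (FALSE when absent)
wide <- events %>%
  mutate(value = TRUE) %>%
  pivot_wider(names_from = EVENT, values_from = value, values_fill = FALSE)

# Categorise each person, then count people per category
counts <- wide %>%
  mutate(cat = case_when(
    BC1 & BC2 ~ "after BC2",
    BC1       ~ "after BC1",
    TRUE      ~ "before BC1"
  )) %>%
  count(cat)
counts
```

`case_when()` evaluates its conditions in order, so it plays the same role as the nested `if_else()` calls in the original.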
Answer 2 (score: 0)
Here is a very simple solution. The results are stored in the results variable.
my_data <- data.frame(
  PERSON_ID = as.character(c(10000000002, 10000000002, 10000000002, 10000000002,
                             10000000002, 10000000007, 10000000007, 10000000007,
                             10000000010, 10000000827, 10000000830, 10000000830)),
  EVENT = c("BC1", "R_B", "BC2", "DEATH", "EPI", "BC1", "BC2", "DEATH",
            "DEATH", "DEATH", "BC1", "DEATH"))

# Classify one person by which breast cancer events they have on record
my_function <- function(ID) {
  person <- subset(my_data, PERSON_ID == ID)
  has_bc1 <- "BC1" %in% person$EVENT
  has_bc2 <- "BC2" %in% person$EVENT
  if (!has_bc1) {
    return("Death_before_BC1")
  } else if (!has_bc2) {
    return("Death_after_BC1")
  } else {
    return("Death_after_BC2")
  }
}

# Apply to every person, then count how many fall into each category
results_tmp <- sapply(as.character(unique(my_data$PERSON_ID)), my_function)
results <- data.frame(Death_before_BC1 = sum(results_tmp == "Death_before_BC1"),
                      Death_after_BC1 = sum(results_tmp == "Death_after_BC1"),
                      Death_after_BC2 = sum(results_tmp == "Death_after_BC2"))