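I have a dataset with one row for each time a patient experiences a symptom. It includes an identifier, the main symptom category (disease), the symptom experienced, and the date it was experienced. I want to use this data to count how many times each unique disease/symptom combination occurred in each month. I'm sure the solution exists somewhere on StackOverflow, and I have looked, but I haven't found anything that gets me where I need to be. Below is a sample dataset along with the expected output, which I obviously created by hand.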
id <- c(sprintf("A%03d", 1:3), sprintf("B%03d", 1:5))
# x = disease, y = symptom, z = date
x <- c(rep("bronchitis", 3), rep("flu", 5))
y <- c(rep("coughing", 2), "congestion", rep("fever", 3), "aches", "fatigue")
z <- as.factor(c("Jan 27, 2019", "Jan 26, 2019", "Dec 27, 2018", "Dec 03, 2018",
                 "Dec 18, 2018", "Nov 14, 2018", "Nov 21, 2018", "Jan 15, 2019"))
df <- data.frame("id" = id, "disease" = x, "symptom" = y, "date" = z)
df
a <- c(rep("bronchitis", 2), rep("flu", 3))
b <- c("coughing", "congestion", "fever", "aches", "fatigue")
c <- c(0,0,1,1,0)
d <- c(0,1,2,0,0)
e <- c(2, 0, 0, 0, 1)
df2 <- data.frame("disease" = a, "symptom" = b, "Nov" = c, "Dec" = d, "Jan" = e)
df is the original dataset, and df2 is the expected output.
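For comparison, here is a base-R sketch of the same count that needs no packages. It parses the dates, turns them into a month-year factor whose levels run in calendar order, and cross-tabulates with table(). This is an illustration under stated assumptions, not code from the question: the data are rebuilt with each date on a single line, the "disease/symptom" key and the names parsed_dates, month, tab, wide, keys and out are hypothetical, and %b is assumed to match English month abbreviations (it is locale-dependent).

```r
# Rebuild the question's data with each date on a single line
df <- data.frame(
  id      = c(sprintf("A%03d", 1:3), sprintf("B%03d", 1:5)),
  disease = c(rep("bronchitis", 3), rep("flu", 5)),
  symptom = c(rep("coughing", 2), "congestion", rep("fever", 3), "aches", "fatigue"),
  date    = c("Jan 27, 2019", "Jan 26, 2019", "Dec 27, 2018", "Dec 03, 2018",
              "Dec 18, 2018", "Nov 14, 2018", "Nov 21, 2018", "Jan 15, 2019"),
  stringsAsFactors = FALSE
)

# Parse the dates; %b is locale-dependent (assumes English month abbreviations)
parsed_dates <- as.Date(df$date, format = "%b %d, %Y")

# Month-year labels, with factor levels spanning the data's range in calendar order
first_of_month <- as.Date(format(parsed_dates, "%Y-%m-01"))
month_levels <- format(seq(min(first_of_month), max(first_of_month), by = "month"), "%b-%Y")
month <- factor(format(parsed_dates, "%b-%Y"), levels = month_levels)

# Cross-tabulate a combined disease/symptom key (hypothetical "/" separator) against the month
tab <- table(paste(df$disease, df$symptom, sep = "/"), month)

# Back to a data frame: split the key into two columns, one count column per month
wide <- as.data.frame.matrix(tab)
keys <- do.call(rbind, strsplit(rownames(wide), "/", fixed = TRUE))
out <- data.frame(disease = keys[, 1], symptom = keys[, 2], wide,
                  check.names = FALSE, row.names = NULL)
out
```

On the example data this yields the five disease/symptom rows with columns Nov-2018, Dec-2018 and Jan-2019, holding the same counts as df2. Deriving the levels from the data's own range (rather than a fixed list of years) is a design choice: only months between the earliest and latest date become columns, including months with no events.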
Answer 0 (score: 1)
Not exactly in the order of df2, but:
library(dplyr)
library(tidyr)

df %>%
  mutate(date = substr(date, 1, 3)) %>%   # keep only the month abbreviation
  group_by(disease, symptom, date) %>%
  count() %>%
  spread(date, n, fill = 0)
disease symptom Dec Jan Nov
bronchitis congestion 1 0 0
bronchitis coughing 0 2 0
flu aches 0 0 1
flu fatigue 0 1 0
flu fever 2 0 1
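The Dec, Jan, Nov column order comes from sorting: as the output above suggests, spread() lays out a character key in sorted (alphabetical) order, whereas a factor key keeps its level order. A minimal base-R illustration (the vector and variable names are made up for the example) shows both orders, and also why month names alone are not enough here: within a single year, January sorts before November.

```r
# Month abbreviations as extracted by substr(date, 1, 3) in the answer above
months <- c("Jan", "Jan", "Dec", "Dec", "Dec", "Nov", "Nov", "Jan")

# Character keys sort alphabetically
alphabetical <- sort(unique(months))

# A factor keeps its level order; month.abb runs Jan..Dec, droplevels() removes unused months
calendar <- levels(droplevels(factor(months, levels = month.abb)))

alphabetical  # "Dec" "Jan" "Nov"
calendar      # "Jan" "Nov" "Dec"
```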
Answer 1 (score: 1)
To get the dates in exactly the order you want, you can do the following:
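library(dplyr)
library(tidyr)

df %>%
  count(disease,
        symptom,
        # month-year label, as a factor whose levels run Jan-1950 ... Dec-2050 in calendar order
        date = factor(format(as.Date(date, "%b%d,%Y"), "%b-%Y"),
                      levels = apply(expand.grid(month.abb, 1950:2050), 1, paste, collapse = "-"))) %>%
  spread(date, n, fill = 0)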
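This corresponds to your desired output, but it keeps the year in the column names (since you are ordering from Nov to Jan, I'd suggest that the year should also play a role in the ordering and in the grouping):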
# A tibble: 5 x 5
disease symptom `Nov-2018` `Dec-2018` `Jan-2019`
<fct> <fct> <dbl> <dbl> <dbl>
1 bronchitis congestion 0 1 0
2 bronchitis coughing 0 0 2
3 flu aches 1 0 0
4 flu fatigue 0 0 1
5 flu fever 1 2 0
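Why the levels in the answer give calendar order: expand.grid() varies its first argument fastest, so pasting each row produces Jan-1950, Feb-1950, ..., Dec-2050, and spread() then lays out the columns in that level order, dropping levels that do not occur. A short base-R check (month_year_levels is a made-up name). Note that the hard-coded 1950:2050 range is an assumption of the answer: any month outside it is not a valid level and becomes NA.

```r
# Level vector used in the answer: every month from Jan-1950 to Dec-2050
month_year_levels <- apply(expand.grid(month.abb, 1950:2050), 1, paste, collapse = "-")

length(month_year_levels)    # 1212 (12 months x 101 years)
head(month_year_levels, 3)   # "Jan-1950" "Feb-1950" "Mar-1950"

# Positions of the example's months: consecutive, in calendar order
match(c("Nov-2018", "Dec-2018", "Jan-2019"), month_year_levels)  # 827 828 829

# A month outside the grid becomes NA
is.na(factor("Jan-2051", levels = month_year_levels))  # TRUE
```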
If you don't need the year in the column names, just add setNames at the end:
df %>%
  count(disease,
        symptom,
        date = factor(format(as.Date(date, "%b%d,%Y"), "%b-%Y"),
                      levels = apply(expand.grid(month.abb, 1950:2050), 1, paste, collapse = "-"))) %>%
  spread(date, n, fill = 0) %>%
  setNames(., sub("-.*", "", names(.)))   # strip "-YYYY" from the column names
Output:
# A tibble: 5 x 5
disease symptom Nov Dec Jan
<fct> <fct> <dbl> <dbl> <dbl>
1 bronchitis congestion 0 1 0
2 bronchitis coughing 0 0 2
3 flu aches 1 0 0
4 flu fatigue 0 0 1
5 flu fever 1 2 0
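One caveat with dropping the year: if the data ever cover the same calendar month in two different years, the shortened names collide. A quick base-R check on a hypothetical set of column names (not taken from the question's data):

```r
# Column names as spread() would produce them if the data ran from Dec 2018 to Dec 2019
cols <- c("disease", "symptom", "Dec-2018", "Jan-2019", "Dec-2019")

# The same sub() call as in the answer strips "-YYYY" ...
short <- sub("-.*", "", cols)
short                 # "disease" "symptom" "Dec" "Jan" "Dec"

# ... which leaves two columns called "Dec"
anyDuplicated(short)  # 5 (index of the first repeated name)
```

In that situation, keeping the year in the names (as in the first output of this answer) is the safer choice.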