I have a data set of dates and hospital admissions. It contains one row per day for two years and looks like this:
Date cardioadmission respiratoryadmission
2001-01-01 12 06
2001-01-02 10 5
2001-01-03 08 4
2001-01-04 04 6
I want to produce a results table like this:
year cvdadmissions respiratoryadmissions
So I want to aggregate the dates by year, and then split each year into summer and winter. Suppose I want to see a result like this:
year cvdadmissions respiratoryadmissions
2001 21 22
In other words, I want to report admissions by month rather than by day, using some kind of aggregation. Can someone guide me?
Update:
summary <- data %>%
  mutate(month = month(Date),  # what should I write in month, and also in Date?
         year = year(Date)) %>%  # same here: what should I write in year and in year(Date)?
  group_by(month, year) %>%  # which month, and which year?
  summarise(cvdadmission = sum(cvdadmission),
            respiratoryadmission = sum(respiratoryadmission))  # I have understood this part.
Could you please explain the logic behind this in detail?
Thanks
Answer 0 (score: 0)
Add a year/month column or a year column and aggregate by it:
library(zoo)
DFym <- transform(DF0, YearMon = as.yearmon(Date))[-1]
aggregate(. ~ YearMon, DFym, sum)
## YearMon cardioadmission respiratoryadmission
## 1 Jan 2001 34 21
DFy <- transform(DF0, Year = as.integer(as.yearmon(Date)))[-1]
aggregate(. ~ Year, DFy, sum)
## Year cardioadmission respiratoryadmission
## 1 2001 34 21
Another approach is to represent DF0 as a zoo time series:
library(zoo)
z <- read.zoo(DF0)
aggregate(z, as.yearmon, sum)
## cardioadmission respiratoryadmission
## Jan 2001 34 21
aggregate(z, function(x) as.integer(as.yearmon(x)), sum)
## cardioadmission respiratoryadmission
## 2001 34 21
The input DF0 used above, in reproducible form:
Lines <- "Date cardioadmission respiratoryadmission
2001-01-01 12 06
2001-01-02 10 5
2001-01-03 08 4
2001-01-04 04 6"
DF0 <- read.table(text = Lines, header = TRUE)
DF0$Date <- as.Date(DF0$Date)
Fixed.
Answer 1 (score: 0)
You can use dplyr and lubridate, as follows:
library(dplyr)
library(lubridate)
df %>%
  mutate(year = year(Date)) %>%  # extract the calendar year from each date
  group_by(year) %>%             # one group per year
  summarise(cvdadmissions = sum(cardioadmission),
            respiratoryadmissions = sum(respiratoryadmission))
If you want to split into winter and summer, you can extract the month inside mutate, use it to derive another field called season, and then use group_by(year, season).
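As a minimal sketch of that season split: the example below assumes April through September counts as summer and every other month as winter, which is only an illustrative definition and not something stated in the question. The sample dates in df are hypothetical and were chosen so that both seasons appear.

```r
library(dplyr)
library(lubridate)

# Hypothetical sample data covering both seasons
df <- data.frame(
  Date = as.Date(c("2001-01-01", "2001-01-02", "2001-07-01", "2001-07-02")),
  cardioadmission = c(12, 10, 8, 4),
  respiratoryadmission = c(6, 5, 4, 6)
)

season_summary <- df %>%
  mutate(year = year(Date),
         # Assumption: April-September is summer, all other months are winter
         season = if_else(month(Date) %in% 4:9, "summer", "winter")) %>%
  group_by(year, season) %>%
  summarise(cvdadmissions = sum(cardioadmission),
            respiratoryadmissions = sum(respiratoryadmission),
            .groups = "drop")

print(season_summary)
```

Note that with this definition a single winter is split across two calendar years (January to March and October to December), so a winter that should be reported as one block would need a different grouping key.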
Answer 2 (score: 0)
Here is a tidyverse solution:
library(dplyr)
library(lubridate)
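summary <- data %>%
  mutate(month = month(Date),    # month number (1-12) of each date
         year = year(Date)) %>%  # calendar year of each date
  group_by(month, year) %>%      # one group per month within each year
  summarise(cardioadmission = sum(cardioadmission),
            respiratoryadmission = sum(respiratoryadmission))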
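Since the question asks about the logic behind each call, here is a sketch that runs the same pipeline one step at a time on the four sample rows from the question. The intermediate names with_parts, grouped and monthly are introduced here only for illustration. Date is the name of the date column in the data, while month() and year() are lubridate functions that extract the month number and the year from it.

```r
library(dplyr)
library(lubridate)

# The four sample rows from the question
data <- data.frame(
  Date = as.Date(c("2001-01-01", "2001-01-02", "2001-01-03", "2001-01-04")),
  cardioadmission = c(12, 10, 8, 4),
  respiratoryadmission = c(6, 5, 4, 6)
)

# Step 1: mutate() adds new columns. The left-hand names (month, year) are
# the new column names; the right-hand sides compute them from Date.
with_parts <- data %>%
  mutate(month = month(Date), year = year(Date))

# Step 2: group_by() refers to the columns created in step 1, so rows that
# share the same year and month are treated as one group.
grouped <- with_parts %>%
  group_by(year, month)

# Step 3: summarise() collapses each group to a single row of totals.
monthly <- grouped %>%
  summarise(cardioadmission = sum(cardioadmission),
            respiratoryadmission = sum(respiratoryadmission),
            .groups = "drop")

print(monthly)
```

Printing with_parts and grouped as well shows what each step contributes before the rows are collapsed.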
Answer 3 (score: 0)
In base R, you can use format to add a year column:
df$Year <- format(as.Date(df$Date), "%Y")
# Date cardioadmission respiratoryadmission Year
# 1 2001-01-01 12 6 2001
# 2 2001-01-02 10 5 2001
# 3 2001-01-03 8 4 2001
# 4 2001-01-04 4 6 2001
You can then proceed with the analysis. Here is one way to do it using vapply:
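t(vapply(unique(df$Year), function(y) {
  # rows belonging to year y (Year is the last column of df)
  i <- .subset2(df, ncol(df)) == y
  # sum each admission column over those rows only
  c(cardioadmission = sum(.subset2(df, 2L)[i]), respiratoryadmission = sum(.subset2(df, 3L)[i]))
}, numeric(2)))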
# cardioadmission respiratoryadmission
# 2001 34 21
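For the monthly totals the question asks about, the same format() idea can be sketched with a "%Y-%m" key and base R's aggregate(). The column name YearMonth and the result name monthly are introduced here for illustration only.

```r
# The four sample rows from the question
df <- data.frame(
  Date = c("2001-01-01", "2001-01-02", "2001-01-03", "2001-01-04"),
  cardioadmission = c(12, 10, 8, 4),
  respiratoryadmission = c(6, 5, 4, 6)
)

# Year-month key such as "2001-01"
df$YearMonth <- format(as.Date(df$Date), "%Y-%m")

# Sum both admission columns within each year-month
monthly <- aggregate(cbind(cardioadmission, respiratoryadmission) ~ YearMonth,
                     data = df, FUN = sum)

print(monthly)
```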
Data
df <- structure(list(Date = structure(1:4, .Label = c("2001-01-01",
"2001-01-02", "2001-01-03", "2001-01-04"), class = "factor"),
cardioadmission = c(12, 10, 8, 4), respiratoryadmission = c(6,
5, 4, 6)), class = "data.frame", row.names = c(NA, -4L))