Subset data by year in a Period column

Time: 2020-04-25 09:59:45

Tags: r

I have data with a "month and year" column. How can I subset it by year? Here is a sample dataset:

# Libraries
library(ggplot2)
library(reshape2)

# Data
df <- data.frame("Hospital" = c("Buge Hospital", "Buge Hospital", "Greta Hospital", "Greta Hospital",
                                "Makor Hospital", "Makor Hospital"),
                 "Period" = c("Jul-18","Aug-18", "Jul-19","Aug-19", "Jul-20","Aug-20"),
                 "Medical admissions" = c(12,56,0,40,5,56),
                 "Surgical admissions" = c(10,2,0,50,20,56),
                 "Inpatient admissions" = c(9,5,6,0,60,96))

I have tried this, but the resulting dataset is empty:

data_18 <- subset(df, format(as.Date(df$Period, format="%b/%Y"),"%Y")== 2018)

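I want to extract the monthly data for each year so that I can look at the trend over those months. The expected result is a subset that contains only one year's data, for example all the monthly data for 2018.

4 answers:

Answer 0: (score: 2)

I'm not sure whether this is what you are looking for: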

data <- Filter(nrow,split(df,list(gsub(".*-","",df$Period),df$Hospital)))
data_18 <- data[grepl("^18",names(data))]

which gives

> data
$`18.Buge Hospital`
       Hospital Period Medical.admissions Surgical.admissions Inpatient.admissions
1 Buge Hospital Jul-18                 12                  10                    9
2 Buge Hospital Aug-18                 56                   2                    5

$`19.Greta Hospital`
        Hospital Period Medical.admissions Surgical.admissions Inpatient.admissions
3 Greta Hospital Jul-19                  0                   0                    6
4 Greta Hospital Aug-19                 40                  50                    0

$`20.Makor Hospital`
        Hospital Period Medical.admissions Surgical.admissions Inpatient.admissions
5 Makor Hospital Jul-20                  5                  20                   60
6 Makor Hospital Aug-20                 56                  56                   96

> data_18
$`18.Buge Hospital`
       Hospital Period Medical.admissions Surgical.admissions Inpatient.admissions
1 Buge Hospital Jul-18                 12                  10                    9
2 Buge Hospital Aug-18                 56                   2                    5

Edit

If you only want to subset the data for 2018 (thanks to @G. Grothendieck):

data_18 <- subset(df, grepl("18", Period))
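The pattern "18" matches anywhere in the string, which is fine for this sample because only the year part contains digits. As a hedged variation, assuming Period always ends in a two-digit year after a hyphen, the match can be tied to the end of the string. The trimmed data frame below is a reduced copy of the question's data, used only to keep the sketch self-contained:

```r
# Reduced copy of the question's data (only the columns needed here).
df <- data.frame(
  Hospital = c("Buge Hospital", "Buge Hospital", "Greta Hospital",
               "Greta Hospital", "Makor Hospital", "Makor Hospital"),
  Period = c("Jul-18", "Aug-18", "Jul-19", "Aug-19", "Jul-20", "Aug-20"),
  stringsAsFactors = FALSE
)

# "-18$" only matches when the string ends in "-18".
data_18_regex <- subset(df, grepl("-18$", Period))

# endsWith() does the same without a regular expression.
data_18_suffix <- subset(df, endsWith(Period, "-18"))
```

Both calls keep the two Buge Hospital rows for this sample.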

Answer 1: (score: 2)

I think what you want is:

subset(df, format(as.Date(paste('1', Period), '%d %b-%y'), "%Y") == 2018)

#       Hospital Period Medical.admissions Surgical.admissions Inpatient.admissions
#1 Buge Hospital Jul-18                 12                  10                    9
#2 Buge Hospital Aug-18                 56                   2                    5

Or use yearmon from the zoo package:

library(zoo)

subset(df, floor(as.yearmon(Period, "%b-%y")) == 2018)
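For context, here is a small sketch of why the attempt in the question comes back empty. It assumes an English or C time locale so that "%b" recognises "Jul" and "Aug"; the Sys.setlocale() call is included only to make that assumption explicit, and the variable names are illustrative:

```r
# Assumption: the "C" locale uses English month abbreviations for "%b".
Sys.setlocale("LC_TIME", "C")

period <- c("Jul-18", "Aug-18", "Jul-19")

# The question's format string expects "/" and a four-digit year, and the
# strings carry no day, so nothing parses and every element is NA.
attempt <- as.Date(period, format = "%b/%Y")
all_missing <- all(is.na(attempt))

# Prepending a day and matching "-" plus a two-digit "%y" parses cleanly.
fixed <- as.Date(paste0("01-", period), format = "%d-%b-%y")
years <- format(fixed, "%Y")
```

Because every comparison against NA is itself NA, subset() drops all rows, which is the empty result described in the question.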

Answer 2: (score: 1)

There are several possibilities, for example using strsplit or the tidyverse as follows:

library(tidyr)
library(dplyr)

df %>% separate(Period, into=c("Month", "Year"), "-") %>% filter(Year == 18)

And if you want to summarise, plot, or do something else, use group_by instead of filter, for example:

df %>% 
  separate(Period, into=c("Month", "Year"), "-") %>% 
  group_by(Year) %>% 
  summarize(sum(Medical.admissions))
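The strsplit route mentioned at the start of this answer is not shown, so here is a minimal base-R sketch of it. The Year column and the by_year and totals names are illustrative choices, and the data frame is a reduced copy of the question's data:

```r
# Reduced copy of the question's data (only the columns needed here).
df <- data.frame(
  Hospital = c("Buge Hospital", "Buge Hospital", "Greta Hospital",
               "Greta Hospital", "Makor Hospital", "Makor Hospital"),
  Period = c("Jul-18", "Aug-18", "Jul-19", "Aug-19", "Jul-20", "Aug-20"),
  Medical.admissions = c(12, 56, 0, 40, 5, 56),
  stringsAsFactors = FALSE
)

# Split "Jul-18" into c("Jul", "18") and keep the second piece as the year.
parts <- strsplit(df$Period, "-", fixed = TRUE)
df$Year <- vapply(parts, `[`, character(1), 2)

# One data frame per year, stored in a list named by the year suffix.
by_year <- split(df, df$Year)
data_18 <- by_year[["18"]]

# Total medical admissions per year, analogous to the group_by() version.
totals <- tapply(df$Medical.admissions, df$Year, sum)
```

Keeping the pieces in a list means each year can be pulled out by name (by_year[["19"]], by_year[["20"]]) without repeating the filter.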

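Answer 3: (score: 1)

And, in response to your wish to subset on both year and month, here is a more pedestrian approach that shows how it could work within your own code: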



# Libraries
library(ggplot2)
library(reshape2)
library(lubridate)
library(dplyr)  # provides %>% and mutate(), which are used below

# Data
df <- data.frame("Hospital" = c("Buge Hospital", "Buge Hospital", "Greta Hospital", "Greta Hospital",
                                "Makor Hospital", "Makor Hospital"),
                 "Period" = c("Jul-18","Aug-18", "Jul-19","Aug-19", "Jul-20","Aug-20"),
                 "Medical admissions" = c(12,56,0,40,5,56),
                 "Surgical admissions" = c(10,2,0,50,20,56),
                 "Inpatient admissions" = c(9,5,6,0,60,96),
                 stringsAsFactors = FALSE)

# Wrangle the data to get valid date, year and month variables; subsetting on year is then straightforward, e.g. using dplyr::group_by(year, month)
df1 <-  
  df %>%
  mutate(date = as.Date(paste0("01-", Period),format = "%d-%b-%y"),
         year = year(date),
         month = month(date)) 
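The subsetting step itself is left implicit above. The sketch below shows one way it might look once the year and month columns exist. To stay self-contained it rebuilds those columns with base R instead of dplyr and lubridate, uses a reduced copy of the data, and assumes an English or C time locale for "%b"; the data_2018 and july_all_years names are illustrative:

```r
# Assumption: the "C" locale uses English month abbreviations for "%b".
Sys.setlocale("LC_TIME", "C")

# Reduced copy of the question's data (only the columns needed here).
df <- data.frame(
  Hospital = c("Buge Hospital", "Buge Hospital", "Greta Hospital",
               "Greta Hospital", "Makor Hospital", "Makor Hospital"),
  Period = c("Jul-18", "Aug-18", "Jul-19", "Aug-19", "Jul-20", "Aug-20"),
  Medical.admissions = c(12, 56, 0, 40, 5, 56),
  stringsAsFactors = FALSE
)

# Base-R equivalent of the mutate() step: a real Date plus numeric parts.
df1 <- df
df1$date  <- as.Date(paste0("01-", df1$Period), format = "%d-%b-%y")
df1$year  <- as.integer(format(df1$date, "%Y"))
df1$month <- as.integer(format(df1$date, "%m"))

# All monthly rows for 2018.
data_2018 <- subset(df1, year == 2018)

# The same month across years, e.g. every July.
july_all_years <- subset(df1, month == 7)
```

With the dplyr version of df1, filter(df1, year == 2018) should return the same rows as the first subset() call.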



