Tidying this data frame in R

Time: 2019-02-28 12:22:38

Tags: r dataframe

I am trying to organize this dataset using separate and gather: [image]

so that it looks like this: [image]

I'm at a loss. I feel that separate and gather should be enough for this task, but I may be missing something. I have tried

done <- gather(diseases, Patientdays, Seperations, c(1, 3))

done <- separate(fixdiseases, "Separations_Y2016-17", into = c("Y2016-17", "Separations"), sep = "_")

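That is just to give an idea of what I have been trying. I stopped here because doing the same for the remaining columns does not seem to lead to a solution.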

Right, the data. I hope this is in line with the etiquette here, but I have uploaded the csv to this link: http://www.filedropper.com/diseases

1 answer:

Answer 0 (score: 1)

I believe this does the job:

library(dplyr)
library(reshape2)

# read .csv
diseases <- read.csv('diseases.csv')

# melt the dataframe
diseases_melted <- diseases %>% melt(id.vars = "Diseases")

diseases_melted$variable %>%        
  as.character() %>%                       
  strsplit('_') %>%                                 # split the year from the variable name
  do.call(rbind, .) %>%                             # bind them together
  `colnames<-`(c('Variable_name', 'Year')) %>%      # set the names here for easier access
  cbind(diseases_melted) %>%                        # add the new columns to the melted dataframe
  dcast(Diseases + Year ~ Variable_name,            # spread the variables again
        value.var = 'value')
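
Since the question asks specifically about gather and separate, here is a minimal sketch of the same reshaping done with tidyr instead of reshape2. It is not part of the original answer. It assumes the column names follow the `Measure_Year` pattern shown in the data, and the names `key`, `Measure` and `tidy` are chosen here only for illustration. The step missing from the question's attempts is a final spread, which turns the measure names back into columns.

```r
library(dplyr)   # provides %>%
library(tidyr)   # gather(), separate(), spread()

# Two-row excerpt of the data from the question, kept short for illustration
diseases <- data.frame(
  Diseases = c("1 Certain infectious and parasitic diseases (A00-B99)",
               "2 Neoplasms (C00-D48)"),
  Patientdays_Y2015.16 = c("694,007", "2,223,563"),
  Separations_Y2015.16 = c("170,095", "666,594"),
  Patientdays_Y2016.17 = c("771,770", "2,235,045"),
  Separations_Y2016.17 = c("186,034", "684,075"),
  stringsAsFactors = FALSE
)

tidy <- diseases %>%
  # stack every column except Diseases into key/value pairs
  gather(key = "key", value = "value", -Diseases) %>%
  # split "Patientdays_Y2015.16" into "Patientdays" and "Y2015.16"
  separate(key, into = c("Measure", "Year"), sep = "_") %>%
  # one column per measure, one row per disease and year
  spread(key = Measure, value = value)

print(tidy)
```

In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(), although the older functions still work.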

Data

For those interested, here is the data:

diseases <- structure(list(Diseases = c("1 Certain infectious and parasitic diseases (A00-B99)", 
"2 Neoplasms (C00-D48)", "3 Diseases of the blood and blood−forming organs and certain disorders involving the immune mechanism (D50-D89)", 
"4 Endocrine, nutritional and metabolic diseases (E00-E89)", 
"5 Mental and behavioural disorders (F00-F99)", "6 Diseases of the nervous system (G00-G99)", 
"7 Diseases of the eye and adnexa (H00-H59)", "8 Diseases of the ear and mastoid process (H60-H95)", 
"9 Diseases of the circulatory system (I00-I99)", "10 Diseases of the respiratory system (J00-J99)", 
"11 Diseases of the digestive system (K00-K93)", "12 Diseases of the skin and subcutaneous tissue (L00-L99)", 
"13 Diseases of the musculoskeletal system and connective tissue (M00-M99)", 
"14 Diseases of the genitourinary system (N00-N99)", "15 Pregnancy, childbirth and the puerperium (O00-O99)", 
"16 Certain conditions originating in the perinatal period (P00-P96)", 
"17 Congenital malformations, deformations and chromosomal abnormalities (Q00-Q99)", 
"18 Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified (R00-R99)", 
"19 Injury, poisoning and certain other consequences of external causes (S00-T98)", 
"21 Factors influencing health status and contact with health services (Z00-Z99)", 
"Not reported"), Patientdays_Y2015.16 = c("694,007", "2,223,563", 
"317,085", "582,936", "3,778,574", "884,703", "423,577", "99,880", 
"2,611,423", "1,700,645", "2,136,743", "597,145", "2,369,828", 
"1,062,051", "1,304,805", "581,789", "125,345", "1,603,775", 
"3,175,895", "3,522,214", "50,407"), Separations_Y2015.16 = c("170,095", 
"666,594", "175,590", "169,247", "429,244", "322,843", "397,342", 
"67,185", "556,638", "467,780", "1,042,625", "173,374", "763,336", 
"490,394", "498,823", "69,601", "39,771", "841,423", "747,792", 
"2,508,250", "1,821"), Patientdays_Y2016.17 = c("771,770", "2,235,045", 
"335,699", "612,602", "4,465,669", "868,598", "437,673", "106,969", 
"2,663,249", "1,788,798", "2,162,150", "618,352", "2,402,038", 
"1,052,440", "1,286,556", "573,388", "126,279", "1,694,416", 
"3,249,710", "3,524,083", "15,540"), Separations_Y2016.17 = c("186,034", 
"684,075", "190,568", "184,092", "456,027", "330,698", "410,184", 
"71,962", "576,516", "498,853", "1,059,981", "182,114", "773,279", 
"498,635", "499,408", "70,254", "40,014", "903,760", "782,964", 
"2,613,993", "404")), class = "data.frame", row.names = c(NA, 
-21L))
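
Note that the counts above are stored as character strings with thousands separators (for example "694,007"), so they cannot be used in arithmetic as they are. The following is a small sketch of one way to convert them. The helper name `to_number` is hypothetical and not part of the original answer, and the data frame is a two-row excerpt rather than the full data.

```r
# Hypothetical helper: drop the thousands separators, then convert to numeric
to_number <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

# Two-row excerpt of the data above
excerpt <- data.frame(
  Diseases = c("2 Neoplasms (C00-D48)", "Not reported"),
  Patientdays_Y2015.16 = c("2,223,563", "50,407"),
  Separations_Y2015.16 = c("666,594", "1,821"),
  stringsAsFactors = FALSE
)

# Convert every column except the first (Diseases) in place
excerpt[-1] <- lapply(excerpt[-1], to_number)

str(excerpt)
```

The same `lapply` call can be applied to the full `diseases` data frame before or after reshaping, as long as the non-numeric columns are excluded.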