R function to get the percentage of animals within each level of a factor

Time: 2019-08-28 07:52:04

Tags: r bigdata data-analysis

I am working with the following data.frame, which has approximately 40,000,000 rows:

structure(list(group = c(1003, 1003, 1003, 1003, 1003, 1003, 
1003, 1003, 1003, 1003), t_year = c("2014", "2014", "2014", "2014", 
"2014", "2014", "2014", "2014", "2014", "2014"), tmonth = c(3, 
3, 3, 3, 3, 3, 3, 3, 3, 3), tday = c("02", "02", "02", "02", 
"02", "02", "02", "02", "02", "02"), md = c(2507.416244074, 2507.416244074, 
2507.416244074, 2507.416244074, 2507.416244074, 2507.416244074, 
2507.416244074, 2507.416244074, 2507.416244074, 2507.416244074
), st = c(640722481.20599, 640722481.20599, 640722481.20599, 
640722481.20599, 640722481.20599, 640722481.20599, 640722481.20599, 
640722481.20599, 640722481.20599, 640722481.20599), bsc = c(255530.960493802, 
255530.960493802, 255530.960493802, 255530.960493802, 255530.960493802, 
255530.960493802, 255530.960493802, 255530.960493802, 255530.960493802, 
255530.960493802), animal = c("HOUSA000062901617", "HOUSA000006684687", 
"HO982000202967406", "HOUSA000057341913", "HOUSA000139926709", 
"JEUSA000057281350", "HOUSA000056634042", "XXUSA000056639940", 
"HOUSA000064279445", "HOUSA000066846844"), ln = c(6L, 2L, 1L, 
2L, 4L, 2L, 3L, 2L, 5L, 1L), gluc = c(37892.914163, 100000, 606286.6266, 
303143.3133, 303143.3133, 35355.339059, 37892.914163, 37892.914163, 
214354.69251, 37892.914163), gluc_cat = c(1L, 1L, 6L, 5L, 5L, 
1L, 1L, 1L, 4L, 1L), ol = structure(c(1L, 1L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L), .Label = c("mult", "prim"), class = "factor"), 
    group_size = c("<2000", "<2000", "<2000", "<2000", "<2000", 
    "<2000", "<2000", "<2000", "<2000", "<2000"), date = structure(c(16131, 
    16131, 16131, 16131, 16131, 16131, 16131, 16131, 16131, 16131
    ), class = "Date"), season = c("Spring", "Spring", "Spring", 
    "Spring", "Spring", "Spring", "Spring", "Spring", "Spring", 
    "Spring")), row.names = c(NA, 10L), class = "data.frame")

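I want to understand how the number of animals in each category behaves over the year. For example: in summer, 30% of the animals have glucose < 200,000 (gluc_cat 1), 2% are between 200,000 and 400,000 (gluc_cat 2), 15% are between 400,000 and 600,000 (gluc_cat 3), and so on.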

I am trying to show the number of animals within each gluc_cat as a percentage, broken down by t_year, ol and season, like this:

year 2018
ol "prim"
                             season
gluc_cat      Fall    Spring    Summer    Winter
1           16.387677 11.653786 11.719490 10.978675
2            8.307579  5.189070  4.725884  3.862277
3            9.730989  6.571146  3.427911  4.223216
4            3.991289  2.919394  2.877867  4.922916
5            9.224311  4.429528  7.717457 10.597084
6            52.358155 69.237076 69.531391 65.415832

year 2018
ol   "mult"
                           season
gluc_cat      Fall    Spring    Summer    Winter
1           16.387677 11.653786 11.719490 10.978675
2            8.307579  5.189070  4.725884  3.862277
3            9.730989  6.571146  3.427911  4.223216
4            3.991289  2.919394  2.877867  4.922916
5            9.224311  4.429528  7.717457 10.597084
6            52.358155 69.237076 69.531391 65.415832


I tried the following code:

freq <- prop.table(xtabs(glucose~gluc_cat+season+ord_lact,cabt2),2)*100
freq

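But I think I am getting the frequency of the glucose values, right? What I actually want to know is how the number of animals in each glucose category (gluc_cat) changes by year, ol and season.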
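For reference, a minimal sketch of the difference between summing glucose and counting animals. With a variable on the left-hand side of the formula, xtabs() sums that variable within each cell; with an empty left-hand side it counts rows. The sketch assumes that each row is one animal record and uses the column names from the dput() sample (t_year, ol, season, gluc_cat) rather than the names in the attempt above. The `toy` data frame and its values are made up for illustration.

```r
# Toy data with the same column names as the dput() sample (values are made up).
toy <- data.frame(
  t_year   = c("2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014"),
  ol       = factor(c("prim", "prim", "prim", "prim", "mult", "mult", "mult", "mult")),
  season   = c("Spring", "Spring", "Spring", "Fall", "Spring", "Spring", "Fall", "Fall"),
  gluc_cat = c(1L, 1L, 6L, 5L, 1L, 4L, 1L, 1L)
)

# Empty left-hand side: xtabs() counts rows instead of summing a variable.
counts <- xtabs(~ gluc_cat + season + ol + t_year, data = toy)

# Percentages within each season x ol x t_year combination (dimensions 2, 3, 4),
# so every gluc_cat column sums to 100 inside its own slice.
pct <- prop.table(counts, margin = c(2, 3, 4)) * 100

# One gluc_cat x season table per (ol, t_year) slice.
pct[, , "prim", "2014"]
```

Combinations of season, ol and t_year that contain no rows show up as NaN, because the division is 0/0. On the full 40,000,000-row data the same call should work in principle, but memory use and run time have not been checked here.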

0 answers:

No answers yet.