For the following simple dataset:
row country year
1 NLD 2005
2 NLD 2005
3 BLG 2006
4 BLG 2005
5 GER 2005
6 NLD 2007
7 NLD 2005
8 NLD 2008
The following code:
df[, .N, by = list(country, year)][,prop := N/sum(N)]
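gives the count of each country-year group as a proportion of the total number of observations. However, what I want to measure is the proportion within each country. How should I modify this code to get the correct proportions?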
Desired output:
row country year prop
1 NLD 2005 0.6
2 NLD 2005 0.6
3 BLG 2006 0.5
4 BLG 2005 0.5
5 GER 2005 1
6 NLD 2007 0.2
7 NLD 2005 0.6
8 NLD 2008 0.2
Answer 0 (score: 1)
Using data.table:
library(data.table)
df <- read.table(header = TRUE, text = "row country year
1 NLD 2005
2 NLD 2005
3 BLG 2006
4 BLG 2005
5 GER 2005
6 NLD 2007
7 NLD 2005
8 NLD 2008")
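# 1) sum: number of rows per country
# 2) prop: number of rows per country-year pair
# 3) divide the pair count by the country count, then drop the helper column
# The trailing [] prints the result, since := updates by reference and returns invisibly
setDT(df)[, sum := .N, by = country][, prop := .N, by = c("country", "year")][, prop := prop/sum][, sum := NULL][]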
row country year prop
1: 1 NLD 2005 0.6
2: 2 NLD 2005 0.6
3: 3 BLG 2006 0.5
4: 4 BLG 2005 0.5
5: 5 GER 2005 1.0
6: 6 NLD 2007 0.2
7: 7 NLD 2005 0.6
8: 8 NLD 2008 0.2
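
If one row per country-year pair is enough (rather than a value on every original row), a smaller change to the code in the question may also work: keep the first step and add `by = country` to the second, so that `sum(N)` is taken within each country instead of over the whole table. The sketch below rebuilds the example data with `data.table()` instead of `read.table()`; the name `agg` is only illustrative.

```r
library(data.table)

# Example data from the question, rebuilt directly as a data.table
df <- data.table(
  row     = 1:8,
  country = c("NLD", "NLD", "BLG", "BLG", "GER", "NLD", "NLD", "NLD"),
  year    = c(2005, 2005, 2006, 2005, 2005, 2007, 2005, 2008)
)

# Step 1: count rows per country-year pair (column N)
# Step 2: divide each count by the total for its country
agg <- df[, .N, by = .(country, year)][, prop := N / sum(N), by = country]
print(agg)
```

Note that this returns an aggregated table with six rows, not the eight-row layout shown in the desired output; to keep every original row, the chained `:=` approach above is the one to use.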