我想计算百分比份额并使用mutate创建新列。我有以下数据:
country, metric, segment, value1990, value2000, value2010
canada, abc, rural, 10, 15, 16
canada, abc, urban, 12, 12, 18
canada, abc, total, 22, 27, 34
canada, xyz, rural, 6, 9, 10
canada, xyc, urban, 7, 8, 8
canada, xyc, total, 13, 17, 18
canada, population, rural, 80, 86, 95
canada, population, urban, 102, 110, 121
canada, population, total, 182, 196, 216
数据框包含来自多个国家和多年的数据。我想用以下值创建一个新列
country, metric, segment, value, percent1990, percent2000, percent2010
canada, abc, rural, 10, 15, 16, 12.5%, 17.4%, 16.8%
canada, abc, urban, 12, 12, 18, 11.7%, 10.9%, 14.8%
canada, abc, total, 22, 27, 34, 12.1%, 13.7%, 15.7%
canada, xyz, rural, 6, 9, 10, 7.5%, 10.4%, 10.5%
canada, xyc, urban, 7, 8, 8, 6.8%, 7.2%, 6.6%
canada, xyc, total, 13, 17, 18, 7.22%, 8.6%, 8.3%
canada, population, rural, 80, 86, 95, 100%, 100%, 100%
canada, population, urban, 102, 110, 121, 100%, 100%, 100%
canada, population, total, 182, 196, 216, 100%, 100%, 100%
本质上,我想根据多年中农村还是城市/总人口来计算价值变量在人口中所占的百分比。
例如
(第1行)percent_share = (10/80)*100 = 12.5%
(第2行)percent_share = (10/102)*100 = 11.76%
(第3行)percent_share = (10/182)*100 = 12.09%
我无法超越group_by
链来确定如何输入必要的功能
df = df %>%
group_by (country, metric) %>%
mutate(...)
答案 0 :(得分:1)
编辑:获取包含年份的新问题数据
如果将年份和总人口移动到新列,这会更加容易。这是一种方法。
假设示例数据位于名为df1
的数据框中:前gather
年。
library(dplyr)
library(tidyr)
df1 <- df1 %>% gather(Year, Value, 4:6)
然后过滤metric
== population
并返回原始数据。
df1 %>% filter(metric == "population") %>%
left_join(filter(df1, metric != "population"),
by = c("country", "segment", "Year")) %>%
select(country, segment, Year, population = Value.x, metric = metric.y, value = Value.y)
结果:
country segment Year population metric value
1 canada rural value1990 80 abc 10
2 canada rural value1990 80 xyz 6
3 canada urban value1990 102 abc 12
4 canada urban value1990 102 xyc 7
5 canada total value1990 182 abc 22
6 canada total value1990 182 xyc 13
7 canada rural value2000 86 abc 15
8 canada rural value2000 86 xyz 9
9 canada urban value2000 110 abc 12
10 canada urban value2000 110 xyc 8
11 canada total value2000 196 abc 27
12 canada total value2000 196 xyc 17
13 canada rural value2010 95 abc 16
14 canada rural value2010 95 xyz 10
15 canada urban value2010 121 abc 18
16 canada urban value2010 121 xyc 8
17 canada total value2010 216 abc 34
18 canada total value2010 216 xyc 18
然后添加一个变异:
df1 %>% filter(metric == "population") %>%
left_join(filter(df1, metric != "population"),
by = c("country", "segment", "Year")) %>%
select(country, segment, Year, population = Value.x, metric = metric.y, value = Value.y) %>%
mutate(percent_share = 100 * (value / population))
结果:
country segment Year population metric value percent_share
1 canada rural value1990 80 abc 10 12.500000
2 canada rural value1990 80 xyz 6 7.500000
3 canada urban value1990 102 abc 12 11.764706
4 canada urban value1990 102 xyc 7 6.862745
5 canada total value1990 182 abc 22 12.087912
6 canada total value1990 182 xyc 13 7.142857
7 canada rural value2000 86 abc 15 17.441860
8 canada rural value2000 86 xyz 9 10.465116
9 canada urban value2000 110 abc 12 10.909091
10 canada urban value2000 110 xyc 8 7.272727
11 canada total value2000 196 abc 27 13.775510
12 canada total value2000 196 xyc 17 8.673469
13 canada rural value2010 95 abc 16 16.842105
14 canada rural value2010 95 xyz 10 10.526316
15 canada urban value2010 121 abc 18 14.876033
16 canada urban value2010 121 xyc 8 6.611570
17 canada total value2010 216 abc 34 15.740741
18 canada total value2010 216 xyc 18 8.333333
答案 1 :(得分:1)
您也可以将segment
分组,再除以max(value
,因为人口值应该最大:
df %>%
group_by(country, segment) %>%
mutate(percent_share = value / max(value))
# A tibble: 9 x 5
# Groups: segment [3]
country metric segment value percent_share
<chr> <chr> <chr> <dbl> <dbl>
1 canada abc rural 10 0.125
2 canada abc urban 12 0.118
3 canada abc total 22 0.121
4 canada xyz rural 6 0.075
5 canada xyc urban 7 0.0686
6 canada xyc total 13 0.0714
7 canada population rural 80 1
8 canada population urban 102 1
9 canada population total 182 1