I have a data frame with the columns City, Gender, 2013, 2014, and 2015.
City Gender 2013 2014 2015
Aberdeen Female 30 40 50
Aberdeen Male 20 15 16
Aberdeenshire Female 60 80 70
Aberdeenshire Male 50 40 15
... (425 records in total)
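I want to compute the female-to-male ratio for each city (the Female value divided by the Male value within each city), so this is the result I am trying to get: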
City 2013_ratio 2014_ratio 2015_ratio
Aberdeen 1.5 2.66 3.12
Aberdeenshire 1.2 2 4.66
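Can anyone help me solve this? I tried grouping by City, but I don't know how to pick out the values for each Gender row by row.
Answer 0: (score: 2)
Computing the ratio is easier if "Male" and "Female" are in separate columns; you can use tidyr to reshape the data: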
library(dplyr)
library(tidyr)
df %>%
  gather(Year, Value, -City, -Gender) %>%
  spread(Gender, Value) %>%
  mutate(Ratio = Female / Male, Year = paste0(Year, "_Ratio")) %>%
  select(-Female, -Male) %>%
  spread(Year, Ratio)
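For readers on tidyr 1.0.0 or later, the same reshape can be sketched with pivot_longer() and pivot_wider(), which supersede gather() and spread(). This is a minimal sketch, not part of the original answer: the four sample rows from the question are rebuilt inline, and `check.names = FALSE` is an assumption made here so that the year columns keep their original names.

```r
library(dplyr)
library(tidyr)

# Sample rows from the question; check.names = FALSE keeps "2013" etc. as-is
# (an assumption for this sketch, the real data may use X2013-style names).
df <- data.frame(
  City = c("Aberdeen", "Aberdeen", "Aberdeenshire", "Aberdeenshire"),
  Gender = c("Female", "Male", "Female", "Male"),
  `2013` = c(30, 20, 60, 50),
  `2014` = c(40, 15, 80, 40),
  `2015` = c(50, 16, 70, 15),
  check.names = FALSE
)

ratios <- df %>%
  # Long format: one row per City / Gender / Year
  pivot_longer(-c(City, Gender), names_to = "Year", values_to = "Value") %>%
  # One column per gender so the ratio is a simple column division
  pivot_wider(names_from = Gender, values_from = Value) %>%
  mutate(Ratio = Female / Male, Year = paste0(Year, "_ratio")) %>%
  select(-Female, -Male) %>%
  # Back to wide format: one ratio column per year
  pivot_wider(names_from = Year, values_from = Ratio)

print(ratios)
```

The result has one row per city and the columns 2013_ratio, 2014_ratio, and 2015_ratio.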
Answer 1: (score: 1)
Using tidyverse:
df = read.table(text="City Gender 2013 2014 2015
Aberdeen Female 30 40 50
Aberdeen Male 20 15 16
Aberdeenshire Female 60 80 70
Aberdeenshire Male 50 40 15", header = TRUE)
> library(tidyverse)
>
> df %>%
group_by(City) %>%
arrange(City, Gender) %>%
summarise_at(vars(X2013:X2015), .funs = funs(ratio = first(.)/last(.)))
# A tibble: 2 x 4
City X2013_ratio X2014_ratio X2015_ratio
<fct> <dbl> <dbl> <dbl>
1 Aberdeen 1.5 2.67 3.12
2 Aberdeenshire 1.2 2 4.67
Or:
df %>%
group_by(City) %>%
arrange(City,Gender) %>%
summarise_at(vars(X2013:X2015), .funs = funs(ratio = .[Gender == "Female"]/.[Gender != "Female"]))
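funs() has been deprecated since dplyr 0.8.0, and summarise_at() is superseded as of dplyr 1.0.0, so on a recent dplyr the same idea might be written with across(). The following is a sketch under that assumption rather than the answerer's own code; it selects the Male rows explicitly instead of using `!= "Female"`, and it assumes each city has exactly one Female and one Male row.

```r
library(dplyr)

# read.table() prefixes the numeric column names with "X" (X2013, X2014, X2015)
df <- read.table(text = "City Gender 2013 2014 2015
Aberdeen Female 30 40 50
Aberdeen Male 20 15 16
Aberdeenshire Female 60 80 70
Aberdeenshire Male 50 40 15", header = TRUE)

ratios <- df %>%
  group_by(City) %>%
  summarise(across(
    X2013:X2015,
    # Female value divided by Male value within the current city
    ~ .x[Gender == "Female"] / .x[Gender == "Male"],
    .names = "{.col}_ratio"
  ))

print(ratios)
```

Because the rows are picked by their Gender value, no arrange() step is needed before summarising.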
Answer 2: (score: 1)
The code for the solution Rob suggested would be the following (with an additional spread() step):
# data
df = data.frame(City = c("a", "a", "b", "b"),
Gender = c("Female", "Male", "Female", "Male"),
`2013` = c(30, 20, 60, 50),
`2014` = c(40, 15, 80, 40),
`2015` = c(50, 16, 70, 15))
# Actual process
library("dplyr")
library("tidyr")
df %>%
# Transform wide table into tidy
gather("Year", "Number", X2013:X2015) %>%
# Reshape gender columns for easier summaries
spread("Gender", "Number") %>%
# Compute ratios
group_by(City, Year) %>%
summarise(ratio = Female/(Male + Female))
#> # A tibble: 6 x 3
#> # Groups: City [?]
#> City Year ratio
#> <fct> <chr> <dbl>
#> 1 a X2013 0.6
#> 2 a X2014 0.727
#> 3 a X2015 0.758
#> 4 b X2013 0.545
#> 5 b X2014 0.667
#> 6 b X2015 0.824
Created on 2018-10-10 by the reprex package (v0.2.1)
To get exactly the layout shown in the question, you can apply spread() to spread the ratios across the years (spread(Year, ratio)).
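Put together, the pipeline with the final spread() step might look like the sketch below. Note that the formula in this answer, Female / (Male + Female), gives the female share of the total rather than the female-to-male ratio from the question; replacing it with Female / Male would reproduce the question's numbers. Using mutate() plus select() in place of group_by() and summarise() is a simplification chosen here, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# data.frame() converts the names `2013`, `2014`, `2015` to X2013, X2014, X2015
df <- data.frame(City = c("a", "a", "b", "b"),
                 Gender = c("Female", "Male", "Female", "Male"),
                 `2013` = c(30, 20, 60, 50),
                 `2014` = c(40, 15, 80, 40),
                 `2015` = c(50, 16, 70, 15))

shares <- df %>%
  # Wide to long: one row per City / Gender / Year
  gather("Year", "Number", X2013:X2015) %>%
  # One column per gender
  spread("Gender", "Number") %>%
  # Female share of the total, as in the answer above
  mutate(ratio = Female / (Male + Female)) %>%
  select(City, Year, ratio) %>%
  # One column per year, one row per city
  spread(Year, ratio)

print(shares)
```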