Pairwise product of vectors of different lengths in R

Time: 2015-03-30 09:04:11

Tags: r

Say I am looking at the proportions of different groups in a population:

Gender: M = .5, F = .5 

Age: Aged = .2, NotAged = .8

Education: "Above High School" = .4, "Below High School" = .6

Now I have a long-format data frame:

a <- data.frame(Variable = c("aged", "NotAged", "Above HS", "Below HS"),
                Male = c(.2, .8, .4, .6),
                Female = c(.2, .8, .4, .6))

Now I want to fill in the % column of the following data frame, for example:

Gender | Aged    | Education | %

Male   | NotAged | Below HS  | .24

for all of the combinations in:
b <- expand.grid(Gender = c("Male", "Female"), 
                 Aged = c("Aged", "NotAged"), 
                 Education = c("Above HS", "Below HS"))
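
As a quick sanity check on the example row: the .24 appears to be the product of the three marginal proportions listed at the top. This is an assumption about how the % column is meant to be computed (it treats the criteria as independent), and the variable names below are only illustrative.

```r
# Marginal proportions as stated at the top of the question (assumed independent)
p_male     <- 0.5
p_not_aged <- 0.8
p_below_hs <- 0.6

# Expected % for the row Male | NotAged | Below HS
pct <- p_male * p_not_aged * p_below_hs
pct
```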

I would like to avoid loops if possible, since I may have more than 3 grouping criteria.

Thanks

2 answers:

Answer 0: (score: 0)

Something along these lines might be a start...

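library(reshape)
# Melt to long format: one row per (Variable, gender) pair
a2 <- melt(a)
names(a2)[2] <- "Gender"
# Copy Variable into Aged and blank out the non-age levels
a2$Aged <- a2$Variable
a2$Aged[!a2$Aged %in% c("aged", "NotAged")] <- NA
# Same again for Education
a2$Education <- a2$Variable
a2$Education[!a2$Education %in% c("Above HS", "Below HS")] <- NA
# Drop the original column and reorder
a2$Variable <- NULL
a2 <- a2[,c("Gender", "Aged", "Education", "value")]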

Result:

> a2

  Gender    Aged Education value
1   Male    aged      <NA>   0.2
2   Male NotAged      <NA>   0.8
3   Male    <NA>  Above HS   0.4
4   Male    <NA>  Below HS   0.6
5 Female    aged      <NA>   0.2
6 Female NotAged      <NA>   0.8
7 Female    <NA>  Above HS   0.4
8 Female    <NA>  Below HS   0.6

But for the rest, I am not sure which way you want to go.
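
One possible way to carry on from here, as a rough sketch only: split a2 into one lookup table per criterion and merge them on Gender with base merge(). The data frame is rebuilt by hand from the printed result so the snippet runs on its own; the names aged, educ, res, Aged_Perc and Educ_Perc are made up for illustration.

```r
# a2 as printed above, rebuilt here so the sketch is self-contained
a2 <- data.frame(
  Gender    = rep(c("Male", "Female"), each = 4),
  Aged      = rep(c("aged", "NotAged", NA, NA), times = 2),
  Education = rep(c(NA, NA, "Above HS", "Below HS"), times = 2),
  value     = rep(c(0.2, 0.8, 0.4, 0.6), times = 2),
  stringsAsFactors = FALSE
)

# One lookup table per criterion, dropping the NA rows
aged <- a2[!is.na(a2$Aged), c("Gender", "Aged", "value")]
educ <- a2[!is.na(a2$Education), c("Gender", "Education", "value")]
names(aged)[3] <- "Aged_Perc"
names(educ)[3] <- "Educ_Perc"

# Merging on Gender pairs every Aged level with every Education level
res <- merge(aged, educ, by = "Gender")
res$Perc <- res$Aged_Perc * res$Educ_Perc
res
```

Note that Perc here is the share within each gender. If the overall share is wanted (the .24 in the question), it would still need to be multiplied by the gender proportion.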

Answer 1: (score: 0)

The most concise solution I can come up with so far uses dplyr::left_join (or base::merge):

library(reshape2)
library(dplyr)

a <- data.frame(Variable = c("Aged", "NotAged", "Above HS", "Below HS"),
            Male = c(.2, .8, .4, .6),
            Female = c(.2, .8, .4, .6))

# Create a full list for all combinations
FullList <- expand.grid(Gender = c("Male", "Female"), 
             Aged = c("Aged", "NotAged"), 
             Education = c("Above HS", "Below HS"))

# reshape a to long-format and divide it into two tables
a_long <- a %>% melt(id = "Variable", variable.name = "Gender")
tbl_Aged <- a_long %>% filter(Variable %in% c("Aged", "NotAged")) %>% rename(Aged = Variable)
tbl_Education <- a_long %>% filter(Variable %in% c("Above HS", "Below HS")) %>% rename(Education = Variable)

Results <- FullList %>%
        left_join(tbl_Aged, by = c("Aged", "Gender")) %>% rename(Aged_Perc = value) %>% # Mapping Aged
        left_join(tbl_Education, by = c("Education", "Gender")) %>% rename(Educ_Perc = value) %>% # Mapping Edu
        mutate(Perc = Aged_Perc * Educ_Perc)

# Check
Results %>% group_by(Gender) %>% summarise(sum(Perc))
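
If the number of grouping criteria grows, adding one join per criterion gets tedious. Below is a minimal base-R sketch of a different approach (not the join approach above) that scales to any number of criteria without an explicit loop. It assumes the proportions are independent and identical across genders, as they are in the question's data; the names margins, combos and lookups are made up for illustration.

```r
# Marginal proportions per criterion (values from the question; assumed independent)
margins <- list(
  Gender    = c(Male = 0.5, Female = 0.5),
  Aged      = c(Aged = 0.2, NotAged = 0.8),
  Education = c("Above HS" = 0.4, "Below HS" = 0.6)
)

# All combinations of level names, one column per criterion
combos <- expand.grid(lapply(margins, names), stringsAsFactors = FALSE)

# Look up each column's proportions by level name, then multiply across columns
lookups <- Map(function(p, lev) unname(p[lev]), margins, combos)
combos$Perc <- Reduce(`*`, lookups)
combos
```

Adding a fourth criterion only means adding another named vector to margins. If the proportions differ by gender, this shortcut no longer applies and a join on Gender, as above, would be needed.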