Question

Question: how can I generate a new dataset from an existing one?

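I have a fairly substantial dataset, of which I give a simplified version below. The data describes individuals: their sex, their country of origin, and the sector and occupation they work in.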

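I would like to:
1. Create a column that stores every sector X occupation combination.
2. For each such sector X occupation combination, count how many women, how many men, and how many people from each country there are.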

id      <- c(1,2,3,4,5)
occupation <- c(11,12,11,12,11)
sector <- c("a", "b", "c", "a", "b")
sex     <- c(0,1,0,1,0)
country <- c(1,2,3,2,1)
data    <- data.frame(id, occupation, sector, sex, country)

id  occupation sector sex country 
1   11          a      0    1       
2   12          b      1    2       
3   11          a      0    3        
4   12          a      1    2        
5   11          b      0    1

This is what I would like to obtain:

  occXsector sex0 sex1 country1 country2 country3
1   11-a     0    2    1        0        1
2   11-b     0    1    1        0        0
3   12-a     1    0    0        1        0
4   12-b     1    0    0        1        0

Any help is much appreciated!

Answer 1

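You need to clean up your input/output: the expected output you show does not make sense for the input you provided. That said, give this a try: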

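library(dplyr)
library(tidyr)
data %>%
  # Build the occupation-sector key, e.g. "11-a"
  mutate(occXsector = paste(occupation, sector, sep="-")) %>%
  # Stack sex and country into key/value pairs (long format)
  gather(key, value, sex, country) %>%
  # Label each pair, e.g. "sex0" or "country2"
  mutate(newvalue = paste(key, value, sep="")) %>%
  group_by(occXsector) %>%
  # Count rows per label within each occupation-sector group
  count(newvalue) %>%
  # One column per label; combinations that never occur get 0
  spread(newvalue, n, fill=0)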

# A tibble: 5 x 6
# Groups:   occXsector [5]
  occXsector country1 country2 country3  sex0  sex1
*      <chr>    <dbl>    <dbl>    <dbl> <dbl> <dbl>
1       11-a        1        0        0     1     0
2       11-b        1        0        0     1     0
3       11-c        0        0        1     1     0
4       12-a        0        1        0     0     1
5       12-b        0        1        0     0     1
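
If you would rather avoid the package dependencies, here is a minimal base R sketch of the same counts. It assumes the data frame is built exactly as in the code from the question (so the sector for id 3 is "c"), and the names sex_counts, country_counts and result are only illustrative. It cross-tabulates the combined key against sex and against country with table(), then binds the two tables side by side.

```r
# Sample data, rebuilt as in the question's code.
data <- data.frame(
  id         = c(1, 2, 3, 4, 5),
  occupation = c(11, 12, 11, 12, 11),
  sector     = c("a", "b", "c", "a", "b"),
  sex        = c(0, 1, 0, 1, 0),
  country    = c(1, 2, 3, 2, 1)
)

# Combined occupation-sector key, e.g. "11-a".
data$occXsector <- paste(data$occupation, data$sector, sep = "-")

# Cross-tabulate the key against labelled sex and country values.
# Both tables share the same (sorted) row order because they use the same key.
sex_counts     <- table(data$occXsector, paste0("sex", data$sex))
country_counts <- table(data$occXsector, paste0("country", data$country))

# Turn each table into a wide data frame and bind them next to the key.
result <- data.frame(
  occXsector = rownames(sex_counts),
  as.data.frame.matrix(sex_counts),
  as.data.frame.matrix(country_counts),
  row.names = NULL,
  stringsAsFactors = FALSE
)

print(result)
```

The columns come out as sex0, sex1, country1, country2, country3, so you may want to reorder them to match the layout you prefer.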
