在R中,我有一个data.frame,其中包含ID,句点和一些年份给出的变量。
df <- structure(list(id = 10001:10010,
period = structure(c(1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L),
.Label = c("2017-01", "2017-02"), class = "factor"),
year1 = c(2013, 2014, 2014, 2017, 1970, 2014, 1970, 2014, 2014, 1970),
year2 = c(2015, NA, 2015, 2015, 1970, 1970, 2014, 2017, 2014, 2017),
year3 = c(2015, 2014, 2015, NA, NA, 2014, 2014, 2015, 1970, 2014),
year4 = c(1970, 2014, 2013, 2014, 1970, 1970, 2014, 2015, 2015, NA)),
.Names = c("id", "period", "year1", "year2", "year3", "year4"),
row.names = c(NA, -10L),
class = "data.frame")
我想要实现的是创建一个data.frame,其中,对于每个时段和每个变量(year1-year4),频率显示特定年份(1970,2013-2017)的显示频率。 期望的结果应该是这样的:
result <- structure(list(period = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
.Label = c("2017-01", "2017-02"),
class = "factor"),
category = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
.Label = c("year1", "year2", "year3", "year4"),
class = "factor"),
X1970 = c(1L, 1L, 0L, 2L, 2L, 1L, 1L, 1L),
X2013 = c(1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L),
X2014 = c(3L, 0L, 2L, 1L, 2L, 2L, 2L, 2L),
X2015 = c(0L, 3L, 3L, 1L, 0L, 0L, 0L, 1L),
X2017 = c(1L, 2L, 0L, 0L, 0L, 0L, 0L, 0L)),
.Names = c("period", "category", "X1970", "X2013", "X2014", "X2015", "X2017"),
class = "data.frame",
row.names = c(NA, -8L))
我有R的中等经验,我相信这可以实现。但我完全迷失了如何处理这个问题。建议/想法?
答案 0 :(得分:4)
以下是使用tidyverse
,
library(tidyverse)
df %>%
gather(category, val, -c(id, period)) %>%
na.omit() %>%
group_by(period, category, val) %>%
summarise(new = n()) %>%
spread(val, new, fill = 0)
给出,
# A tibble: 8 x 7 # Groups: period, category [8] period category `1970` `2013` `2014` `2015` `2017` * <fctr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> 1 2017-01 year1 1 1 3 0 1 2 2017-01 year2 1 0 0 3 2 3 2017-01 year3 0 0 2 3 0 4 2017-01 year4 2 1 1 1 0 5 2017-02 year1 2 0 2 0 0 6 2017-02 year2 1 0 2 0 0 7 2017-02 year3 1 0 2 0 0 8 2017-02 year4 1 0 2 1 0
这里的主要技巧是转换为长格式,计算频率,然后转换回宽格式。
使用count
函数(如@markdly提及)的更短版本可以是,
df %>%
gather(category, val, -c(id, period)) %>%
na.omit() %>%
count(period, category, val) %>%
spread(val, n, fill = 0)
答案 1 :(得分:3)
如果我正确理解了您的问题,您可以使用包reshape2
实现此目的,首先通过melt
将数据从“宽”格式转换为“长”格式,然后通过{{1}将其转回在选择所需的类别时 - 我希望这就是你所需要的。
cast