Trouble using the gather function in tidyr

Time: 2019-11-02 15:30:53

Tags: r dplyr plyr tidyr

I am having trouble using the gather function in R. Here is a sample data frame:

library(dplyr)
library(tidyr)
DF = data.frame(Region = c("Asia", "Asia", "Asia", "Europe", "Europe"),
                `Indicator Name` = c("Population", "GDP", "GNI", "Population", "GDP"),
                `2004` = c(22, 33,44,55,56),
                `2005` =c(223, 44,555,66,64))

  Region Indicator.Name X2004 X2005
1   Asia     Population    22   223
2   Asia            GDP    33    44
3   Asia            GNI    44   555
4 Europe     Population    55    66
5 Europe            GDP    56    64

This is the data frame I want:


DF2 = data.frame(Region = c("Asia", "Asia", "Europe", "Europe"),
                 Year =  c("X2004", "X2005"),
                 population = c(22, 223, 55, 66),
                 GDP = c(33, 44, 56,64))

  Region  Year population GDP
1   Asia X2004         22  33
2   Asia X2005        223  44
3 Europe X2004         55  56
4 Europe X2005         66  64

I want to do this with the gather function from tidyr, but I am not sure how. This is what I tried:

gather(DF, key= DF$Indicator.Name, values = "values")

2 answers:

Answer 0 (score: 4)

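A single gather call is not enough here. You first need to make the data frame longer, then make it wider, so that the years become rows and the indicators become columns.
Here is a solution using the newer pivot_longer and pivot_wider functions.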

library(dplyr)
library(tidyr)

DF = data.frame(Region = c("Asia", "Asia", "Asia", "Europe", "Europe"),
                `Indicator Name` = c("Population", "GDP", "GNI", "Population", "GDP"),
                `2004` = c(22, 33,44,55,56),
                `2005` =c(223, 44,555,66,64))



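# Stack the year columns (X2004, X2005) into name/value pairs,
# then give each indicator its own column.
DF %>% pivot_longer(cols = starts_with("X")) %>%
       pivot_wider(names_from = Indicator.Name, values_from = value)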

# A tibble: 4 x 5
  Region name  Population   GDP   GNI
  <fct>  <chr>      <dbl> <dbl> <dbl>
1 Asia   X2004         22    33    44
2 Asia   X2005        223    44   555
3 Europe X2004         55    56    NA
4 Europe X2005         66    64    NA
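
The output above keeps the default `name` column and an extra GNI column, whereas the question asks for a `Year` column and only Population and GDP. A minimal sketch of one way to get closer to that, assuming the year columns are named X2004 and X2005 as printed in the question. The `filter()` step and the `result` name are additions for illustration, not part of the original answer:

```r
library(dplyr)
library(tidyr)

DF <- data.frame(Region = c("Asia", "Asia", "Asia", "Europe", "Europe"),
                 `Indicator Name` = c("Population", "GDP", "GNI", "Population", "GDP"),
                 `2004` = c(22, 33, 44, 55, 56),
                 `2005` = c(223, 44, 555, 66, 64))

result <- DF %>%
  # Assumption: only Population and GDP are wanted, so GNI is dropped first.
  filter(Indicator.Name %in% c("Population", "GDP")) %>%
  # Long form: one row per Region / indicator / year, with the year in "Year".
  pivot_longer(cols = c(X2004, X2005), names_to = "Year", values_to = "value") %>%
  # Wide form: one column per remaining indicator.
  pivot_wider(names_from = Indicator.Name, values_from = value)

result
```

Naming the columns explicitly in `cols` avoids relying on pattern matching, at the cost of having to list every year column.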

Answer 1 (score: 4)

Using gather and spread, you would have:

DF %>% 
  gather(-Indicator.Name, -Region, key= "Year", value = "value") %>%
  spread(Indicator.Name, value)
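
As a quick sanity check, the sketch below runs this gather/spread pipeline next to a pivot_longer/pivot_wider equivalent and compares the indicator values. The names `via_gather`, `via_pivot`, and `same_values` are introduced here for illustration only. The tidyr documentation marks gather() and spread() as superseded by the pivot functions, although they still work.

```r
library(dplyr)
library(tidyr)

DF <- data.frame(Region = c("Asia", "Asia", "Asia", "Europe", "Europe"),
                 `Indicator Name` = c("Population", "GDP", "GNI", "Population", "GDP"),
                 `2004` = c(22, 33, 44, 55, 56),
                 `2005` = c(223, 44, 555, 66, 64))

# gather/spread version, as in the answer above.
via_gather <- DF %>%
  gather(-Indicator.Name, -Region, key = "Year", value = "value") %>%
  spread(Indicator.Name, value)

# pivot_longer/pivot_wider version with the same column names.
via_pivot <- DF %>%
  pivot_longer(cols = c(X2004, X2005), names_to = "Year", values_to = "value") %>%
  pivot_wider(names_from = Indicator.Name, values_from = value)

# Compare each indicator column; all.equal treats matching NAs as equal.
same_values <- all(sapply(c("Population", "GDP", "GNI"), function(col) {
  isTRUE(all.equal(via_pivot[[col]], via_gather[[col]]))
}))
same_values
```

The column order can differ between the two results, which is why the comparison goes column by column rather than over the whole data frame.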