Question

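In this case I want tidyr::gather() to gather not only the column names but also the values in rows 1 and 2. What I am trying to achieve is a data frame with 5 columns and 4 rows.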

This is a small part of the dataset I am working with:

library(tidyverse)

# A tibble: 4 x 3
  Aanduiding                      `Coolsingel 40 links` `Goudseweg 15 links`
  <chr>                           <chr>                 <chr>               
1 Gebiedsnummer                   1                     2                   
2 Postcode                        3011 AD               3031 XH             
3 Leefbaar Rotterdam              124                   110                 
4 Partij van de Arbeid (P.v.d.A.) 58                    65

And its reproducible dput(df):

df <- structure(list(Aanduiding = c("Gebiedsnummer", "Postcode", "Leefbaar Rotterdam", 
"Partij van de Arbeid (P.v.d.A.)"), `Coolsingel 40 links` = c("1", 
"3011 AD", "124", "58"), `Goudseweg 15 links` = c("2", "3031 XH", 
"110", "65")), row.names = c(NA, -4L), class = c("tbl_df", "tbl", 
"data.frame"), .Names = c("Aanduiding", "Coolsingel 40 links", 
"Goudseweg 15 links"))

So the desired result looks like this:

  Aanduiding                      Gebiedsnummer Postcode adres               value
  <chr>                                   <dbl> <chr>    <chr>               <dbl>
1 Leefbaar Rotterdam                       1.00 3011 AD  Coolsingel 40 links 124  
2 Leefbaar Rotterdam                       2.00 3031 XH  Goudseweg 15 links  110  
3 Partij van de Arbeid (P.v.d.A.)          1.00 3011 AD  Coolsingel 40 links  58.0
4 Partij van de Arbeid (P.v.d.A.)          2.00 3031 XH  Goudseweg 15 links   65.0

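I use the gather() function from the tidyr package quite often, but always in cases where I only want to gather the column names together with certain values. Now I want to gather the column names and also the observations in rows 1 and 2.

Can I gather on multiple keys? Or should I paste the values from observations 1 and 2 into the column names, then gather(), and then separate()?

What is the best strategy here, preferably with tidyr if possible?

Many thanks.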

Answer 1

There are two things to do here, and you have to figure out how to subset your dataset accordingly.

data.frame(t(df[1:2,]))

will give you:

                               X1       X2
Aanduiding          Gebiedsnummer Postcode
Coolsingel 40 links             1  3011 AD
Goudseweg 15 links              2  3031 XH

and

tidyr::gather(df[3:4,],key="adres",value="value",  `Coolsingel 40 links`, `Goudseweg 15 links`)

will give you:

  Aanduiding                      adres               value
  <chr>                           <chr>               <chr>
1 Leefbaar Rotterdam              Coolsingel 40 links 124  
2 Partij van de Arbeid (P.v.d.A.) Coolsingel 40 links 58   
3 Leefbaar Rotterdam              Goudseweg 15 links  110  
4 Partij van de Arbeid (P.v.d.A.) Goudseweg 15 links  65

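How you proceed from there is another question. A left_join on adres is the likely next step, but that really depends on how the rest of your data is structured.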
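For the small example in the question, the two pieces can be combined roughly as follows. This is only a sketch: the object names meta, votes, and result are made up for illustration, and it assumes that rows 1 and 2 always hold Gebiedsnummer and Postcode, in that order.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same data as the dput() in the question
df <- tibble(
  Aanduiding = c("Gebiedsnummer", "Postcode", "Leefbaar Rotterdam",
                 "Partij van de Arbeid (P.v.d.A.)"),
  `Coolsingel 40 links` = c("1", "3011 AD", "124", "58"),
  `Goudseweg 15 links` = c("2", "3031 XH", "110", "65")
)

# Rows 1-2 carry per-address metadata: build one row per address.
# (assumption: row 1 is Gebiedsnummer, row 2 is Postcode)
meta <- tibble(
  adres = names(df)[-1],
  Gebiedsnummer = as.numeric(unlist(df[1, -1])),
  Postcode = unname(unlist(df[2, -1]))
)

# Rows 3-4 carry the vote counts: gather the address columns into long form.
votes <- gather(df[3:4, ], key = "adres", value = "value", -Aanduiding)

# Attach the metadata to each vote row by address.
result <- votes %>%
  left_join(meta, by = "adres") %>%
  mutate(value = as.numeric(value)) %>%
  select(Aanduiding, Gebiedsnummer, Postcode, adres, value)

print(result)
```

Subsetting by row position is fragile. If the real data has more metadata rows, filtering on the values of Aanduiding is probably safer than df[1:2, ] and df[3:4, ].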

Answer 2

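You can use a combination of gather and spread a few times. I do this often when I need to move a value out into its own column to serve as the denominator of a calculation.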

library(tidyverse)
...

The goal is to move Gebiedsnummer and Postcode out of Aanduiding and to gather the other two entries into a single column of values. The first gather gives you this:

df %>%
  gather(key = address, value = value, -Aanduiding)
#> # A tibble: 8 x 3
#>   Aanduiding                      address             value  
#>   <chr>                           <chr>               <chr>  
#> 1 Gebiedsnummer                   Coolsingel 40 links 1      
#> 2 Postcode                        Coolsingel 40 links 3011 AD
#> 3 Leefbaar Rotterdam              Coolsingel 40 links 124    
#> 4 Partij van de Arbeid (P.v.d.A.) Coolsingel 40 links 58     
#> 5 Gebiedsnummer                   Goudseweg 15 links  2      
#> 6 Postcode                        Goudseweg 15 links  3031 XH
#> 7 Leefbaar Rotterdam              Goudseweg 15 links  110    
#> 8 Partij van de Arbeid (P.v.d.A.) Goudseweg 15 links  65

After that, use spread:

df %>%
  gather(key = address, value = value, -Aanduiding) %>%
  spread(key = Aanduiding, value = value)
#> # A tibble: 2 x 5
#>   address    Gebiedsnummer `Leefbaar Rotter… `Partij van de Arbe… Postcode
#>   <chr>      <chr>         <chr>             <chr>                <chr>   
#> 1 Coolsinge… 1             124               58                   3011 AD 
#> 2 Goudseweg… 2             110               65                   3031 XH

Then you want to gather again, but keep address, Gebiedsnummer, and Postcode as their own columns. The select is only there to put the columns in order. So all together:

df %>%
  gather(key = address, value = value, -Aanduiding) %>%
  spread(key = Aanduiding, value = value) %>%
  gather(key = Aanduiding, value = value, -Gebiedsnummer, -address, -Postcode) %>%
  select(Aanduiding, Gebiedsnummer, Postcode, address, value) %>%
  mutate_at(vars(Gebiedsnummer, value), as.numeric)
#> # A tibble: 4 x 5
#>   Aanduiding                 Gebiedsnummer Postcode address          value
#>   <chr>                              <dbl> <chr>    <chr>            <dbl>
#> 1 Leefbaar Rotterdam                     1 3011 AD  Coolsingel 40 l…   124
#> 2 Leefbaar Rotterdam                     2 3031 XH  Goudseweg 15 li…   110
#> 3 Partij van de Arbeid (P.v…             1 3011 AD  Coolsingel 40 l…    58
#> 4 Partij van de Arbeid (P.v…             2 3031 XH  Goudseweg 15 li…    65

Created on 2018-08-24 by the reprex package (v0.2.0).
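In newer versions of tidyr (1.0.0 and later), gather and spread are superseded by pivot_longer and pivot_wider. The same long, wide, long sequence might look like the sketch below. It assumes tidyr >= 1.0.0 and dplyr >= 1.0.0 (for across); the object name result is only illustrative, and the row order may differ from the gather-based output above.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same data as the dput() in the question
df <- tibble(
  Aanduiding = c("Gebiedsnummer", "Postcode", "Leefbaar Rotterdam",
                 "Partij van de Arbeid (P.v.d.A.)"),
  `Coolsingel 40 links` = c("1", "3011 AD", "124", "58"),
  `Goudseweg 15 links` = c("2", "3031 XH", "110", "65")
)

result <- df %>%
  # long: one row per (Aanduiding, address) pair
  pivot_longer(-Aanduiding, names_to = "address", values_to = "value") %>%
  # wide: one row per address, one column per Aanduiding entry
  pivot_wider(names_from = Aanduiding, values_from = value) %>%
  # long again, keeping the per-address columns fixed
  pivot_longer(-c(address, Gebiedsnummer, Postcode),
               names_to = "Aanduiding", values_to = "value") %>%
  select(Aanduiding, Gebiedsnummer, Postcode, address, value) %>%
  mutate(across(c(Gebiedsnummer, value), as.numeric))

print(result)
```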

Gather or transpose data with multiple rows as the "key" argument
