tidyr::spread / tidyr::pivot_wider with multiple distinct values per key

Asked: 2019-08-27 14:36:25

Tags: r dplyr tidyr

Given this data:

+----+---------+----------+------+------+------+
| id |  type   |   name   | var1 | var2 | var3 |
+----+---------+----------+------+------+------+
| 10 | Country | Norway   |  169 | 14   |  164 |
| 10 | Sport   | Skii     |  169 | 14   |  164 |
| 10 | Format  | Video    |  169 | 14   |  164 |
| 11 | Country | Spain    |  150 | 16   |  178 |
| 11 | Format  | Photo    |  150 | 16   |  178 |
| 11 | Sport   | Bike     |  150 | 16   |  178 |
| 11 | Sport   | Soccer   |  150 | 16   |  178 |
| 11 | Sport   | Basket   |  150 | 16   |  178 |
| 12 | Country | USA      |    0 | 0    |    0 |
| 12 | Format  | Video    |    0 | NA   |    0 |
| 12 | Sport   | Baseball |    0 | 0    |    0 |
+----+---------+----------+------+------+------+

What is the simplest, cleanest way to spread it into the following?

+----+------+------+------+---------+--------+----------+---------+---------+
| id | var1 | var2 | var3 | Country | Format | Sport_1  | Sport_2 | Sport_3 |
+----+------+------+------+---------+--------+----------+---------+---------+
| 10 |  169 |   14 |  164 | Norway  | Video  | Skii     | NA      | NA      |
| 11 |  150 |   16 |  178 | Spain   | Photo  | Bike     | Soccer  | Basket  |
| 12 |    0 |    0 |    0 | USA     | Video  | Baseball | NA      | NA      |
+----+------+------+------+---------+--------+----------+---------+---------+

Note also the NA in id 12.

I tried:

data2 <- data %>% pivot_wider(names_from = type, values_from = name)

But this gives me a warning that the values in `name` are not uniquely identified, which is true for id 11 (the Sport type appears three times).
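To see which id/type combinations trigger that warning, one option is to count the rows per combination before pivoting. This is only a minimal sketch: the data frame below copies the `id`, `type`, and `name` columns from the table above, and the name `dupes` is made up for illustration.

```r
library(dplyr)

# Same rows as the question's table, reduced to the columns that matter here.
data <- data.frame(
  id   = c(10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L),
  type = c("Country", "Sport", "Format", "Country", "Format",
           "Sport", "Sport", "Sport", "Country", "Format", "Sport"),
  name = c("Norway", "Skii", "Video", "Spain", "Photo",
           "Bike", "Soccer", "Basket", "USA", "Video", "Baseball"),
  stringsAsFactors = FALSE
)

# Any id/type pair that occurs more than once cannot fit into a single cell.
dupes <- data %>%
  count(id, type) %>%
  filter(n > 1)

print(dupes)
```

Only combinations listed in `dupes` need special handling, such as numbering the repeats or collecting them into a list.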

In addition, I expect the NA in id 12 to cause problems as well, because the function will not group these rows:

| 12 | Country | USA      |    0 | 0    |    0 |
| 12 | Sport   | Baseball |    0 | 0    |    0 |

with this one:

| 12 | Format  | Video    |    0 | NA   |    0 |

because of the NA, even though the id is the same.
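That concern can be checked on the three id 12 rows alone. The sketch below assumes, as the answers do, that replacing the NA with 0 is acceptable for this data; the names `id12`, `split_rows`, and `one_row` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# The three rows for id 12, copied from the table above.
id12 <- data.frame(
  id   = c(12L, 12L, 12L),
  type = c("Country", "Format", "Sport"),
  name = c("USA", "Video", "Baseball"),
  var1 = c(0L, 0L, 0L),
  var2 = c(0L, NA, 0L),
  var3 = c(0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# By default every column not used in names_from/values_from identifies a row,
# so var2 = NA and var2 = 0 end up in two separate output rows.
split_rows <- id12 %>%
  pivot_wider(names_from = type, values_from = name)

# Filling the NA first (an assumption about the data) lets the rows collapse.
one_row <- id12 %>%
  mutate(var2 = replace_na(var2, 0L)) %>%
  pivot_wider(names_from = type, values_from = name)

print(nrow(split_rows))
print(nrow(one_row))
```

If 0 is not a sensible fill value, passing `id_cols = id` to `pivot_wider()` is another way to keep the rows together, at the cost of dropping `var1` to `var3` from the result.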

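Any help is much appreciated. Thanks in advance!

2 answers:

Answer 0: (score: 2)

We can do this by using filter to split the rows into the "Sport" elements of `type` and everything else, applying spread to each subset separately, and then joining the two results: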

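library(dplyr)
library(tidyr)
library(stringr)

# Spread the "Sport" rows on their own, numbering them within each id
# so that repeated sports get distinct column names (Sport1, Sport2, ...).
sportdf <- df1 %>% 
            filter(type == "Sport") %>%
            group_by(id) %>% 
            mutate(type = str_c(type, row_number())) %>%
            spread(type, name)
# Spread the remaining types; the NA in var2 is replaced with 0 so that
# all rows of id 12 share the same var1/var2/var3 values.
formatCountrydf <- df1 %>% 
                    filter(type != "Sport")  %>%
                    mutate(var2 = replace_na(var2, 0)) %>%  
                    spread(type, name)
# Join on the shared columns (id, var1, var2, var3).
inner_join(sportdf, formatCountrydf)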
# A tibble: 3 x 9
# Groups:   id [3]
#     id  var1  var2  var3 Sport1   Sport2 Sport3 Country Format
#  <int> <int> <dbl> <int> <chr>    <chr>  <chr>  <chr>   <chr> 
#1    10   169    14   164 Skii     <NA>   <NA>   Norway  Video 
#2    11   150    16   178 Bike     Soccer Basket Spain   Photo 
#3    12     0     0     0 Baseball <NA>   <NA>   USA     Video 

Data

df1 <- structure(list(id = c(10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 
12L, 12L, 12L), type = c("Country", "Sport", "Format", "Country", 
"Format", "Sport", "Sport", "Sport", "Country", "Format", "Sport"
), name = c("Norway", "Skii", "Video", "Spain", "Photo", "Bike", 
"Soccer", "Basket", "USA", "Video", "Baseball"), var1 = c(169L, 
169L, 169L, 150L, 150L, 150L, 150L, 150L, 0L, 0L, 0L), var2 = c(14L, 
14L, 14L, 16L, 16L, 16L, 16L, 16L, 0L, NA, 0L), var3 = c(164L, 
164L, 164L, 178L, 178L, 178L, 178L, 178L, 0L, 0L, 0L)),
class = "data.frame", row.names = c(NA, 
-11L))
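
The filter-and-join idea above can also be sketched as a single pipeline with `pivot_wider()`: number the rows of any type that repeats within an id, then pivot once. This is an illustrative variant rather than the answer's own code; `df`, `repeated`, `key`, and `wide` are made-up names, and filling the NA in `var2` with 0 is the same assumption the answer makes.

```r
library(dplyr)
library(tidyr)

# Same values as df1 above, rebuilt here so the sketch is self-contained.
df <- data.frame(
  id   = c(10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L),
  type = c("Country", "Sport", "Format", "Country", "Format",
           "Sport", "Sport", "Sport", "Country", "Format", "Sport"),
  name = c("Norway", "Skii", "Video", "Spain", "Photo",
           "Bike", "Soccer", "Basket", "USA", "Video", "Baseball"),
  var1 = c(rep(169L, 3), rep(150L, 5), rep(0L, 3)),
  var2 = c(rep(14L, 3), rep(16L, 5), 0L, NA, 0L),
  var3 = c(rep(164L, 3), rep(178L, 5), rep(0L, 3)),
  stringsAsFactors = FALSE
)

# Types that occur more than once for at least one id (here: "Sport").
repeated <- df %>%
  count(id, type) %>%
  filter(n > 1) %>%
  pull(type) %>%
  unique()

wide <- df %>%
  mutate(var2 = replace_na(var2, 0L)) %>%   # assumption: 0 is a valid fill
  group_by(id, type) %>%
  # Repeated types get a running suffix (Sport_1, Sport_2, ...) for every id,
  # so single-sport ids land in Sport_1 instead of a separate "Sport" column.
  mutate(key = ifelse(type %in% repeated,
                      paste(type, row_number(), sep = "_"),
                      type)) %>%
  ungroup() %>%
  select(-type) %>%
  pivot_wider(names_from = key, values_from = name)

print(wide)
```

The column order of `wide` follows the order in which the keys first appear in the data, so a final `select()` may be needed to match the layout requested in the question.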

Answer 1: (score: 0)

Here is one approach, borrowing @akrun's data:

library(dplyr)
library(tidyr)
df1 %>%
  replace_na(list(var2=0)) %>%
  pivot_wider(names_from = "type", values_from = "name", values_fn = list(name=list)) %>%
  mutate_at(vars(Country, Format), unlist) %>%
  mutate_at("Sport", unclass) %>%
  unnest_wider(Sport, names_sep = "_", names_repair = ~sub("..." , "", ., fixed=TRUE))

# New names:
# * `` -> ...1
# New names:
# * `` -> ...1
# * `` -> ...2
# * `` -> ...3
# New names:
# * `` -> ...1
# # A tibble: 3 x 9
#     id  var1  var2  var3 Country Sport_1  Sport_2 Sport_3 Format
#   <int> <int> <dbl> <int> <chr>   <chr>    <chr>   <chr>   <chr> 
# 1    10   169    14   164 Norway  Skii     NA      NA      Video 
# 2    11   150    16   178 Spain   Bike     Soccer  Basket  Photo 
# 3    12     0     0     0 USA     Baseball NA      NA      Video