Given this data:
+----+---------+----------+------+------+------+
| id | type | name | var1 | var2 | var3 |
+----+---------+----------+------+------+------+
| 10 | Country | Norway | 169 | 14 | 164 |
| 10 | Sport | Skii | 169 | 14 | 164 |
| 10 | Format | Video | 169 | 14 | 164 |
| 11 | Country | Spain | 150 | 16 | 178 |
| 11 | Format | Photo | 150 | 16 | 178 |
| 11 | Sport | Bike | 150 | 16 | 178 |
| 11 | Sport | Soccer | 150 | 16 | 178 |
| 11 | Sport | Basket | 150 | 16 | 178 |
| 12 | Country | USA | 0 | 0 | 0 |
| 12 | Format | Video | 0 | NA | 0 |
| 12 | Sport | Baseball | 0 | 0 | 0 |
+----+---------+----------+------+------+------+
What is the simplest, cleanest way to spread it like this:
+----+------+------+------+---------+--------+----------+---------+---------+
| id | var1 | var2 | var3 | Country | Format | Sport_1 | Sport_2 | Sport_3 |
+----+------+------+------+---------+--------+----------+---------+---------+
| 10 | 169 | 14 | 164 | Norway | Video | Skii | NA | NA |
| 11 | 150 | 16 | 178 | Spain | Photo | Bike | Soccer | Basket |
| 12 | 0 | 0 | 0 | USA | Video | Baseball | NA | NA |
+----+------+------+------+---------+--------+----------+---------+---------+
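Also note the NA for id 12.
I tried using:
data2 <- data %>% pivot_wider(names_from = type, values_from = name)
but this gives me a warning that the values in `name` are not uniquely identified, which is true for id 11 (the Sport type is repeated three times).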
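A quick way to see which id/type pairs trigger that warning is to count them. This is a minimal sketch that assumes dplyr is installed; it rebuilds the example data as a data frame called `df`, a name chosen here purely for illustration.

```r
library(dplyr)

# Example data from the question (df is an illustrative name)
df <- data.frame(
  id   = c(10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L),
  type = c("Country", "Sport", "Format", "Country", "Format", "Sport",
           "Sport", "Sport", "Country", "Format", "Sport"),
  name = c("Norway", "Skii", "Video", "Spain", "Photo", "Bike",
           "Soccer", "Basket", "USA", "Video", "Baseball"),
  var1 = c(169L, 169L, 169L, 150L, 150L, 150L, 150L, 150L, 0L, 0L, 0L),
  var2 = c(14L, 14L, 14L, 16L, 16L, 16L, 16L, 16L, 0L, NA, 0L),
  var3 = c(164L, 164L, 164L, 178L, 178L, 178L, 178L, 178L, 0L, 0L, 0L)
)

# Each id/type pair should fill exactly one cell of the wide table;
# pairs with n > 1 are the ones pivot_wider() cannot place uniquely
dups <- df %>%
  count(id, type) %>%
  filter(n > 1)
print(dups)
```

Only id 11 / Sport should show up, with n = 3, which is why the repeated Sport values need a per-id counter (Sport_1, Sport_2, ...) before widening.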
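Also, I expect the NA in id 12 to cause a problem too, because the function will not group these rows:
| 12 | Country | USA | 0 | 0 | 0 |
| 12 | Sport | Baseball | 0 | 0 | 0 |
with this one:
| 12 | Format | Video | 0 | NA | 0 |
because of the NA, even though the id is the same.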
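The NA problem comes from the fact that pivot_wider() treats every column not named in `names_from`/`values_from` as an identifying column, so a row with var2 = NA gets its own key. The sketch below shows this with `distinct()`, again using an illustrative `df`, and assumes (as the desired output does) that the missing var2 should be treated as 0.

```r
library(dplyr)
library(tidyr)

# Example data from the question (df is an illustrative name)
df <- data.frame(
  id   = c(10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L),
  type = c("Country", "Sport", "Format", "Country", "Format", "Sport",
           "Sport", "Sport", "Country", "Format", "Sport"),
  name = c("Norway", "Skii", "Video", "Spain", "Photo", "Bike",
           "Soccer", "Basket", "USA", "Video", "Baseball"),
  var1 = c(169L, 169L, 169L, 150L, 150L, 150L, 150L, 150L, 0L, 0L, 0L),
  var2 = c(14L, 14L, 14L, 16L, 16L, 16L, 16L, 16L, 0L, NA, 0L),
  var3 = c(164L, 164L, 164L, 178L, 178L, 178L, 178L, 178L, 0L, 0L, 0L)
)

# The key columns pivot_wider() would use: id 12 appears twice
# (once with var2 = 0, once with var2 = NA)
keys <- df %>% distinct(id, var1, var2, var3)
print(keys)

# Assumption: NA in var2 means 0, so filling it collapses id 12 to one key
keys_filled <- df %>%
  mutate(var2 = replace_na(var2, 0L)) %>%
  distinct(id, var1, var2, var3)
print(keys_filled)
```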
Any help is greatly appreciated. Thanks a lot in advance!
Answer 0 (score: 2)
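We can do this by using filter to split off the "Sport" rows of type into a separate dataset, using spread on each dataset separately, and then combining them with a join: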
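library(dplyr)
library(tidyr)
library(stringr)
# Sport rows: number them within each id (Sport1, Sport2, ...) and spread
sportdf <- df1 %>%
  filter(type == "Sport") %>%
  group_by(id) %>%
  mutate(type = str_c(type, row_number())) %>%
  spread(type, name)
# Country/Format rows: replace the NA in var2 so id 12 collapses to one row
formatCountrydf <- df1 %>%
  filter(type != "Sport") %>%
  mutate(var2 = replace_na(var2, 0)) %>%
  spread(type, name)
# Join the two wide tables on their shared columns (id, var1, var2, var3)
inner_join(sportdf, formatCountrydf)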
# A tibble: 3 x 9
# Groups: id [3]
# id var1 var2 var3 Sport1 Sport2 Sport3 Country Format
# <int> <int> <dbl> <int> <chr> <chr> <chr> <chr> <chr>
#1 10 169 14 164 Skii <NA> <NA> Norway Video
#2 11 150 16 178 Bike Soccer Basket Spain Photo
#3 12 0 0 0 Baseball <NA> <NA> USA Video
Data:
df1 <- structure(list(id = c(10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L,
12L, 12L, 12L), type = c("Country", "Sport", "Format", "Country",
"Format", "Sport", "Sport", "Sport", "Country", "Format", "Sport"
), name = c("Norway", "Skii", "Video", "Spain", "Photo", "Bike",
"Soccer", "Basket", "USA", "Video", "Baseball"), var1 = c(169L,
169L, 169L, 150L, 150L, 150L, 150L, 150L, 0L, 0L, 0L), var2 = c(14L,
14L, 14L, 16L, 16L, 16L, 16L, 16L, 0L, NA, 0L), var3 = c(164L,
164L, 164L, 178L, 178L, 178L, 178L, 178L, 0L, 0L, 0L)),
class = "data.frame", row.names = c(NA, -11L))
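For comparison, the same layout can be sketched in a single pipeline without splitting and re-joining. This is not taken from either answer; it is a sketch assuming dplyr >= 1.0 and tidyr >= 1.0, and `key` is a helper column name introduced here for illustration. It numbers only the repeated Sport rows, fills the NA in var2 with 0 as the desired output does, and then widens once.

```r
library(dplyr)
library(tidyr)

# Same data as df1 above
df1 <- data.frame(
  id   = c(10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L),
  type = c("Country", "Sport", "Format", "Country", "Format", "Sport",
           "Sport", "Sport", "Country", "Format", "Sport"),
  name = c("Norway", "Skii", "Video", "Spain", "Photo", "Bike",
           "Soccer", "Basket", "USA", "Video", "Baseball"),
  var1 = c(169L, 169L, 169L, 150L, 150L, 150L, 150L, 150L, 0L, 0L, 0L),
  var2 = c(14L, 14L, 14L, 16L, 16L, 16L, 16L, 16L, 0L, NA, 0L),
  var3 = c(164L, 164L, 164L, 178L, 178L, 178L, 178L, 178L, 0L, 0L, 0L)
)

wide <- df1 %>%
  # Assumption: the missing var2 means 0, so id 12 forms a single row
  mutate(var2 = replace_na(var2, 0L)) %>%
  # Build a hypothetical helper column: Sport_1, Sport_2, ... for Sport rows,
  # the plain type name for everything else
  group_by(id, type) %>%
  mutate(key = if_else(type == "Sport", paste0("Sport_", row_number()), type)) %>%
  ungroup() %>%
  select(-type) %>%
  # id, var1, var2 and var3 are now the only identifying columns
  pivot_wider(names_from = key, values_from = name) %>%
  # Reorder columns to match the layout asked for in the question
  select(id, var1, var2, var3, Country, Format, starts_with("Sport_"))

print(wide)
```

Ids with fewer than three sports get NA in the unused Sport_ columns, matching the requested output.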
Answer 1 (score: 0)
Here is one approach, borrowing @akrun's data:
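library(dplyr)
library(tidyr)
df1 %>%
  # Treat the missing var2 as 0 so id 12 forms a single row
  replace_na(list(var2 = 0)) %>%
  # Collect names into list-columns so the repeated Sport values don't collide
  pivot_wider(names_from = "type", values_from = "name", values_fn = list(name = list)) %>%
  # Country and Format hold one value per id, so flatten them to character
  mutate_at(vars(Country, Format), unlist) %>%
  mutate_at("Sport", unclass) %>%
  # Spread the Sport list into Sport_1, Sport_2, ...
  unnest_wider(Sport, names_sep = "_", names_repair = ~sub("...", "", ., fixed = TRUE))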
# New names:
# * `` -> ...1
# New names:
# * `` -> ...1
# * `` -> ...2
# * `` -> ...3
# New names:
# * `` -> ...1
# # A tibble: 3 x 9
# id var1 var2 var3 Country Sport_1 Sport_2 Sport_3 Format
# <int> <int> <dbl> <int> <chr> <chr> <chr> <chr> <chr>
# 1 10 169 14 164 Norway Skii NA NA Video
# 2 11 150 16 178 Spain Bike Soccer Basket Photo
# 3 12 0 0 0 USA Baseball NA NA Video