I have the following data frame:
location asset_status count row
<chr> <chr> <dbl> <int>
1 location1 Owned 1 1
2 location1 Available 1 2
3 location1 Owned 1 3
4 location2 Owned 1 4
5 location2 Owned 1 5
6 location2 Owned 1 6
7 location2 Owned 1 7
8 location2 no status 1 8
9 location3 Owned 1 9
10 location3 Owned 1 10
When I try to use spread on it, I get the following error:
df <- head(us_can_laptops,10) %>%
select(location,asset_status,count) %>%
#mutate(row = row_number()) %>% #excluded
group_by(location) %>%
spread(asset_status,count)
Error: Duplicate identifiers for rows (4, 5, 6, 7), (1, 3)
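For context, spread raises this error when several rows share the same combination of the remaining columns (here location) and the key column (asset_status), so it cannot decide which value belongs in a cell. A minimal sketch for listing those combinations is below. The `laptops` data frame is a hypothetical stand-in for head(us_can_laptops, 10), rebuilt from the printed table, and `n_rows` is just an illustrative column name.

```r
library(dplyr)

# Hypothetical stand-in for head(us_can_laptops, 10), rebuilt from the table above
laptops <- data.frame(
  location = c(rep("location1", 3), rep("location2", 5), rep("location3", 2)),
  asset_status = c("Owned", "Available", "Owned",
                   "Owned", "Owned", "Owned", "Owned", "no status",
                   "Owned", "Owned"),
  count = rep(1, 10),
  stringsAsFactors = FALSE
)

# Count rows per (location, asset_status) pair and keep the pairs
# that occur more than once; these are the ones spread cannot place
dupes <- laptops %>%
  count(location, asset_status, name = "n_rows") %>%
  filter(n_rows > 1)

print(dupes)
```

Any pair that shows up here has to be aggregated (or otherwise made unique) before the data can be widened.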
So, following other related questions on SO, I added a unique identifier with mutate:
df <- head(us_can_laptops,10) %>%
select(location,asset_status,count) %>%
mutate(row = row_number()) %>%
group_by(location) %>%
spread(asset_status,count)
But this returns:
location row Available `no status` Owned
* <chr> <int> <dbl> <dbl> <dbl>
1 location2 4 NA NA 1
2 location2 5 NA NA 1
3 location2 6 NA NA 1
4 location2 7 NA NA 1
5 location2 8 NA 1 NA
6 location3 10 NA NA 1
7 location3 9 NA NA 1
8 location1 1 NA NA 1
9 location1 2 1 NA NA
10 location1 3 NA NA 1
Also, whenever I add a summarise call, it breaks my spread.
This is the desired result:
location Available `no status` Owned
* <chr> <dbl> <dbl> <dbl>
1 location1 1 NA 2
2 location2 NA 1 4
3 location3 NA NA 2
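Any help would be greatly appreciated. I know this looks like a duplicate, but the answers to the following related questions still do not solve my problem:
Spread function Error: Duplicate identifiers for rows [duplicate]
Spread with duplicate identifiers for rows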
I am really looking for a solution that uses dplyr rather than dcast.

Answer 0 (score: 2)
The following should work (at least it gives the desired output):
df <- structure(list(location = c("location1", "location1", "location1",
"location2", "location2", "location2", "location2", "location2",
"location3", "location3"), asset_status = c("Owned", "Available",
"Owned", "Owned", "Owned", "Owned", "Owned", "no status", "Owned",
"Owned"), count = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
row = 1:10), row.names = c(NA, -10L), .Names = c("location",
"asset_status", "count", "row"), class = "data.frame")
library(dplyr)
library(tidyr)
df %>%
group_by(location, asset_status) %>%
summarise(count = sum(count)) %>%
spread(key = asset_status, value = count)
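If a newer tidyr is available (1.0.0 or later, which is an assumption about your setup), the same aggregate-then-widen idea can probably be written in one step with pivot_wider and its values_fn argument. This is only a sketch of an alternative, not a replacement for the approach above. The data frame is repeated so the block runs on its own, and `wide` is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Same ten rows as the df defined above, repeated so this block is self-contained
df <- data.frame(
  location = c(rep("location1", 3), rep("location2", 5), rep("location3", 2)),
  asset_status = c("Owned", "Available", "Owned",
                   "Owned", "Owned", "Owned", "Owned", "no status",
                   "Owned", "Owned"),
  count = rep(1L, 10),
  row = 1:10,
  stringsAsFactors = FALSE
)

wide <- df %>%
  # Drop the row id, otherwise every row stays unique and nothing is collapsed
  select(location, asset_status, count) %>%
  pivot_wider(
    names_from = asset_status,
    values_from = count,
    # Sum the counts whenever a location/status pair occurs more than once
    values_fn = list(count = sum)
  )

print(wide)
```

Note that pivot_wider orders the new columns by first appearance in the data rather than alphabetically, so the column order may differ from the spread output.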