I have a criminal history dataset laid out as follows:
ID Charge Chargedate VictimID ...
1 Robbery 2013-04-05 1
1 Theft 2013-04-06 2
1 Theft 2013-04-07 2
2 Homicide 2013-04-08 3
2 Theft 2013-04-09 3
2 Burglary 2013-04-10 3
...
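I want to reshape the dataset in two ways. First, I want each row to correspond to one unique ID value, with VictimID dropped. I also want to summarise the charges as counts. For example, instead of having 15 theft variables in the dataset, I want a single Theftcount variable with a value of 15.
e.g.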
ID Robberycount Robberydate1 Theftcount Theftdate1 Theftdate2 ...
1 1 2013-04-05 2 2013-04-06 2013-04-07
2 0 NA 1 2013-04-09 NA
...
The other dataset I want to create is reshaped the same way, except that each row corresponds to a unique ID and VictimID pair, e.g.
ID VictimID Robberycount Robberydate1 Theftcount Theftdate1 Theftdate2 ...
1 1 1 2013-04-05 0 NA NA
1 2 0 NA 2 2013-04-06 2013-04-07
2 3 0 NA 1 2013-04-09 NA
...
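I tried using the Melt package to do this, but I can't seem to get the result I want. In particular, I don't know how to make a function like dcast or melt aggregate the offence data and create the date columns for each charge. Is there a way to get what I want without sorting the dataset by hand?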
Answer 0: (score: 2)
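You need to do this in two steps, that is, spread to wide format twice. For that you have to prepare two keys first. The ugly part is that you end up with more rows than you want, which can be fixed with dplyr::summarise and unique (an na.rm argument for unique would be a nice feature here ;-)). Try this: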
df <- read.table(text = "ID Charge Chargedate VictimID
1 Robbery 2013-04-05 1
1 Theft 2013-04-06 2
1 Theft 2013-04-07 2
2 Homicide 2013-04-08 3
2 Theft 2013-04-09 3
2 Burglary 2013-04-10 3
", header = TRUE, stringsAsFactors = FALSE)
library(dplyr)
library(tidyr)
# first data frame: one row per ID
df %>%
  group_by(ID, Charge) %>%
  # build the two keys: one for the date columns, one for the count columns
  mutate(key_date = paste0(Charge, "date", seq_len(n())),
         key_count = paste0(Charge, "count"),
         count = n()) %>%
  ungroup() %>%
  select(-Charge, -VictimID) %>%
  spread(key = key_count, value = count, fill = 0) %>%
  spread(key = key_date, value = Chargedate) %>%
  group_by(ID) %>%
  # collapse the extra rows: sum the counts, keep the single non-NA date
  mutate_at(.vars = vars(matches("count$")), sum) %>%
  summarise_all(.funs = function(x) {
    x <- unique(x[!is.na(x)])
    ifelse(length(x) == 0, NA_character_, x)
  })
# second data frame you asked for: one row per ID and VictimID pair
df %>%
  group_by(ID, Charge, VictimID) %>%
  mutate(key_date = paste0(Charge, "date", seq_len(n())),
         key_count = paste0(Charge, "count"),
         count = n()) %>%
  ungroup() %>%
  select(-Charge) %>%
  spread(key = key_count, value = count, fill = 0) %>%
  spread(key = key_date, value = Chargedate) %>%
  group_by(ID, VictimID) %>%
  # same clean-up as above, now per ID and VictimID pair
  mutate_at(.vars = vars(matches("count$")), sum) %>%
  summarise_all(.funs = function(x) {
    x <- unique(x[!is.na(x)])
    ifelse(length(x) == 0, NA_character_, x)
  })
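If you are on a recent tidyr (assuming version 1.1.0 or later, where spread is superseded by pivot_wider), the same two-key idea might be sketched as follows. This is only an illustration, not the code above rewritten one-to-one: reshape_charges and its keys argument are hypothetical names I made up, and it builds the count columns and the date columns separately and joins them, which avoids the clean-up step with unique.

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "ID Charge Chargedate VictimID
1 Robbery 2013-04-05 1
1 Theft 2013-04-06 2
1 Theft 2013-04-07 2
2 Homicide 2013-04-08 3
2 Theft 2013-04-09 3
2 Burglary 2013-04-10 3
", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper (not from the answer above).
# `keys` is a character vector naming the columns that identify one output row.
reshape_charges <- function(data, keys) {
  # <Charge>count columns: number of rows per key/charge combination, 0 if absent
  counts <- data %>%
    group_by(across(all_of(c(keys, "Charge")))) %>%
    summarise(count = n(), .groups = "drop") %>%
    pivot_wider(id_cols = all_of(keys),
                names_from = Charge, values_from = count,
                names_glue = "{Charge}count", values_fill = 0)

  # <Charge>date<i> columns: dates numbered in chronological order within each group
  dates <- data %>%
    arrange(Chargedate) %>%
    group_by(across(all_of(c(keys, "Charge")))) %>%
    mutate(key_date = paste0(Charge, "date", row_number())) %>%
    ungroup() %>%
    pivot_wider(id_cols = all_of(keys),
                names_from = key_date, values_from = Chargedate)

  out <- left_join(counts, dates, by = keys)
  # Put each charge's count next to its dates. Plain alphabetical sort, so
  # "date10" would land before "date2" if a charge repeats ten or more times.
  out[, c(keys, sort(setdiff(names(out), keys)))]
}

by_id   <- reshape_charges(df, "ID")                  # one row per ID
by_pair <- reshape_charges(df, c("ID", "VictimID"))   # one row per ID/VictimID pair
```

Passing the key columns as an argument means both data frames you asked for come from the same function, so the only difference between them is whether VictimID is part of the row identifier.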