I have the following problem. I have a data set like this:
Province cases year month
Newyork 10 2000 1
Newyork 20 2000 2
Newyork 30 2000 3
Newyork 40 2000 4
Los Angeles 30 2000 1
Los Angeles 40 2000 2
Los Angeles 50 2000 3
Los Angeles 60 2000 4
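The real data are very large: 20 years and many provinces. How can I reshape my data to get a time series like this, with one row per province and one column per month and year: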
Province cases.at.1.2000 cases.at.2.2000 cases.at.3.2000 cases.at.4.2000
Newyork 10 20 30 40
Los Angeles 30 40 50 60
Answer 0 (score: 5)
Just use dcast from the reshape2 package:
library(reshape2)
dcast(df, Province~month+year, value.var='cases')
# Province 1_2000 2_2000 3_2000 4_2000
#1 LosAngeles 30 40 50 60
#2 Newyork 10 20 30 40
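The question asked for column names like cases.at.1.2000, while dcast joins the formula variables with an underscore (1_2000). A minimal sketch of renaming the columns afterwards, assuming dcast's default underscore naming and the toy data from the question; the regex and the variable name wide are illustrative choices, not part of the original answer:

```r
library(reshape2)

# Toy data mirroring the question (Province names without spaces)
df <- data.frame(
  Province = rep(c("Newyork", "LosAngeles"), each = 4),
  cases    = c(10L, 20L, 30L, 40L, 30L, 40L, 50L, 60L),
  year     = 2000L,
  month    = rep(1:4, times = 2)
)

wide <- dcast(df, Province ~ month + year, value.var = "cases")

# dcast names the new columns "<month>_<year>", e.g. "1_2000";
# rewrite every column except Province as "cases.at.<month>.<year>"
names(wide)[-1] <- sub("^([0-9]+)_([0-9]+)$", "cases.at.\\1.\\2",
                       names(wide)[-1])
wide
```

With 20 years of data, keep in mind that month + year orders the columns by month first (1_2000, 1_2001, ...); writing the formula as Province ~ year + month gives chronological columns instead, in which case the regex above has to swap the two captured groups.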
Data:
df=structure(list(Province = c("Newyork", "Newyork", "Newyork",
"Newyork", "LosAngeles", "LosAngeles", "LosAngeles", "LosAngeles"
), cases = c(10L, 20L, 30L, 40L, 30L, 40L, 50L, 60L), year = c(2000L,
2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L), month = c(1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L)), .Names = c("Province", "cases",
"year", "month"), class = "data.frame", row.names = c(NA, -8L
))
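Edit: if some months or provinces are missing, you can still use dcast; combinations that do not occur in the data are filled with NA: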
# Province cases year month
#1 Newyork 10 2000 1
#2 Newyork 20 2000 2
#3 Newyork 30 2000 3
#4 Newyork 40 2000 4
#5 LosAngeles 30 2000 1
#6 LosAngeles 40 2000 2
#7 LosAngeles 50 2000 3
#8 LosAngeles 60 2000 4
#9 Newyork 99 2000 5
#10 SanDiego 99 2000 5
dcast(df, Province~month+year, value.var='cases')
# Province 1_2000 2_2000 3_2000 4_2000 5_2000
#1 LosAngeles 30 40 50 60 NA
#2 Newyork 10 20 30 40 99
#3 SanDiego NA NA NA NA 99
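If you would rather have 0 than NA for province/month combinations that never occur, reshape2's dcast accepts a fill argument. A sketch assuming the extended ten-row data shown in the edit; 0L is used so the filled value matches the integer type of cases:

```r
library(reshape2)

# The extended data from the edit: Newyork and SanDiego have a month-5 row
df <- data.frame(
  Province = c(rep("Newyork", 4), rep("LosAngeles", 4), "Newyork", "SanDiego"),
  cases    = c(10L, 20L, 30L, 40L, 30L, 40L, 50L, 60L, 99L, 99L),
  year     = 2000L,
  month    = c(1:4, 1:4, 5L, 5L)
)

# Combinations absent from the data get 0 instead of NA
wide <- dcast(df, Province ~ month + year, value.var = "cases", fill = 0L)
wide
```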
Answer 1 (score: 2)
We can use reshape from base R after joining the 'month' and 'year' columns with paste(...):
reshape(
transform(df1, yearmonth=paste('at', month, year, sep="."))[,-(3:4)],
idvar='Province', timevar='yearmonth', direction='wide')
# Province cases.at.1.2000 cases.at.2.2000 cases.at.3.2000 cases.at.4.2000
# 1 Newyork 10 20 30 40
# 5 Los Angeles 30 40 50 60
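With 20 years of data, the order of the wide columns matters. reshape() creates them in the order in which each yearmonth value first appears in the data, so sorting by year and month beforehand yields chronological columns. A sketch on hypothetical two-year toy data (the name toy and its values are made up for illustration):

```r
# Hypothetical toy data spanning the turn of a year, deliberately unsorted
toy <- data.frame(
  Province = rep(c("Newyork", "Los Angeles"), each = 4),
  cases    = c(10L, 20L, 30L, 40L, 30L, 40L, 50L, 60L),
  year     = rep(c(2001L, 2001L, 2000L, 2000L), times = 2),
  month    = rep(c(1L, 2L, 11L, 12L), times = 2)
)

# Sort chronologically first: reshape() builds the wide columns in the
# order in which each timevar value first appears
toy <- toy[order(toy$year, toy$month), ]
toy$yearmonth <- paste("at", toy$month, toy$year, sep = ".")

wide <- reshape(toy[, c("Province", "cases", "yearmonth")],
                idvar = "Province", timevar = "yearmonth",
                direction = "wide")
names(wide)
```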
df1 <- structure(list(Province = c("Newyork", "Newyork", "Newyork",
"Newyork", "Los Angeles", "Los Angeles", "Los Angeles", "Los Angeles"
), cases = c(10L, 20L, 30L, 40L, 30L, 40L, 50L, 60L), year = c(2000L,
2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L), month = c(1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L)), .Names = c("Province", "cases",
"year", "month"), class = "data.frame", row.names = c(NA, -8L))
Answer 2 (score: 0)
Based on @Ananda Mahto's suggestion:
library(tidyr); library(dplyr)
df %>% mutate(month = paste0("cases.at.", month)) %>%
unite(key, month, year, sep=".") %>% spread(key, cases)
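In current tidyr, spread() is superseded by pivot_wider(). A sketch of the same reshape, assuming tidyr 1.1.0 or later (needed for the names_glue argument) and the four-rows-per-province toy data:

```r
library(tidyr)

df <- data.frame(
  Province = rep(c("Newyork", "LosAngeles"), each = 4),
  cases    = c(10L, 20L, 30L, 40L, 30L, 40L, 50L, 60L),
  year     = 2000L,
  month    = rep(1:4, times = 2)
)

# names_glue builds each new column name from the month and year values,
# so no separate mutate()/unite() step is needed
wide <- pivot_wider(df,
                    names_from  = c(month, year),
                    values_from = cases,
                    names_glue  = "cases.at.{month}.{year}")
wide
```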
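If a province is missing some month-year combinations, use expand() to build every Province/year/month combination first and join the data onto it: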
df %>% expand(Province, year, month) %>% left_join(df, by = c("Province", "year", "month")) %>%
mutate(month = paste0("cases.at.", month)) %>%
unite(key, month, year, sep=".") %>% spread(key, cases)
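tidyr's complete() wraps the expand-then-join step into a single call. A sketch on the data below (SanDiego has only month 4), assuming tidyr and dplyr are installed:

```r
library(tidyr)
library(dplyr)

df <- data.frame(
  Province = c(rep("Newyork", 4), rep("LosAngeles", 4), "SanDiego"),
  cases    = c(10L, 20L, 30L, 40L, 30L, 40L, 50L, 60L, 90L),
  year     = 2000L,
  month    = c(1:4, 1:4, 4L)
)

wide <- df %>%
  complete(Province, year, month) %>%  # add missing Province/year/month rows with NA cases
  mutate(month = paste0("cases.at.", month)) %>%
  unite(key, month, year, sep = ".") %>%
  spread(key, cases)
wide
```

complete() also takes fill = list(cases = 0L) if zeros are preferred over NA.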
Data:
df=structure(list(Province = c("Newyork", "Newyork", "Newyork",
"Newyork", "LosAngeles", "LosAngeles", "LosAngeles", "LosAngeles", "SanDiego"),
cases = c(10L, 20L, 30L, 40L, 30L, 40L, 50L, 60L, 90L), year = c(2000L,
2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L), month = c(1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L, 4L)), .Names = c("Province", "cases",
"year", "month"), class = "data.frame", row.names = c(NA, -9L))