My data frame contains the "name" of U.S. presidents and the years when they started and ended their terms ("from" and "to" columns). Here is a sample:
name from to
Bill Clinton 1993 2001
George W. Bush 2001 2009
Barack Obama 2009 2012
...and the output of dput:
dput(tail(presidents, 3))
structure(list(name = c("Bill Clinton", "George W. Bush", "Barack Obama"
), from = c(1993, 2001, 2009), to = c(2001, 2009, 2012)), .Names = c("name",
"from", "to"), row.names = 42:44, class = "data.frame")
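I want to create a data frame with two columns ("name" and "year"), with a row for each year that a president was in office. Thus, I need to create a regular sequence with each year from "from" to "to". Here's my expected outcome: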
name year
Bill Clinton 1993
Bill Clinton 1994
...
Bill Clinton 2000
Bill Clinton 2001
George W. Bush 2001
George W. Bush 2002
...
George W. Bush 2008
George W. Bush 2009
Barack Obama 2009
Barack Obama 2010
Barack Obama 2011
Barack Obama 2012
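I know that I can use data.frame(name = "Bill Clinton", year = seq(1993, 2001)) to expand things for a single president, but I can't figure out how to iterate over every president.
How do I do this? I feel like I should know this, but I'm drawing a blank.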
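The single-president expansion can be generalized by applying it to every row at once. A minimal base R sketch, assuming the sample data from the dput() output (row names dropped); `pieces` and `result` are illustrative names, not part of any answer:

```r
# Sample data from the question's dput() output (row names dropped)
presidents <- data.frame(
  name = c("Bill Clinton", "George W. Bush", "Barack Obama"),
  from = c(1993, 2001, 2009),
  to   = c(2001, 2009, 2012)
)

# Map() walks name, from and to in parallel and applies the
# single-president expansion to each row, returning a list of data frames
pieces <- Map(function(name, from, to) {
  data.frame(name = name, year = seq(from, to))
}, presidents$name, presidents$from, presidents$to)

# Stack the pieces; unname() stops rbind() from building row names
# such as "Bill Clinton.1" out of the list names
result <- do.call(rbind, unname(pieces))
result
```

The transition years (2001, 2009) appear twice, once for each president, exactly as in the expected outcome.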
Update 1: OK, I've tried both solutions, but I'm getting an error:
foo<-structure(list(name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"), from = c(1885, 1889, 1893), to = c(1889, 1893, 1897)), .Names = c("name", "from", "to"), row.names = 22:24, class = "data.frame")
ddply(foo, "name", summarise, year = seq(from, to))
Error in seq.default(from, to) : 'from' must be of length 1
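The error comes from grouping by name: Grover Cleveland served two non-consecutive terms, so his group contains two rows and seq() receives vectors of length 2 for `from` and `to`. A base R sketch that reproduces this and then expands each row (term) separately instead; `cleveland`, `err`, `terms` and `expanded` are illustrative names:

```r
foo <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1889, 1893, 1897)
)

# Grouping by name puts both of Cleveland's terms into one group ...
cleveland <- foo[foo$name == "Grover Cleveland", ]
nrow(cleveland)  # 2

# ... so seq() receives length-2 vectors and stops with an error
err <- tryCatch(seq(cleveland$from, cleveland$to), error = function(e) e)
inherits(err, "error")  # TRUE

# Treating each row (term) as its own group avoids the problem
terms <- lapply(seq_len(nrow(foo)), function(i) {
  data.frame(name = foo$name[i], year = seq(foo$from[i], foo$to[i]))
})
expanded <- do.call(rbind, terms)
expanded
```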
Answer 0 (score: 13)
Here is a data.table solution. It has the nice (if minor) feature of leaving the presidents in the order in which they were supplied:
library(data.table)
dt <- data.table(presidents)
dt[, list(year = seq(from, to)), by = name]
# name year
# 1: Bill Clinton 1993
# 2: Bill Clinton 1994
# ...
# ...
# 21: Barack Obama 2011
# 22: Barack Obama 2012
Edit: To handle presidents with non-consecutive terms, use this instead:
dt[, list(year = seq(from, to)), by = c("name", "from")]
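Grouping by c("name", "from") works because each (name, start year) pair identifies exactly one term. The same idea can be sketched in base R without data.table, with split() and interaction() standing in for data.table's `by` (a swapped-in technique, not the answer's code); `foo` is the data from the question's update:

```r
foo <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1889, 1893, 1897)
)

# Each (name, from) pair identifies exactly one term, so splitting on the
# pair keeps Cleveland's two terms in separate groups
groups <- split(foo, interaction(foo$name, foo$from, drop = TRUE))

expanded <- do.call(rbind, lapply(unname(groups), function(g) {
  data.frame(name = g$name, from = g$from, year = seq(g$from, g$to))
}))

# split() returns groups in factor-level order, so restore chronological order
expanded <- expanded[order(expanded$from, expanded$year), c("name", "year")]
rownames(expanded) <- NULL
expanded
```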
Answer 1 (score: 12)
You can use the plyr package:
library(plyr)
ddply(presidents, "name", summarise, year = seq(from, to))
# name year
# 1 Barack Obama 2009
# 2 Barack Obama 2010
# 3 Barack Obama 2011
# 4 Barack Obama 2012
# 5 Bill Clinton 1993
# 6 Bill Clinton 1994
# [...]
If it's important that the data be sorted by year, you can use the arrange function:
df <- ddply(presidents, "name", summarise, year = seq(from, to))
arrange(df, year)
# name year
# 1 Bill Clinton 1993
# 2 Bill Clinton 1994
# 3 Bill Clinton 1995
# [...]
# 21 Barack Obama 2011
# 22 Barack Obama 2012
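A caveat on sorting by year alone: in 2001 and 2009 two presidents share the year, and a sort on `year` leaves those ties in whatever order the rows already had. Since the ddply() output above is grouped alphabetically by name, the 2009 tie would come out as Obama before Bush. A base R sketch (no plyr) that mimics the alphabetical grouping and breaks ties on the term's start year; `alpha`, `by_year` and `by_term` are illustrative names:

```r
presidents <- data.frame(
  name = c("Bill Clinton", "George W. Bush", "Barack Obama"),
  from = c(1993, 2001, 2009),
  to   = c(2001, 2009, 2012)
)

# Mimic a grouped result that comes back in alphabetical order of name
alpha <- presidents[order(presidents$name), ]

# Expand each term but keep its start year alongside the years
expanded <- do.call(rbind, lapply(seq_len(nrow(alpha)), function(i) {
  data.frame(name = alpha$name[i],
             from = alpha$from[i],
             year = seq(alpha$from[i], alpha$to[i]))
}))

# Sorting on year alone leaves the 2001 and 2009 ties in input order
by_year <- expanded[order(expanded$year), ]
by_year$name[18:19]  # "Barack Obama" "George W. Bush"

# Sorting on the term start first, then year, gives the chronological listing
by_term <- expanded[order(expanded$from, expanded$year), c("name", "year")]
by_term$name[18:19]  # "George W. Bush" "Barack Obama"
```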
Edit 1: Following @edgester's "Update 1", a more appropriate approach is to use adply to account for presidents with non-consecutive terms:
adply(foo, 1, summarise, year = seq(from, to))[c("name", "year")]
Answer 2 (score: 5)
Here is a dplyr solution:
library(dplyr)
# the data
presidents <-
structure(list(name = c("Bill Clinton", "George W. Bush", "Barack Obama"
), from = c(1993, 2001, 2009), to = c(2001, 2009, 2012)), .Names = c("name",
"from", "to"), row.names = 42:44, class = "data.frame")
# the expansion of the table
presidents %>%
rowwise() %>%
do(data.frame(name = .$name, year = seq(.$from, .$to, by = 1)))
# the output
Source: local data frame [22 x 2]
Groups: <by row>
name year
(chr) (dbl)
1 Bill Clinton 1993
2 Bill Clinton 1994
3 Bill Clinton 1995
4 Bill Clinton 1996
5 Bill Clinton 1997
6 Bill Clinton 1998
7 Bill Clinton 1999
8 Bill Clinton 2000
9 Bill Clinton 2001
10 George W. Bush 2001
.. ... ...
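Answer 3 (score: 2)
Another base solution (here d is the presidents data frame):
# SIMPLIFY = FALSE keeps a list even when every term spans the same number of years
l <- mapply(`:`, d$from, d$to, SIMPLIFY = FALSE)
data.frame(name = d$name[rep(seq_len(nrow(d)), lengths(l))], year = unlist(l))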
# name year
# 1 Bill Clinton 1993
# 2 Bill Clinton 1994
# ...snip
# 8 Bill Clinton 2000
# 9 Bill Clinton 2001
# 10 George W. Bush 2001
# 11 George W. Bush 2002
# ...snip
# 17 George W. Bush 2008
# 18 George W. Bush 2009
# 19 Barack Obama 2009
# 20 Barack Obama 2010
# 21 Barack Obama 2011
# 22 Barack Obama 2012
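By default mapply() simplifies its result: when every term spans the same number of years, as with the three five-year terms in the question's `foo` data, it returns a matrix rather than a list, and lengths() then counts individual cells instead of terms. A small check of that behaviour, and of the SIMPLIFY = FALSE form that always returns a list (`simplified` and `out` are illustrative names):

```r
foo <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1889, 1893, 1897)
)

# Default simplification: three length-5 results collapse into a 5 x 3 matrix
simplified <- mapply(`:`, foo$from, foo$to)
is.matrix(simplified)  # TRUE

# SIMPLIFY = FALSE always returns one list element per term
l <- mapply(`:`, foo$from, foo$to, SIMPLIFY = FALSE)
lengths(l)  # 5 5 5

# Repeat each name once per year of its term
out <- data.frame(name = rep(foo$name, lengths(l)), year = unlist(l))
out
```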
Answer 4 (score: 1)
Here is a quick base R solution, where Df is your data.frame:
do.call(rbind, apply(Df, 1, function(x) {
  data.frame(name = x[1], year = seq(x[2], x[3]))
}))
It gives some warnings about row names, but seems to return the correct data.frame.
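Note that apply() first converts Df to a matrix, and because `name` is character, every cell of that matrix is a string, so x[2] and x[3] reach seq() as text (the answer reports this still works). A sketch that avoids the row-wise loop and the coercion altogether, assuming R >= 4.0.0, where sequence() gained its `from` argument; `n` and `result` are illustrative names:

```r
Df <- data.frame(
  name = c("Bill Clinton", "George W. Bush", "Barack Obama"),
  from = c(1993, 2001, 2009),
  to   = c(2001, 2009, 2012)
)

# apply() works on as.matrix(Df); with a character column present,
# every cell of that matrix is a string
is.character(as.matrix(Df))  # TRUE

# Vectorised alternative: one row per year of each term
n <- as.integer(Df$to - Df$from + 1)              # years per term
result <- data.frame(
  name = rep(Df$name, n),                         # repeat each name n times
  year = sequence(n, from = as.integer(Df$from))  # requires R >= 4.0.0
)
result
```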
Answer 5 (score: 0)
Another option using the tidyverse is to gather the data into long format, group_by name, and use complete to create the sequence of years between from and to.
library(tidyverse)
presidents %>%
  gather(key, date, -name) %>%
  group_by(name) %>%
  complete(date = seq(date[1], date[2])) %>%
  select(-key)
# A tibble: 22 x 2
# Groups: name [3]
# name date
# <chr> <dbl>
# 1 Barack Obama 2009
# 2 Barack Obama 2010
# 3 Barack Obama 2011
# 4 Barack Obama 2012
# 5 Bill Clinton 1993
# 6 Bill Clinton 1994
# 7 Bill Clinton 1995
# 8 Bill Clinton 1996
# 9 Bill Clinton 1997
#10 Bill Clinton 1998
# … with 12 more rows
Answer 6 (score: 0)
Another map2 approach using the tidyverse and unnest.
library(tidyverse)
presidents %>%
  mutate(year = map2(from, to, seq)) %>%
  unnest(year) %>%
  select(-from, -to)
# name year
# 1 Bill Clinton 1993
# 2 Bill Clinton 1994
# ...
# 21 Barack Obama 2011
# 22 Barack Obama 2012
Answer 7 (score: 0)
Use by to create a list L of data frames, one per president, and then rbind them together. No packages are used.
L <- by(presidents, presidents$name, with, data.frame(name, year = from:to))
do.call("rbind", setNames(L, NULL))
If you don't mind the row names, the last line can be simplified to:
do.call("rbind", L)
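Because by() splits on presidents$name, a president with two non-consecutive terms (the `foo` data from the question's update) lands in one group with two rows, and data.frame(name, year = from:to) then fails. A sketch that keeps the same by()/with() idea but splits on the row index instead, so each term is its own group:

```r
foo <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1889, 1893, 1897)
)

# Splitting on the row index (instead of the name) gives every term its
# own group, so Grover Cleveland's two terms are expanded separately
L <- by(foo, seq_len(nrow(foo)), with, data.frame(name, year = from:to))
result <- do.call("rbind", setNames(L, NULL))
result
```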
Answer 8 (score: 0)
Another solution using dplyr and tidyr:
library(magrittr) # for pipes
df <- data.frame(tata = c('toto1', 'toto2'), from = c(2000, 2004), to = c(2001, 2009))
# tata from to
# 1 toto1 2000 2001
# 2 toto2 2004 2009
df %>%
  dplyr::as_tibble() %>%
  dplyr::rowwise() %>%
  dplyr::mutate(combined = list(seq(from, to))) %>%
  dplyr::select(-from, -to) %>%
  tidyr::unnest(combined)
# tata combined
# <fct> <int>
# 1 toto1 2000
# 2 toto1 2001
# 3 toto2 2004
# 4 toto2 2005
# 5 toto2 2006
# 6 toto2 2007
# 7 toto2 2008
# 8 toto2 2009