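I have a df named laws, with one row per law (one per id):
laws <- data.frame(id=c(1,2,3),beginyear=c(2001,2002,2005),endyear=c(2003,2005,2006), law1=c(0,0,1), law2=c(1,0,1))
From it I want to create a second df named idyear, with one row for each id and year:
idyear <- data.frame(id=c(rep(1,6),rep(2,6),rep(3,6)), year=(rep(c(2001:2006),3)), law1=c(rep(0,16),1,1), law2=c(1,1,1,rep(0,13),1,1))
How can I efficiently write some code to get the idyear df output from the laws df? The two law variables are indicator variables: each should equal 1 only when the law is set for that id and laws$endyear is >= idyear$year and laws$beginyear is <= idyear$year; otherwise it should be 0.
I am a beginner in R, but I am willing to try anything (apply, loops, etc.) to get this to work.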
Answer 0 (score: 4)
1) base expand.grid will create an 18 x 2 data frame of all id and year combinations, and then merge will join it back together with laws. Zero out any law1 and law2 entries whose year is not between beginyear and endyear. Finally, remove the beginyear and endyear columns. No packages are used.
g <- with(laws, expand.grid(year = min(beginyear):max(endyear), id = id))
m <- merge(g, laws)
m[m$year < m$beginyear | m$year > m$endyear, c("law1", "law2")] <- 0
m <- subset(m, select = - c(beginyear, endyear))
# check
identical(m, idyear)
## [1] TRUE
2) magrittr This is the same solution as (1), except that we use magrittr pipelines to express it. Note the mixture of pipe operators (%$% and %>%).
library(magrittr)
laws %$%
  expand.grid(year = min(beginyear):max(endyear), id = id) %>%
  merge(laws) %$%
  { .[year < beginyear | year > endyear, c("law1", "law2")] <- 0; .} %>%
  subset(select = - c(beginyear, endyear))
Update: Fixed. Added (2).
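As a small variation on (1), the zeroing step can also be written as a multiplication by a logical mask. This is only a sketch under the question's example data; the names in_force and res are made up for illustration and do not appear in the answer.

```r
# Example data copied from the question
laws <- data.frame(id = c(1, 2, 3),
                   beginyear = c(2001, 2002, 2005),
                   endyear = c(2003, 2005, 2006),
                   law1 = c(0, 0, 1), law2 = c(1, 0, 1))

# All id/year combinations, joined back to the per-id law data
g <- with(laws, expand.grid(year = min(beginyear):max(endyear), id = id))
m <- merge(g, laws)

# TRUE where the year lies inside [beginyear, endyear] (hypothetical helper)
in_force <- m$year >= m$beginyear & m$year <= m$endyear

# Multiplying by the mask keeps an indicator inside the span and zeroes it elsewhere
res <- data.frame(id = m$id, year = m$year,
                  law1 = m$law1 * in_force,
                  law2 = m$law2 * in_force)
```

The mask version leaves m untouched, which can be convenient if the beginyear and endyear columns are still needed afterwards.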
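Answer 1 (score: 3)
The mapply function can help. Note that this expands each id only over its own beginyear:endyear range, so it returns 9 rows rather than the full 18-row id/year grid.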
# Function to expand year between begin and end
gen_data <- function(x_id, x_beginyear, x_endyear, x_law1, x_law2){
  df <- data.frame(x_id, x_beginyear:x_endyear, x_law1, x_law2)
  df
}
idyearlst <- data.frame()
idyearlst <- rbind(idyearlst, mapply(gen_data, laws$id, laws$beginyear,
                                     laws$endyear, laws$law1, laws$law2))
# Finally convert list to data.frame
idyear <- setNames(do.call(rbind.data.frame, idyearlst), c("id", "year", "law1", "law2"))
Result will be like:
> idyear
id year law1 law2
V1.1 1 2001 0 1
V1.2 1 2002 0 1
V1.3 1 2003 0 1
V2.4 2 2002 0 0
V2.5 2 2003 0 0
V2.6 2 2004 0 0
V2.7 2 2005 0 0
V3.8 3 2005 1 1
V3.9 3 2006 1 1
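The output above contains only the years inside each law's span (9 rows), while the question asks for every id/year combination (18 rows). One possible way to fill the gap, sketched here with base R only, is to left-join the partial result onto the full grid and replace the resulting NA values with 0. The names spans, partial, grid and full are illustrative and not part of the answer, and Map is used in place of mapply so that a plain list of data frames comes back.

```r
# Example data copied from the question
laws <- data.frame(id = c(1, 2, 3),
                   beginyear = c(2001, 2002, 2005),
                   endyear = c(2003, 2005, 2006),
                   law1 = c(0, 0, 1), law2 = c(1, 0, 1))

# Expand each law over its own beginyear:endyear span (one data frame per id)
spans <- Map(function(id, b, e, l1, l2) {
  data.frame(id = id, year = b:e, law1 = l1, law2 = l2)
}, laws$id, laws$beginyear, laws$endyear, laws$law1, laws$law2)
partial <- do.call(rbind, spans)  # 9 rows

# Full id/year grid, left-joined to the partial result on id and year
grid <- expand.grid(year = min(laws$beginyear):max(laws$endyear), id = laws$id)
full <- merge(grid, partial, all.x = TRUE)

# Years outside a law's span have no match and come back as NA; set them to 0
full[is.na(full$law1), c("law1", "law2")] <- 0
full <- full[order(full$id, full$year), c("id", "year", "law1", "law2")]
```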
Answer 2 (score: 3)
A solution using the tidyverse. The final as.data.frame() is optional; it just converts the tbl to a data frame.
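library(tidyverse)
idyear <- laws %>%
  # one list element per row holding the years beginyear:endyear
  mutate(year = map2(beginyear, endyear, `:`)) %>%
  # name the list column explicitly; a bare unnest() warns in tidyr >= 1.0
  unnest(year) %>%
  # add the missing id/year combinations, filling the indicators with 0
  complete(id, year = full_seq(year, period = 1L), fill = list(law1 = 0L, law2 = 0L)) %>%
  select(-beginyear, -endyear) %>%
  as.data.frame()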
idyear
# id year law1 law2
# 1 1 2001 0 1
# 2 1 2002 0 1
# 3 1 2003 0 1
# 4 1 2004 0 0
# 5 1 2005 0 0
# 6 1 2006 0 0
# 7 2 2001 0 0
# 8 2 2002 0 0
# 9 2 2003 0 0
# 10 2 2004 0 0
# 11 2 2005 0 0
# 12 2 2006 0 0
# 13 3 2001 0 0
# 14 3 2002 0 0
# 15 3 2003 0 0
# 16 3 2004 0 0
# 17 3 2005 1 1
# 18 3 2006 1 1
Answer 3 (score: 2)
A somewhat ugly approach, but I think it gets what you are after, using G. Grothendieck's expand.grid data frame g as the base, together with your laws data frame.
new.df <- data.frame(t(apply(g, 1, function(x){
  # years during which this id's laws are in force
  yearspan = laws[laws$id == x['id'], 'beginyear']:laws[laws$id == x['id'], 'endyear']
  law1 = laws$law1[laws$id == x['id'] & x['year'] %in% yearspan]
  law2 = laws$law2[laws$id == x['id'] & x['year'] %in% yearspan]
  # fall back to 0 when the year lies outside the span (no matching row)
  x['law1'] = ifelse(length(law1) > 0, law1, 0)
  x['law2'] = ifelse(length(law2) > 0, law2, 0)
  return(x)
})))
> new.df
id year law1 law2
1 1 2001 0 1
2 1 2002 0 1
3 1 2003 0 1
4 1 2004 0 0
5 1 2005 0 0
6 1 2006 0 0
7 2 2001 0 0
8 2 2002 0 0
9 2 2003 0 0
10 2 2004 0 0
11 2 2005 0 0
12 2 2006 0 0
13 3 2001 0 0
14 3 2002 0 0
15 3 2003 0 0
16 3 2004 0 0
17 3 2005 1 1
18 3 2006 1 1
Libraries:
dplyr (for arrange; not strictly necessary)
Data:
library(dplyr)  # only needed for arrange()
laws <- data.frame(id=c(1,2,3),
                   beginyear=c(2001,2002,2005),
                   endyear=c(2003,2005,2006),
                   law1=c(0,0,1), law2=c(1,0,1))
g <- with(laws, expand.grid(id = id, year = min(beginyear):max(endyear)))
g <- arrange(g, id)
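Since arrange is the only dplyr function used here, the same ordering can probably be had without loading the package. The following is a minimal base R sketch, assuming it is enough to sort by id and leave ties in their original order:

```r
# Example data copied from the answer
laws <- data.frame(id = c(1, 2, 3),
                   beginyear = c(2001, 2002, 2005),
                   endyear = c(2003, 2005, 2006),
                   law1 = c(0, 0, 1), law2 = c(1, 0, 1))
g <- with(laws, expand.grid(id = id, year = min(beginyear):max(endyear)))

# Base R stand-in for arrange(g, id): order() leaves ties in their original
# order, so years stay ascending within each id
g <- g[order(g$id), ]
rownames(g) <- NULL  # reset row names to 1..n
```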