I want to create new rows in a data.frame for all missing years of each group (firm and type). The data frame is created like this:
minimal <- data.frame(firm = c("A","A","A","B","B","B","A","A","A","B","B","B"),
type = c("X","X","X","X","X","X","Y","Y","Y","Y","Y","Y"),
year = c(2000,2004,2007,2010,2008,2001,2002,2003,2007,2000,2001,2008),
value = c(1,3,7,9,9,2,3,3,7,5,9,15)
)
The data frame:
firm type year value
A X 2000 1
A X 2004 3
A X 2007 7
B X 2010 9
B X 2008 9
B X 2001 2
A Y 2002 3
A Y 2003 3
A Y 2007 7
B Y 2000 5
B Y 2001 9
B Y 2008 15
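Now, what I want is the following. The minimum year in the data is 2000 and the maximum is 2010. For each firm-type combination, I want to add one row for every missing year in that range. For example, for firm A and type X, I want to add rows so that it looks like this: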
Desired output:
firm type year value
A X 2000 1
A X 2004 3
A X 2007 7
A X 2001 1
A X 2002 1
A X 2003 1
A X 2005 3
A X 2006 3
A X 2008 7
A X 2009 7
A X 2010 7
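In addition, I want each new row to take the column 'value' from the most recent earlier year, carrying it forward through all following missing years until the next non-missing row appears (as shown in the desired output above).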
I haven't come up with any useful code yet, but the following, which I found so far, might point in the right direction:
library(data.table)
setDT(minimal)[, .SD[match(2000:2010, year)],
               by = c("firm","type")]
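I don't really understand the concepts of setDT and .SD, but this at least creates one row per year for each firm-type combination. However, the year column is empty (NA) in the rows for the missing years.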
Many thanks in advance!
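A sketch of how the attempt above can be completed (an addition, not from the thread). setDT() converts minimal to a data.table in place, and inside j, .SD is the group's subset of rows. match(2000:2010, year) returns NA for every year a group lacks, and indexing .SD with NA yields an all-NA row, which is why the years come out empty. The version below keeps the same match() idea but returns the target years directly and fills value with data.table's nafill(type = "locf"). It assumes a data.table version recent enough to provide nafill(), and it hard-codes the range 2000:2010 from the question (all_years is a name introduced here).

```r
library(data.table)

minimal <- data.frame(firm = c("A","A","A","B","B","B","A","A","A","B","B","B"),
                      type = c("X","X","X","X","X","X","Y","Y","Y","Y","Y","Y"),
                      year = c(2000,2004,2007,2010,2008,2001,2002,2003,2007,2000,2001,2008),
                      value = c(1,3,7,9,9,2,3,3,7,5,9,15))

all_years <- 2000:2010

res <- setDT(minimal)[, {
  idx <- match(all_years, year)   # row of each target year within the group, NA if absent
  list(year  = all_years,         # use the target year instead of the NA from .SD[NA]
       value = nafill(value[idx], type = "locf"))  # carry the last observed value forward
}, by = .(firm, type)]

res[firm == "A" & type == "X"]
```

Years before a group's first observation (for example firm A, type Y in 2000 and 2001) stay NA, because there is no earlier value to carry forward.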
Answer 0 (score: 0)
I wrote code that does what you want. It may not be very efficient or elegant, but it works:
# Input data frame (stringsAsFactors = FALSE keeps firm and type as character)
minimal <- data.frame(firm = c("A","A","A","B","B","B","A","A","A","B","B","B"),
                      type = c("X","X","X","X","X","X","Y","Y","Y","Y","Y","Y"),
                      year = c(2000,2004,2007,2010,2008,2001,2002,2003,2007,2000,2001,2008),
                      value = c(1,3,7,9,9,2,3,3,7,5,9,15),
                      stringsAsFactors = FALSE
)
# Sorting is needed: each firm/type group must be contiguous and ordered by year
minimal = minimal[order(minimal$firm, minimal$type, minimal$year),]
# Rows per firm/type group; rows (firms) and columns (types) are in the
# same sorted order as the data frame
counts = table(minimal$firm, minimal$type)
minYear = min(minimal$year)
maxYear = max(minimal$year)
startPos = 0
# Appends one row without turning year and value into character
# (rbind() with a plain c() vector would coerce every column to character)
addRow = function(df, firm, type, year, value){
  rbind(df, data.frame(firm = firm, type = type, year = year, value = value,
                       stringsAsFactors = FALSE))
}
# Iterates over the firm/type groups
for(i in seq_len(nrow(counts))){
  for(j in seq_len(ncol(counts))){
    n = counts[i,j]
    if(n == 0) next   # combination not present in the data
    grpFirm = rownames(counts)[i]
    grpType = colnames(counts)[j]
    prevValue = 0     # value used for years before the group's first observation
    currYear = minYear
    # Adds minimum year if needed
    if(minimal$year[1+startPos] != currYear){
      minimal = addRow(minimal, grpFirm, grpType, currYear, prevValue)
    }
    # Adds the missing years between observations, carrying the previous value forward
    for(k in (1+startPos):(n+startPos)){
      if(minimal$year[k] != currYear){
        currYear = currYear + 1
        while(minimal$year[k] != currYear){
          minimal = addRow(minimal, grpFirm, grpType, currYear, prevValue)
          currYear = currYear + 1
        }
      }
      prevValue = minimal$value[k]
    }
    # Adds years from the last observation to the maximum year
    if(currYear < maxYear){
      for(l in 1:(maxYear - currYear)){
        minimal = addRow(minimal, grpFirm, grpType, currYear + l, prevValue)
      }
    }
    startPos = startPos + n
  }
}
# Result
minimal = minimal[order(minimal$firm, minimal$type, minimal$year),]
rownames(minimal) = NULL
minimal
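For comparison, a vectorized base R sketch (an addition, not part of the original answer) that avoids growing the data frame row by row: it builds the full firm/type/year grid with merge() (a merge with no common columns is a cross join), left-joins the observed rows, and carries values forward within each group with ave(). The helper locf() is hypothetical and defined in the block. Unlike the loop, years before a group's first observation stay NA instead of 0.

```r
minimal <- data.frame(firm = c("A","A","A","B","B","B","A","A","A","B","B","B"),
                      type = c("X","X","X","X","X","X","Y","Y","Y","Y","Y","Y"),
                      year = c(2000,2004,2007,2010,2008,2001,2002,2003,2007,2000,2001,2008),
                      value = c(1,3,7,9,9,2,3,3,7,5,9,15))

# Every observed firm/type combination crossed with every year in the data range
years  <- seq(min(minimal$year), max(minimal$year))
combos <- unique(minimal[, c("firm", "type")])
grid   <- merge(combos, data.frame(year = years))   # no shared columns: Cartesian product

# Left join: missing firm/type/year rows get value = NA
full <- merge(grid, minimal, by = c("firm", "type", "year"), all.x = TRUE)
full <- full[order(full$firm, full$type, full$year), ]

# Hypothetical helper: last observation carried forward
locf <- function(x) {
  idx <- cumsum(!is.na(x))        # count of non-NA values seen so far
  c(NA, x[!is.na(x)])[idx + 1L]   # idx 0 (before the first value) maps to NA
}

# Apply locf() separately within each firm/type group
full$value <- ave(full$value, full$firm, full$type, FUN = locf)
rownames(full) <- NULL
full
```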
Answer 1 (score: 0)
I couldn't find an exact duplicate, so here is one possible solution:
library(dplyr)
library(tidyr)
minimal %>%
group_by(firm, type) %>%
complete(year = full_seq(2000:2010, 1)) %>%
fill(value)
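With fill()'s default .direction = "down", years before a group's first observation (for example firm A, type Y in 2000 and 2001) stay NA. If those should take the group's first observed value instead, a variant (an addition, assuming a tidyr version that supports .direction = "downup") is:

```r
library(dplyr)
library(tidyr)

minimal <- data.frame(firm = c("A","A","A","B","B","B","A","A","A","B","B","B"),
                      type = c("X","X","X","X","X","X","Y","Y","Y","Y","Y","Y"),
                      year = c(2000,2004,2007,2010,2008,2001,2002,2003,2007,2000,2001,2008),
                      value = c(1,3,7,9,9,2,3,3,7,5,9,15))

res <- minimal %>%
  group_by(firm, type) %>%
  complete(year = 2000:2010) %>%          # one row per group and year; new rows get value = NA
  fill(value, .direction = "downup") %>%  # carry forward, then fill leading NAs upward
  ungroup()

res
```

Note that "downup" only changes the leading rows: gaps in the middle are still filled from the previous year, because the downward pass runs first.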
Answer 2 (score: 0)
Here is a data.table solution.
library(data.table)
dt <- setDT(minimal)[CJ(firm=firm, type=type, year=seq(min(year), max(year)), unique=TRUE),
on=.(firm, type, year), roll=TRUE]
This returns:
head(dt, 15)
firm type year value
1: A X 2000 1
2: A X 2001 1
3: A X 2002 1
4: A X 2003 1
5: A X 2004 3
6: A X 2005 3
7: A X 2006 3
8: A X 2007 7
9: A X 2008 7
10: A X 2009 7
11: A X 2010 7
12: A Y 2000 NA
13: A Y 2001 NA
14: A Y 2002 3
15: A Y 2003 3
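Note that the initial rows of the second firm-type combination (A, Y) are NA, because there is no earlier value to roll forward. If you want these filled from the following years, you can change the roll argument to "nearest", although this may also change values in the middle of the data.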
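A way to fill only the leading NAs, sketched here as an alternative to roll = "nearest" (an addition, not part of the original answer): keep roll = TRUE and set data.table's rollends argument to c(TRUE, TRUE). The first element also rolls each group's first value backward, while the carry-forward in the middle of the series is unchanged. With roll = "nearest", by contrast, firm A / type X in 2006 would take 7 from 2007 (one year away) rather than 3 from 2004.

```r
library(data.table)

minimal <- data.frame(firm = c("A","A","A","B","B","B","A","A","A","B","B","B"),
                      type = c("X","X","X","X","X","X","Y","Y","Y","Y","Y","Y"),
                      year = c(2000,2004,2007,2010,2008,2001,2002,2003,2007,2000,2001,2008),
                      value = c(1,3,7,9,9,2,3,3,7,5,9,15))

# Same join as above; rollends[1] = TRUE rolls the first value backward,
# rollends[2] = TRUE keeps rolling the last value forward
dt <- setDT(minimal)[CJ(firm = firm, type = type, year = seq(min(year), max(year)), unique = TRUE),
                     on = .(firm, type, year), roll = TRUE, rollends = c(TRUE, TRUE)]

head(dt, 15)
```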