I have a data frame with missing rows. The missing rows can be identified by looking for gaps in the sequence.
Count<-c(1,1,1,1,2,2,2,3,3,4,4,4,4,5,5,6,6,6)
Seq<-c(1,2,3,4,1,2,4,1,4,1,2,3,5,1,3,1,2,3)
MyData<-c(5,4,5,3,4,3,2,1,2,1,3,2,4,2,3,1,4,3)
DF1<-data.frame(Count,Seq,MyData)
DF1
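Count tracks the sequence number, and Seq always runs as a numeric sequence. In this case it is 1:5, but that may vary, so I don't want to hard-code the limit.
My goal is to create two new data frames that contain all the missing sequence rows. The first has NA in the data column for the added "missing" rows.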
Count2<-c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6)
Seq2<-c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
MyData2<-c(5,4,5,3,NA,4,3,NA,2,NA,1,NA,NA,2,NA,1,3,2,NA,4,2,NA,3,NA,NA,1,4,3,NA,NA)
DF2<-data.frame(Count2,Seq2,MyData2)
DF2
The second data frame is similar, but each added row holds the last known data point for that sequence number.
Count2<-c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6)
Seq2<-c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
MyData3<-c(5,4,5,3,NA,4,3,5,2,NA,1,3,5,2,NA,1,3,2,2,4,2,3,3,2,4,1,4,3,2,4)
DF3<-data.frame(Count2,Seq2,MyData3)
DF3
Note: the NA for the 5th sequence element remains for the first 3 counts, because there is no earlier value to carry forward.
Answer 0: (score: 2)
A solution using dplyr and tidyr.
library(dplyr)
library(tidyr)
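# Add a row for every Count/Seq combination; new rows get NA in MyData
DF2 <- DF1 %>%
  complete(Count, Seq = full_seq(Seq, period = 1)) %>%
  arrange(Count, Seq)
# Within each Seq, carry the last known MyData value forward across Counts
DF3 <- DF2 %>%
  arrange(Seq, Count) %>%
  group_by(Seq) %>%
  fill(MyData) %>%
  ungroup() %>%
  arrange(Count, Seq)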
DF2
# # A tibble: 30 x 3
# Count Seq MyData
# <dbl> <dbl> <dbl>
# 1 1 1 5
# 2 1 2 4
# 3 1 3 5
# 4 1 4 3
# 5 1 5 NA
# 6 2 1 4
# 7 2 2 3
# 8 2 3 NA
# 9 2 4 2
# 10 2 5 NA
# # ... with 20 more rows
DF3
# # A tibble: 30 x 3
# Count Seq MyData
# <dbl> <dbl> <dbl>
# 1 1 1 5
# 2 1 2 4
# 3 1 3 5
# 4 1 4 3
# 5 1 5 NA
# 6 2 1 4
# 7 2 2 3
# 8 2 3 5
# 9 2 4 2
# 10 2 5 NA
# # ... with 20 more rows
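To make the completion step less opaque, here is a minimal base R sketch of the same idea: build every Count/Seq combination and left-join the data onto it. The names `grid` and `filled` are illustrative only, and the sketch assumes Seq values are whole numbers so that `seq(min, max)` covers the full range.

```r
# Sample data from the question
DF1 <- data.frame(
  Count  = c(1,1,1,1,2,2,2,3,3,4,4,4,4,5,5,6,6,6),
  Seq    = c(1,2,3,4,1,2,4,1,4,1,2,3,5,1,3,1,2,3),
  MyData = c(5,4,5,3,4,3,2,1,2,1,3,2,4,2,3,1,4,3)
)

# Every observed Count paired with every Seq in the observed range
# (the range comes from the data, nothing is hard-coded)
grid <- expand.grid(
  Seq   = seq(min(DF1$Seq), max(DF1$Seq)),
  Count = sort(unique(DF1$Count))
)

# Left join onto the grid: combinations absent from DF1 get NA in MyData
filled <- merge(grid, DF1, by = c("Count", "Seq"), all.x = TRUE)
filled <- filled[order(filled$Count, filled$Seq), ]
rownames(filled) <- NULL
```

On the sample data this yields 30 rows, 12 of which have NA in MyData, which corresponds to the first target data frame in the question.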
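Answer 1: (score: 1)
Here is a base R solution that uses merge on data frames, plus zoo::na.locf to replace NAs with the last known value for the second part of the question. As the OP requested, the maximum Seq and Count values are inferred from the data.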
# Maximum Seq and Count values, inferred from the data
maxSeq <- max(DF1$Seq)
maxCts <- max(DF1$Count)
# Replicating DF2
# Construct a "skeleton" data frame with every Count/Seq combination
df.one <- data.frame(
  Count = rep(seq_len(maxCts), each = maxSeq),
  Seq = rep(seq_len(maxSeq), times = maxCts)
)
# Merge with the source data; missing entries get NA
df.one <- merge(df.one, DF1, all = TRUE)
tail(df.one)
# Count Seq MyData
#25 5 5 NA
#26 6 1 1
#27 6 2 4
#28 6 3 3
#29 6 4 NA
#30 6 5 NA
# Replicating DF3
# Split on Seq, replace NAs in MyData with the last known value,
# and rbind back into a data frame
library(zoo)
df.two <- do.call(rbind.data.frame, lapply(split(df.one, df.one$Seq), function(x) {
  # na.rm = FALSE keeps leading NAs, so the vector length is unchanged
  x$MyData <- na.locf(x$MyData, na.rm = FALSE)
  x
}))
# Sort by Count then Seq
df.two <- df.two[order(df.two$Count, df.two$Seq), ]
rownames(df.two) <- NULL
tail(df.two)
# Count Seq MyData
#25 5 5 4
#26 6 1 1
#27 6 2 4
#28 6 3 3
#29 6 4 2
#30 6 5 4
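If adding zoo as a dependency is undesirable, the carry-forward step can also be sketched in base R with `ave()`. The helper `locf` below is hypothetical (it is not part of the answer or of any package), and the sketch assumes the rows are already ordered by Count within each Seq.

```r
# Hypothetical helper: carry the last non-NA value forward,
# leaving any leading NAs untouched
locf <- function(x) {
  idx <- cumsum(!is.na(x))        # position of the most recent non-NA value so far
  c(NA, x[!is.na(x)])[idx + 1]    # idx == 0 maps to NA (no earlier value exists)
}

# Completed data in Count/Seq order, with values taken from the question's DF2
df <- data.frame(
  Count  = rep(1:6, each = 5),
  Seq    = rep(1:5, times = 6),
  MyData = c(5,4,5,3,NA,4,3,NA,2,NA,1,NA,NA,2,NA,1,3,2,NA,4,2,NA,3,NA,NA,1,4,3,NA,NA)
)

# ave() applies locf within each Seq group and returns the values
# in the original row order, so no re-sorting is needed afterwards
df$Filled <- ave(df$MyData, df$Seq, FUN = locf)
```

Because `ave()` preserves row order, the result lines up with the Count/Seq layout directly, and the leading NAs for Seq 5 in the first three counts stay in place, as the question requires.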