Every week I receive an incomplete dataset for analysis. It looks like this:
df1 <- data.frame(var1 = c("a","","","b",""),
var2 = c("x","y","z","x","z"))
Some var1 values are missing. The dataset should look like this:
df2 <- data.frame(var1 = c("a","a","a","b","b"),
var2 = c("x","y","z","x","z"))
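Currently I do this with an Excel macro, but that makes automating the analysis harder. From now on I would like to do it in R, but I don't know how.
Thanks for your help.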
Update after the comments
var2 is irrelevant to my question. All I want to do is get from df1 to df2:
df1 <- data.frame(var1 = c("a","","","b",""))
df2 <- data.frame(var1 = c("a","a","a","b","b"))
Answer 0 (score: 21)
Here is one way to do this using run-length encoding (rle) and its inverse (inverse.rle):
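fillTheBlanks <- function(x, missing = "") {
  r <- rle(as.character(x))               # collapse consecutive equal values into runs
  empty <- which(r$values == missing)     # runs made up of blanks
  empty <- empty[empty > 1]               # a leading blank run has nothing to copy from
  r$values[empty] <- r$values[empty - 1]  # take the value of the preceding run
  inverse.rle(r)                          # expand the runs back to full length
}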
df1$var1 <- fillTheBlanks(df1$var1)
Result:
df1
var1 var2
1 a x
2 a y
3 a z
4 b x
5 b z
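To see why this works, it may help to look at the encoding itself. rle() merges each stretch of consecutive blanks into a single run, and because neighbouring runs always have different values, every blank run (except one at the very start) is preceded by a non-blank run. Copying that preceding value and expanding again fills the gaps. A minimal base-R walk-through on the example column:

```r
# Inspect the run-length encoding of the example column
x <- c("a", "", "", "b", "")
r <- rle(x)
r$lengths   # run lengths: 1 2 1 1
r$values    # run values:  "a" "" "b" ""

# Each blank run takes the value of the run just before it...
empty <- which(r$values == "")
r$values[empty] <- r$values[empty - 1]

# ...and expanding the runs again gives the filled column
filled <- inverse.rle(r)
filled      # "a" "a" "a" "b" "b"
```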
Answer 1 (score: 13)
Here is a simpler way:
library(zoo)
df1$var1[df1$var1 == ""] <- NA
df1$var1 <- na.locf(df1$var1)
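One caveat worth checking: na.locf() defaults to na.rm = TRUE, which drops leading NAs. If the first row were blank, the result would be shorter than the column and the assignment back into df1$var1 would fail. A small sketch, assuming a hypothetical vector that starts with a blank, showing na.rm = FALSE keeping the length intact:

```r
library(zoo)

x <- c("", "a", "", "b", "")   # hypothetical input with a leading blank
x[x == ""] <- NA
# na.rm = FALSE keeps the leading NA, so the result has the same length as x
res <- na.locf(x, na.rm = FALSE)
res   # NA "a" "a" "b" "b"
```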
Answer 2 (score: 7)
The tidyr package has a fill() function that does exactly this.
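library(tidyr)
df1 <- data.frame(var1 = c("a","","","b",""))
df1$var1[df1$var1 == ""] <- NA  # fill() only fills NA, not empty strings
df1 <- fill(df1, var1)          # takes a data frame and column names, not a vector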
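If dplyr is also installed (an assumption; the answer only uses tidyr), the blank-to-NA conversion and the fill can be chained in one pipeline with dplyr's na_if(). This sketch passes stringsAsFactors = FALSE so the column is character on older R versions too:

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(var1 = c("a", "", "", "b", ""), stringsAsFactors = FALSE)
df2 <- df1 %>%
  mutate(var1 = na_if(var1, "")) %>%  # turn empty strings into NA
  fill(var1, .direction = "down")     # carry the last non-NA value down
df2$var1   # "a" "a" "a" "b" "b"
```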
Answer 3 (score: 5)
Here is another, slightly shorter way that does not coerce to character:
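Fill <- function(x, missing = "") {
  Log <- x != missing   # TRUE where a value is present
  y <- x[Log]           # the non-missing values, in order
  y[cumsum(Log)]        # for each position, the most recent non-missing value
}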
Result:
# For factor:
Fill(df1$var1)
[1] a a a b b
Levels: a b
# For character:
Fill(as.character(df1$var1))
[1] "a" "a" "a" "b" "b"
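The trick is that cumsum() over the "is present" flags counts how many non-missing values have been seen so far, which is exactly the index into the non-missing values to use at each position. Stepping through it on the example column in base R:

```r
x <- c("a", "", "", "b", "")
Log <- x != ""
Log                   # TRUE FALSE FALSE TRUE FALSE
cumsum(Log)           # 1 1 1 2 2
x[Log]                # "a" "b"
x[Log][cumsum(Log)]   # "a" "a" "a" "b" "b"
```

Note that if x started with a blank, cumsum(Log) would begin at 0, and indexing with 0 silently drops that element, so the result would be shorter than x.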
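Answer 4 (score: 0)
I ran into the same problem. Below is my unfill function, which does the reverse (it blanks out values that repeat the row above). Hope it helps.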
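library(dplyr)
library(purrr)

# Reverse of filling: set values that repeat the previous row to NA
unfill <- function(df, cols) {
  col_names <- names(df)
  unchanged <- df[!(names(df) %in% cols)]
  changed <- df[names(df) %in% cols] %>%
    map_df(function(col) {
      col[col == lag(col)] <- NA  # dplyr::lag() shifts the column down by one
      col
    })
  # Recombine and restore the original column order
  unchanged %>% bind_cols(changed) %>% select(one_of(col_names))
}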
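The core of unfill can also be sketched in base R without dplyr or purrr. This is not the answerer's code, just an illustration of the same idea on a single vector; it writes "" instead of NA so that running it on df2's column gives back df1's column from the question:

```r
x <- c("a", "a", "a", "b", "b")       # df2$var1 from the question
prev <- c(NA, head(x, -1))            # previous value (base-R stand-in for dplyr::lag)
x[!is.na(prev) & x == prev] <- ""     # blank out repeats of the row above
x   # "a" "" "" "b" ""
```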