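I have a data frame whose rows hold numeric data. For each row, I want to find the lengths of the runs of consecutive non-null (non-NA) values and take the mean of those lengths, as in the example below.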
## Example data
dd <- data.frame(v1 = NA, v2 = 1, v3 = 2, v4 = 3, v5 = NA, v6 = NA, v7 = 5,
v8 = 4, v9 = NA, v10 = NA, v11= NA, v12 = 6, v13 = 9, v14 = 7,
v15 = 10)
x2 <- c(0, 1, 2, 3, NA, 1, 5, 4, NA, NA, 6, 6, 9, 7,NA)
dd <- rbind(dd, x2)
rownames(dd) <- c("id1","id2")
The rule I want to apply (shown here for "id1") is:
#positions for v2, v3 and v4 = 3 non-null values
#positions for v7 and v8 = 2 non-null values
#positions for v12, v13, v14 and v15 = 4 non-null values
Final result:
id1_non_nulls_mean = (3 + 2 + 4)/3 = 3
Any help is much appreciated!
Answer 0 (score: 3)
This should do it:
> dd
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15
id1 NA 1 2 3 NA NA 5 4 NA NA NA 6 9 7 10
id2 0 1 2 3 NA 1 5 4 NA NA 6 6 9 7 NA
> apply(dd, 1, function(x) {r = rle(!is.na(x)); mean(r$lengths[r$values])})
id1 id2
3.000000 3.666667
Edit
Using Richard's suggestion to make it simpler and more readable:
apply(dd, 1, function(x) with(rle(!is.na(x)), mean(lengths[values])))
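To see why the one-liner works, it may help to run the `rle()` step on a single row. The following is a minimal sketch that copies row id1 from the question into a plain vector; the names `x`, `r` and `non_na_runs` are only illustrative.

```r
# Row id1 from the question, written as a plain numeric vector
x <- c(NA, 1, 2, 3, NA, NA, 5, 4, NA, NA, NA, 6, 9, 7, 10)

# Run-length encode the logical "is not NA" flags
r <- rle(!is.na(x))
r$lengths   # 1 3 2 2 3 4
r$values    # FALSE TRUE FALSE TRUE FALSE TRUE

# Keep only the lengths of the runs whose value is TRUE (the non-NA runs)
non_na_runs <- r$lengths[r$values]   # 3 2 4

mean(non_na_runs)                    # 3
```

Note that a row consisting only of NA has no TRUE runs, so `mean()` would return NaN for it.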
Answer 1 (score: 0)
Here is an approach that first reshapes the data into long form.
library(tidyr)
library(dplyr)
dd %>%
tibble::rownames_to_column("rowname") %>%  # replaces the deprecated add_rownames
gather(variable, value, -rowname) %>%
group_by(rowname) %>%
mutate(group =
value %>% is.na %>% `!` %>%
`&`(value %>% lag %>% is.na) %>%
cumsum) %>%
filter(value %>% is.na %>% `!`) %>%
count(rowname, group) %>%
summarize(average_n = mean(n))
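The key step in this pipeline is the `group` column: a value starts a new run when it is non-NA and the previous value is NA, and `cumsum` turns those start flags into run ids. A rough base R sketch of the same idea on row id1 alone is shown below; `prev` stands in for `lag(value)`, and all variable names are illustrative rather than part of the pipeline.

```r
# Row id1 from the question, written as a plain numeric vector
x <- c(NA, 1, 2, 3, NA, NA, 5, 4, NA, NA, NA, 6, 9, 7, 10)

# Previous value of each element (base R stand-in for dplyr::lag)
prev <- c(NA, head(x, -1))

# TRUE exactly where a run of non-NA values begins
starts <- !is.na(x) & is.na(prev)

# Running count of run starts gives each position a run id
group <- cumsum(starts)

# Count the non-NA values per run id, then average the counts
run_sizes <- tabulate(group[!is.na(x)])   # 3 2 4
mean(run_sizes)                           # 3
```

Because the first element has no predecessor, its `prev` is NA, so a row that begins with a non-NA value (such as id2) still opens a run at position 1.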