I have the following data frame, which describes the progression from the current state ("0") to five years later.
df = structure(list(Prog0to1 = c("different", "different", "same",
"different", "disappeared", "different", "same", "same", "different",
"different"), Prog1to2 = c("disappeared", "disappeared", "disappeared",
"different", "different", "different", "different", "same", "same",
"Deceased"), Prog2to3 = c("disappeared", "different", "disappeared",
"same", "disappeared", "same", "different", "different", "disappeared",
"Deceased"), Prog3to4 = c("different", "same", "disappeared",
"same", "disappeared", "same", "disappeared", "same", "disappeared",
"Deceased"), Prog4to5 = c("same", "same", "disappeared", "different",
"disappeared", "different", "disappeared", "same", "disappeared",
"Deceased")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-10L))
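In df, "same" means that the status in a given year is the same as in the previous year.
For each row, I want to count the number of consecutive "same" values. If a row contains two such runs, I want the median of the run lengths.
So the output vector should be: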
v = c(1, 2, 1, 2, 0, 2, 1, 2, 1, 0),
where the 8th value is the median of 2 and 2, i.e. 2 (two runs of "same" separated by a "different").
How can I achieve this?
Answer 0: (score: 1)
We can use apply with rle and take the median of the lengths for which values is TRUE, i.e. the runs where the entry is "same".
vals <- apply(df == "same", 1, function(x) median(with(rle(x), lengths[values])))
vals
#[1] 1 2 1 2 NA 2 1 2 1 NA
If you want to replace the NAs with 0:
replace(vals, is.na(vals), 0)
#[1] 1 2 1 2 0 2 1 2 1 0
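To make the rle step more concrete, here is a small sketch that walks through a single row by hand. The vectors row8 and row5 are typed out from rows 8 and 5 of df so the snippet runs on its own; the variable names are only illustrative.

```r
# Row 8 of df, copied by hand: two runs of "same" separated by one "different".
row8 <- c("same", "same", "different", "same", "same")

# rle() on the logical vector gives run lengths and the value of each run.
r <- rle(row8 == "same")
r$lengths   # lengths of the three runs
r$values    # TRUE for runs of "same", FALSE otherwise

# Keep only the lengths of the TRUE runs and take their median.
same_runs <- r$lengths[r$values]
median(same_runs)

# Row 5 of df contains no "same" at all, so the selection is empty
# and median() of an empty vector returns NA.
row5 <- c("disappeared", "different", "disappeared", "disappeared", "disappeared")
r5 <- rle(row5 == "same")
empty_case <- median(r5$lengths[r5$values])
empty_case
```

This empty selection is where the NA entries for rows 5 and 10 come from, which is why the replace step is needed to get zeros.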
Answer 1: (score: 1)
We can use melt from data.table to reshape to long format and then work on the runs of "same" within each row:
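library(data.table)
# Count each run of "same" separately, then take the median run length per row;
# rows with no "same" at all are dropped from the result.
melt(setDT(df, keep.rownames = TRUE), id.vars = 'rn')[,
  .(run = rleid(value == "same") * (value == "same")), .(rn)][
  run != 0, .N, .(rn, run)][, .(med = median(N)), .(rn)]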
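To see the intermediate steps of a run-based count with data.table, here is a sketch on a small made-up long table rather than on df. The table long, its values, and the column names run and med are assumptions chosen for illustration only.

```r
library(data.table)

# Toy long-format data (made up): row "a" has two runs of "same"
# (lengths 2 and 1), row "b" has none.
long <- data.table(
  rn    = rep(c("a", "b"), each = 4),
  value = c("same", "same", "different", "same",
            "different", "disappeared", "different", "disappeared")
)

# Label each run within a row; runs that are not "same" get label 0.
long[, run := rleid(value == "same") * (value == "same"), by = rn]

# One line per (rn, run) pair, with the run length in column N.
run_lengths <- long[run != 0, .N, by = .(rn, run)]

# Median run length per row; rows without any "same" do not appear.
res <- run_lengths[, .(med = median(N)), by = rn]
res
```

Grouping by both rn and run before taking the median matters: grouping by rn alone would give the total number of "same" entries in the row rather than the length of each run. Rows that never contain "same" (such as "b" here) are missing from res and would have to be filled with 0 afterwards if zeros are wanted.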