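Finding the length of consecutive runs of a phrase across columns

Time: 2019-06-18 10:59:47

Tags: r dplyr

I have the following data frame, which describes the progression from the current state ("0") to the situation 5 years later.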

df = structure(list(Prog0to1 = c("different", "different", "same", 
"different", "disappeared", "different", "same", "same", "different", 
"different"), Prog1to2 = c("disappeared", "disappeared", "disappeared", 
"different", "different", "different", "different", "same", "same", 
"Deceased"), Prog2to3 = c("disappeared", "different", "disappeared", 
"same", "disappeared", "same", "different", "different", "disappeared", 
"Deceased"), Prog3to4 = c("different", "same", "disappeared", 
"same", "disappeared", "same", "disappeared", "same", "disappeared", 
"Deceased"), Prog4to5 = c("same", "same", "disappeared", "different", 
"disappeared", "different", "disappeared", "same", "disappeared", 
"Deceased")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-10L))

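In df, "same" means that the status in a given year is the same as in the previous year.

For each row, I want to compute the length of the run of consecutive "same" values. If a row has two such runs, I want the median of the run lengths.

So the output vector should be:

v = c(1, 2, 1, 2, 0, 2, 1, 2, 1, 0)

where the 8th value is the median of 2 and 2, i.e. 2 (two runs of "same" separated by a "different").

How can I achieve this?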

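2 answers:

Answer 0: (score: 1)

We can use apply together with rle and take the median of the lengths for which values is TRUE, i.e. the runs where the entry equals "same":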

vals <- apply(df == "same", 1, function(x) median(with(rle(x), lengths[values])))
vals
#[1]  1  2  1  2 NA  2  1  2  1 NA
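
To make the rle step more concrete, here is a minimal sketch on a single row. The vector row8 is typed in by hand from row 8 of the question's df, and the names row8, runs and same_runs are only illustrative.

```r
# Row 8 of df from the question, written out by hand
row8 <- c("same", "same", "different", "same", "same")

# rle() on the logical vector gives the run lengths and the value of each run
runs <- rle(row8 == "same")
runs$lengths   # 2 1 2
runs$values    # TRUE FALSE TRUE

# Keep only the runs of TRUE (the "same" runs) and take the median of their lengths
same_runs <- runs$lengths[runs$values]
median(same_runs)   # 2

# A row without any "same" leaves an empty vector, and median() of that is NA
median(integer(0))
```

This is why rows 5 and 10 come out as NA in the result above.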

If you want to replace the NAs with 0:

replace(vals, is.na(vals), 0)
#[1] 1 2 1 2 0 2 1 2 1 0
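
As a possible variation, the NA handling can be folded into a small helper so that rows without any "same" return 0 directly. This is only a sketch: median_same_run is a hypothetical name, and the matrix m holds rows 4, 5 and 8 of the question's data, typed in by hand.

```r
# Hypothetical helper: median length of the runs of "same" in one row,
# or 0 when the row contains no "same" at all
median_same_run <- function(x) {
  r <- rle(x == "same")
  len <- r$lengths[r$values]
  if (length(len) == 0) 0 else median(len)
}

# Rows 4, 5 and 8 of the question's data, typed in by hand
m <- rbind(
  c("different", "different", "same", "same", "different"),
  c("disappeared", "different", "disappeared", "disappeared", "disappeared"),
  c("same", "same", "different", "same", "same")
)

res <- apply(m, 1, median_same_run)
res   # 2 0 2
```

Called on the full data frame as apply(df, 1, median_same_run), this should give the vector v from the question, assuming df contains only the five Prog columns.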

Answer 1: (score: 1)

We can use melt from data.table:

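library(data.table)
# setDT() converts df in place; keep.rownames = TRUE adds the row index as column rn
long <- melt(setDT(df, keep.rownames = TRUE), id.vars = "rn")
long[, run := rleid(value == "same") * (value == "same"), by = rn]
# Count each run of "same" first, then take the median run length per row (rows without "same" are dropped)
long[run != 0, .N, by = .(rn, run)][, .(med = as.numeric(median(N))), by = rn]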