How to highlight sequential strings in one column based on another column

Time: 2016-03-02 13:37:29

Tags: r

My data is:

    df <- structure(list(M1 = c(4L, 11L, 11L, 11L, 11L, 11L, 11L, 16L, 
16L, 16L, 16L, 16L, 16L, 16L), M2 = structure(c(14L, 1L, 2L, 
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L), .Label = c(" B135", 
" B168", " B172", " B299", " B300", " B301", " B335", " B336", 
" B364", " B566", " B567", " B590", " B591", "A"), class = "factor"), 
    N = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L), N2 = c(470L, 14L, 12L, 16L, 9L, 14L, 14L, 24L, 15L, 
    32L, 193L, 76L, 10L, 9L)), .Names = c("M1", "M2", "N", "N2"
), class = "data.frame", row.names = c(NA, -14L))

The data looks like this:

>df
#   M1    M2 N  N2
#1   4     A 1 470
#2  11  B135 1  14
#3  11  B168 1  12
#4  11  B172 1  16
#5  11  B299 1   9
#6  11  B300 1  14
#7  11  B301 1  14
#8  16  B335 1  24
#9  16  B336 1  15
#10 16  B364 1  32
#11 16  B566 1 193
#12 16  B567 1  76
#13 16  B590 1  10
#14 16  B591 1   9

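What I am looking for is to check M1 and highlight M2 based on it: within each set of rows that share the same M1 value, I want to check whether the M2 values are sequential. In this example: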

#   M1    M2  N  N2
#1   4    A   1  470

There is only one row here, so I don't need to highlight it.

#2  11  B135 1  14
#3  11  B168 1  12
#4  11  B172 1  16
#5  11  B299* 1   9
#6  11  B300* 1  14
#7  11  B301* 1  14

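In this section (all rows have M1 = 11), B299, B300 and B301 are sequential (each one directly follows the previous one), so I want to highlight them, for example with a star.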

#8  16  B335* 1  24
#9  16  B336* 1  15
#10 16  B364  1  32
#11 16  B566**  1 193
#12 16  B567**  1  76
#13 16  B590***  1  10
#14 16  B591***  1   9

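In this section (all rows have M1 = 16), B335 and B336 are sequential, so I highlight them with one star. B566 and B567 are also sequential and get two stars (**) because they form a different run from the first one. The same applies to the third sequential group (three stars), and so on.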
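
To make the criterion concrete, here is a minimal sketch for the M1 = 16 rows. It assumes that "sequential" means the numeric part of M2 differs by exactly 1 from the row directly above or below. The names `m2`, `num`, `step` and `adjacent` are illustrative only and do not come from the original data.

```r
# M2 values of the M1 = 16 rows, typed in by hand for illustration
m2 <- c("B335", "B336", "B364", "B566", "B567", "B590", "B591")

# keep only the digits and convert them to numbers
num <- as.numeric(gsub("\\D", "", m2))

# TRUE where a value is exactly 1 larger than the value before it
step <- diff(num) == 1

# a row belongs to a run if it is linked to the row before or after it
adjacent <- c(FALSE, step) | c(step, FALSE)

m2[adjacent]
```

Under that assumption, every row except B364 is part of a run, which matches the rows marked with stars above.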

1 Answer:

Answer 0: (score: 2)

Here is an attempt. It assumes the values are sorted as in your example:

highlight_seq <- function(x) {
    # get sequences of numbers and get rid of NAs
    num_seq <- (diff(as.numeric(gsub("\\D", "", x))) == 1) * 1
    num_seq[is.na(num_seq)] <- 0

    # to figure out the number of each sequence, use rle
    num_seq <- rle(num_seq)

    # replace 1s by the cumsum
    nonzero <- which(num_seq$values != 0)
    num_seq$values[nonzero] <- cumsum(num_seq$values)[nonzero]
    num_seq <- inverse.rle(num_seq)

    # since diff was initially used, add the first value of each sequence
    num_seq <- c(0, num_seq)
    num_seq[which(num_seq != 0) - 1] <- num_seq[which(num_seq != 0)]

    # paste asterisks in after the sequences
    return(paste0(x, sapply(num_seq, function(p) paste(rep("*", p), collapse = ""))))
}
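
To see what each step of the function computes, the following sketch traces the intermediate values for the M1 = 16 rows. It repeats the function's own steps on a hand-typed vector of the numeric parts. The variable names (`num`, `steps`, `runs`, `hit`, `ids`) are chosen here for illustration and are not part of the function above.

```r
# numeric parts of M2 for the M1 = 16 rows
num <- c(335, 336, 364, 566, 567, 590, 591)

# 1 where a value directly follows its predecessor
steps <- (diff(num) == 1) * 1                 # 1 0 0 1 0 1

# run-length encode the flags
runs <- rle(steps)                            # values 1 0 1 0 1, lengths 1 2 1 1 1

# number the non-zero runs 1, 2, 3, ...
hit <- runs$values != 0
runs$values[hit] <- cumsum(runs$values)[hit]  # values 1 0 2 0 3

# expand again and pad with a leading 0, because diff() dropped one element
ids <- c(0, inverse.rle(runs))                # 0 1 0 0 2 0 3

# copy each run number onto the element that starts the run
ids[which(ids != 0) - 1] <- ids[ids != 0]     # 1 1 0 2 2 3 3
ids
```

The final vector is the number of asterisks appended to each row, so B364 gets none and the last two rows get three.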

library(dplyr)
df %>% group_by(M1) %>% mutate(M2=highlight_seq(M2))


   M1       M2 N  N2
1   4        A 1 470
2  11     B135 1  14
3  11     B168 1  12
4  11     B172 1  16
5  11    B299* 1   9
6  11    B300* 1  14
7  11    B301* 1  14
8  16    B335* 1  24
9  16    B336* 1  15
10 16     B364 1  32
11 16   B566** 1 193
12 16   B567** 1  76
13 16  B590*** 1  10
14 16  B591*** 1   9
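
If dplyr is not available, the same idea can probably be expressed in base R with `ave()`. The sketch below is an alternative, not the method used above: `mark_runs` and the `M2_marked` column are hypothetical names introduced here, the data frame is a hand-typed copy of the M1 and M2 columns without the leading spaces of the original factor labels, and the values are still assumed to be sorted within each M1 group.

```r
# hand-typed copy of the relevant columns (assumption: no leading spaces)
df <- data.frame(
  M1 = c(4, rep(11, 6), rep(16, 7)),
  M2 = c("A", "B135", "B168", "B172", "B299", "B300", "B301",
         "B335", "B336", "B364", "B566", "B567", "B590", "B591"),
  stringsAsFactors = FALSE
)

# hypothetical helper: append one asterisk per run number to runs of
# values whose numeric parts increase by exactly 1
mark_runs <- function(x) {
  x <- trimws(as.character(x))
  if (length(x) < 2) return(x)
  num <- suppressWarnings(as.numeric(gsub("\\D", "", x)))
  step <- diff(num) == 1
  step[is.na(step)] <- FALSE
  linked_prev <- c(FALSE, step)            # linked to the row above
  in_run <- linked_prev | c(step, FALSE)   # linked to the row above or below
  starts <- in_run & !linked_prev          # first row of each run
  run_id <- cumsum(starts) * in_run        # 0 outside runs, else 1, 2, 3, ...
  paste0(x, strrep("*", run_id))
}

# apply the helper separately within each M1 group
df$M2_marked <- ave(df$M2, df$M1, FUN = mark_runs)
df$M2_marked
```

Writing the result to a new column keeps the original M2 values intact, which may be convenient if the stars are only needed for display.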