I am trying to write a function that counts consecutive instances of a pattern. As an example, I would like the string
string<-"A>A>A>B>C>C>C>A>A"
to be converted to
"3 A > 1 B > 3 C > 2 A"
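I have a function that counts every instance of each string, see below. However, it does not keep the order I want, because it totals each value across the whole string rather than per consecutive run. Any ideas or pointers?
Thanks,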
Existing function:
fnc_gen_PathName <- function(string) {
  p <- strsplit(as.character(string), ";")
  p1 <- lapply(p, table)
  p2 <- lapply(p1, function(x) {
    sapply(1:length(x), function(i) {
      if(x[i] == 25){
        paste0(x[i], "+ ", names(x)[i])
      } else{
        paste0(x[i], "x ", names(x)[i])
      }
    })
  })
  p3 <- lapply(p2, function(x) paste(x, collapse = "; "))
  p3 <- do.call(rbind, p3)
  return(p3)
}
Answer 0 (score: 10)
As @MrFlick commented, you can use rle and strsplit:
with(rle(strsplit(string, ">")[[1]]), paste(lengths, values, collapse = " > "))
## [1] "3 A > 1 B > 3 C > 2 A"
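To see why the one-liner works, it may help to look at the intermediate result of rle. The following is only a minimal sketch that unpacks the same steps; the variable names `parts`, `runs` and `result` are chosen here for illustration and are not part of the original answer.

```r
string <- "A>A>A>B>C>C>C>A>A"

# Split on the literal ">" to get one element per step of the path
parts <- strsplit(string, ">", fixed = TRUE)[[1]]

# rle() collapses consecutive repeats into run lengths and run values
runs <- rle(parts)
runs$lengths  # 3 1 3 2
runs$values   # "A" "B" "C" "A"

# Pair each run length with its value and join the pairs with " > "
result <- paste(runs$lengths, runs$values, collapse = " > ")
result
```

Unlike table(), which totals each value over the whole vector, rle() keeps the original order and reports the two separate runs of "A" separately.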
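Answer 1 (score: 0)
Here are two dplyr solutions: a regular one and an rle-based one. The advantages: several strings can be passed in as a vector, and a tidy intermediate dataset is built before the runs are collapsed back into a string.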
library(dplyr)
library(tidyr)
library(stringi)

strings = "A>A>A>B>C>C>C>A>A"

# Solution 1: regular dplyr pipeline
# (tibble() replaces the deprecated data_frame())
tibble(string = strings) %>%
  # one row per element of each string
  mutate(string_split =
           string %>%
           stri_split_fixed(">")) %>%
  unnest(string_split) %>%
  # ID increases by one each time the value differs from the previous row
  mutate(ID =
           string_split %>%
           lag %>%
           `!=`(string_split) %>%
           plyr::mapvalues(NA, TRUE) %>%
           cumsum) %>%
  # count the rows in each run, then collapse the runs into one string
  count(string, ID, string_split) %>%
  group_by(string) %>%
  summarize(new_string =
              paste(n,
                    string_split,
                    collapse = " > "))

# Solution 2: rle applied to each string
tibble(string = strings) %>%
  group_by(string) %>%
  do(.$string %>%
       first %>%
       stri_split_fixed(">") %>%
       first %>%
       rle %>%
       unclass %>%
       as.data.frame) %>%
  summarize(new_string =
              paste(lengths, values, collapse = " > "))
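For comparison with the dplyr pipelines above, the same vectorised behaviour could probably be had in base R alone. This is a rough sketch rather than part of the original answer: the function name `count_runs` and its `sep` argument are hypothetical, and it assumes the separator should be matched literally.

```r
# Hypothetical helper: one run-length summary per input string
count_runs <- function(strings, sep = ">") {
  pieces <- strsplit(as.character(strings), sep, fixed = TRUE)
  vapply(pieces, function(parts) {
    runs <- rle(parts)  # consecutive repeats -> lengths and values
    paste(runs$lengths, runs$values, collapse = paste0(" ", sep, " "))
  }, character(1))
}

out <- count_runs(c("A>A>A>B>C>C>C>A>A", "B>B>A"))
out
```

This returns a character vector with one element per input string, but it does not build the tidy intermediate dataset that the dplyr versions give you.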