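Revised question with a more accurate example dataset
I have several different lists, each containing many character terms. Here is a very short example: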
List1 <- "A + B + C + D + E:F + F:E"
List2<- "A + B + C + E:F + F:E + G:H + H:G"
List3 <- "J + K + L + L:H + L:H1"
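I am trying to find how often each term occurs across all of these lists, but some terms are repeated in a way that causes problems.
After a lot of looping, splitting each term before and after the ":" and matching with X %in% Y, I got: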
sig_var8
var count
1 0 0
2 A 2
3 B 2
4 C 2
5 D 1
6 E:F 2
7 F:E 2
8 G:H 1
9 H:G 1
10 J 1
11 K 1
12 L 1
13 L:H 1
14 L:H1 1
What I want is:
sig_var8
var count
1 0 0
2 A 2
3 B 2
4 C 2
5 D 1
6 E:F 2
7 G:H 1
8 J 1
9 K 1
10 L 1
11 L:H 1
12 L:H1 1
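Note: in List1, E:F and F:E are considered the same term and should be counted only once. The same applies to List2, where G:H == H:G and is counted only once. Also note that grep is not a good fit here, because L:H and L:H1 in List3 are different terms and need to be counted separately (hence %in%).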
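To make the two points in the note concrete, here is a minimal sketch. The helper name canon_key is made up for this illustration and is not part of the original code. Sorting the parts around ":" gives E:F and F:E the same key, while exact matching with %in% keeps L:H and L:H1 apart, which a pattern match does not.

```r
# Hypothetical helper: build an order-independent key for a single term
canon_key <- function(term) {
  parts <- strsplit(term, ":", fixed = TRUE)[[1]]
  paste(sort(parts), collapse = ":")
}

canon_key("E:F") == canon_key("F:E")   # same key, so they count as one term
canon_key("G:H") == canon_key("H:G")   # likewise

# Exact matching keeps L:H and L:H1 apart ...
"L:H" %in% c("L:H1")                   # FALSE
# ... whereas a substring match treats them as a hit
grepl("L:H", "L:H1", fixed = TRUE)     # TRUE
```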
Here is my working code:
sig_var8<-data.frame(matrix(data=0,nrow=1,ncol=2))
colnames(sig_var8)<-c("var","count")
sig_var8[,1]<-as.character(sig_var8[,1])
sig_var8[,2]<-as.numeric(sig_var8[,2])
for(list in 1:3){
temp_list<-get(paste0("List",list)) #get the equation above
assign(paste0("List",list,"a"), gsub(" ","",temp_list)) #remove all spaces in the sentence
assign(paste0("List",list,"a_split"), strsplit(get(paste0("List",list,"a")),"[+]")) #split where "+" are
temp_listA<-get(paste0("List",list,"a_split"))[[1]]
for (item in 1:length(temp_listA)){
if(isTRUE(temp_listA[item] %in% sig_var8[,1])){
row_n<-which(sig_var8[,1]==temp_listA[item])
sig_var8[row_n,2]<-sig_var8[row_n,2]+1
}
if(isFALSE(temp_listA[item] %in% sig_var8[,1])){
row_n<-nrow(sig_var8)
sig_var8[row_n+1,1]<-temp_listA[item]
sig_var8[row_n+1,2]<-1
}
}
}
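Answer 0 (score: 3)
Maybe something like the following does what you need. Note that it expects each List object to be a character vector of terms rather than a single string, and that because the parts are sorted, L:H is reported as H:L.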
Lst <- mget(ls(pattern = "^List"))
Lst <- lapply(Lst, function(x) {
L <- strsplit(x, ":")
res <- sapply(L, function(y){
paste(sort(y), collapse = ":")
})
unique(res)
})
table(unlist(Lst))
#
# A B C D E:F G:H H:L H1:L J K L
# 2 2 2 1 2 1 1 1 1 1 1
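The output above only comes out this way when each List object is a character vector of terms. If the lists are single strings as in the question, they would need to be split on "+" first. A minimal sketch of that step, where split_terms is a hypothetical helper name introduced here:

```r
# The question's original inputs: one string per list
List1 <- "A + B + C + D + E:F + F:E"
List2 <- "A + B + C + E:F + F:E + G:H + H:G"
List3 <- "J + K + L + L:H + L:H1"

# Hypothetical helper: split one formula-like string into its terms
split_terms <- function(s) {
  # fixed = TRUE treats "+" literally instead of as a regex quantifier
  trimws(strsplit(s, "+", fixed = TRUE)[[1]])
}

Lst <- lapply(list(List1, List2, List3), split_terms)
Lst[[1]]
# [1] "A"   "B"   "C"   "D"   "E:F" "F:E"
```

The resulting Lst can then be used in place of the mget() result before the lapply() step.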
Answer 1 (score: 1)
I'm not 100% sure this is what you are looking for, but if it is, I will annotate it.
List1 <- c("A","B","C","D","E:F","F:E")
List2<- c("A","B","C","E:F","F:E","G:H","H:G")
List3 <- c("J","K","L","L:H","L:H1")
Lst <- list(List1, List2, List3)
keep_me <- lapply(Lst, function(x) !duplicated(lapply(strsplit(x, ":", fixed = TRUE), sort)))
Lst_cleaned <- unlist(Map(`[`, Lst, keep_me))
table(Lst_cleaned)
Lst_cleaned
A B C D E:F G:H J K L L:H L:H1
2 2 2 1 2 1 1 1 1 1 1
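Edit: added an explanation below. Let me know if anything is still unclear or if you run into more problems. I first use List1 to demonstrate what lapply does to each list element. Also, as a side note, breaking it down made me realize that which is not needed if you would rather not use it: you can subset Lst with the logical vectors directly in Map.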
# Splitting the string on the colon and sorting the elements
lapply(strsplit(List1, ":", fixed = TRUE), sort)
[[1]]
[1] "A"
[[2]]
[1] "B"
[[3]]
[1] "C"
[[4]]
[1] "D"
[[5]]
[1] "E" "F"
[[6]]
[1] "E" "F"
# Logical vector marking the elements that are NOT duplicated
!duplicated(lapply(strsplit(List1, ":", fixed = TRUE), sort))
[1] TRUE TRUE TRUE TRUE TRUE FALSE
# which() gives the indices of the TRUE values
which(!duplicated(lapply(strsplit(List1, ":", fixed = TRUE), sort)))
[1] 1 2 3 4 5
# Now, all together: lapply applies the above logic to
# each element in Lst and returns a list of the indices that are not
# duplicates for each vector
lapply(Lst, function(x) which(!duplicated(lapply(strsplit(x, ":", fixed = TRUE), sort))))
[[1]]
[1] 1 2 3 4 5
[[2]]
[1] 1 2 3 4 6
[[3]]
[1] 1 2 3 4 5
keep_me <- lapply(Lst, function(x) which(!duplicated(lapply(strsplit(x, ":", fixed = TRUE), sort))))
# Map subsets (`[`) Lst by the indices in keep_me, and unlist
# flattens the list (i.e., unlist makes it a vector)
Map(`[`, Lst, keep_me)
[[1]]
[1] "A" "B" "C" "D" "E:F"
[[2]]
[1] "A" "B" "C" "E:F" "G:H"
[[3]]
[1] "J" "K" "L" "L:H" "L:H1"
unlist(Map(`[`, Lst, keep_me))
[1] "A" "B" "C" "D" "E:F" "A" "B" "C" "E:F" "G:H" "J" "K" "L" "L:H" "L:H1"
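The steps walked through above can be bundled into one function that returns a var/count data frame like the one the question asks for. This is only a sketch: count_terms is a hypothetical name, and it subsets with the logical vector directly instead of going through which(). One assumption worth flagging: the first spelling seen within each list is kept as the label, so if one list contained only E:F and another only F:E, they would show up as two separate rows.

```r
List1 <- c("A", "B", "C", "D", "E:F", "F:E")
List2 <- c("A", "B", "C", "E:F", "F:E", "G:H", "H:G")
List3 <- c("J", "K", "L", "L:H", "L:H1")

# Hypothetical wrapper around the split / sort / !duplicated steps
count_terms <- function(lists) {
  cleaned <- lapply(lists, function(x) {
    keys <- lapply(strsplit(x, ":", fixed = TRUE), sort)
    x[!duplicated(keys)]   # logical subsetting, no which() needed
  })
  counts <- table(unlist(cleaned))
  data.frame(var = names(counts),
             count = as.integer(counts),
             stringsAsFactors = FALSE)
}

sig_var8 <- count_terms(list(List1, List2, List3))
sig_var8[sig_var8$var == "E:F", "count"]
# [1] 2
```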
Answer 2 (score: 1)
Building on @Rui's answer, I think this does what you want:
List1 <- c("A","B","C","D","E:F","F:E")
List2<- c("A","B","C","E:F","F:E","G:H","H:G")
List3 <- c("J","K","L","L:H","L:H1")
# make list of all objects starting with List
Lst <- mget(ls(pattern = "^List"))
# function to split, sort, and stitch the duplicates
split.sort <- function(x) {
ifelse(length(x) > 1, paste0(sort(x), collapse = ":"), x)
}
# apply function to each of the Lst lists and remove duplicates
Lst <- lapply(Lst, function(y) unique(sapply(strsplit(y, ":"), split.sort)))
# get frequency
table(unlist(Lst))
#>
#> A B C D E:F G:H H:L H1:L J K L
#> 2 2 2 1 2 1 1 1 1 1 1
Created on 2019-04-17 by the reprex package (v0.2.1)