Lookup and coding using a list in R

Date: 2014-05-02 12:35:17

Tags: string r list dataframe

I have a list of string vectors and a data.frame of strings, as follows:

lst <- list( c("key", "parking", "velvet"), c("sumatra", "cap"), c("sled", "card"), c("notice", "piece", "page"))

df <-  c("key", "sumatra", "band", "cattle", "camp", "sled", "page", "wire", "key", "card", "cap", "page")
df <- data.frame(df, stringsAsFactors=FALSE)

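I want to add a column to the data frame df holding a code that is based on which vector of the list lst each string belongs to. Strings that appear in no vector should get an empty code. The desired output looks like this: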

df$code <- c("G1", "G2", "", "", "", "G3", "G4", "", "G1", "G3", "G2", "G4")

 df
        df code
1      key   G1
2  sumatra   G2
3     band     
4   cattle     
5     camp     
6     sled   G3
7     page   G4
8     wire     
9      key   G1
10    card   G3
11     cap   G2
12    page   G4

How can I do this in R?

5 answers:

Answer 0 (score: 2)

df$code <- paste0("G",cumsum(c(TRUE, diff(sequence(sapply(lst,length)))<0)))[match(df$df, unlist(lst))]
df$code[is.na(df$code)] <- ''
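
The one-liner above is dense, so here is a step-by-step sketch of what it computes, using the lst and df from the question. The intermediate names (pos, starts, grp, idx, code) are introduced here only for illustration and are not part of the original answer.

```r
lst <- list(c("key", "parking", "velvet"), c("sumatra", "cap"),
            c("sled", "card"), c("notice", "piece", "page"))
df <- data.frame(df = c("key", "sumatra", "band", "cattle", "camp", "sled",
                        "page", "wire", "key", "card", "cap", "page"),
                 stringsAsFactors = FALSE)

# Position of every word inside its own vector: 1 2 3 1 2 1 2 1 2 3
pos <- sequence(sapply(lst, length))

# A new group starts wherever the position drops back down
starts <- c(TRUE, diff(pos) < 0)

# Running count of group starts gives the group index of every word
grp <- cumsum(starts)

# Look up each data-frame entry in the flattened list; NA means "not found"
idx <- match(df$df, unlist(lst))

# Translate positions into labels, then blank out the non-matches
code <- paste0("G", grp)[idx]
code[is.na(code)] <- ""
```

One caveat: the `< 0` test misses a group boundary when a length-1 vector directly follows another length-1 vector, because the position then stays at 1 instead of dropping. `rep(seq_along(lst), lengths(lst))` produces the same group index without that edge case.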

Answer 1 (score: 1)

Here is one way:

names(lst) <- paste0('G', seq_along(lst))
transform(df, code=with(stack(lst), ind[match(df, values)]))
#         df code
# 1      key   G1
# 2  sumatra   G2
# 3     band <NA>
# 4   cattle <NA>
# 5     camp <NA>
# 6     sled   G3
# 7     page   G4
# 8     wire <NA>
# 9      key   G1
# 10    card   G3
# 11     cap   G2
# 12    page   G4
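
To see why this works, it can help to look at what stack() produces from the named list: a two-column data frame with values (the words) and ind (the name of the list element each word came from, stored as a factor). The sketch below also shows one possible way to turn the <NA> entries into the empty strings the question asks for. The names lookup and code are illustrative only.

```r
lst <- list(c("key", "parking", "velvet"), c("sumatra", "cap"),
            c("sled", "card"), c("notice", "piece", "page"))
names(lst) <- paste0("G", seq_along(lst))
df <- data.frame(df = c("key", "sumatra", "band", "cattle", "camp", "sled",
                        "page", "wire", "key", "card", "cap", "page"),
                 stringsAsFactors = FALSE)

# One row per word; `ind` records which list element the word belongs to
lookup <- stack(lst)
head(lookup, 4)

# `ind` is a factor, so convert it to character before filling in ""
code <- as.character(lookup$ind[match(df$df, lookup$values)])
code[is.na(code)] <- ""
df$code <- code
```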

Answer 2 (score: 1)

Here is an approach using the qdapTools package:

library(qdapTools)
names(lst) <- paste0("G", 1:length(lst))
df$code <- df[, 1] %l% lst
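
For readers who prefer not to add a dependency, a similar key lookup can be sketched in base R with a named character vector. This is not how %l% is implemented; it only illustrates the same idea, and key is a name invented here.

```r
lst <- list(c("key", "parking", "velvet"), c("sumatra", "cap"),
            c("sled", "card"), c("notice", "piece", "page"))
names(lst) <- paste0("G", seq_along(lst))
df <- data.frame(df = c("key", "sumatra", "band", "cattle", "camp", "sled",
                        "page", "wire", "key", "card", "cap", "page"),
                 stringsAsFactors = FALSE)

# Named vector: the names are the words, the values are the group labels
key <- setNames(rep(names(lst), lengths(lst)),
                unlist(lst, use.names = FALSE))

# Indexing by name returns NA for words that are not in the key
df$code <- unname(key[df$df])
df$code[is.na(df$code)] <- ""
```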

Answer 3 (score: 1)

And one more for good measure...

lst <- list(c("key", "parking", "velvet"), c("sumatra", "cap"), 
            c("sled", "card"), c("notice", "piece", "page"))
d <- c("key", "sumatra", "band", "cattle", "camp", 
        "sled", "page", "wire", "key", "card", "cap", "page")
DF <- data.frame(d, stringsAsFactors=FALSE)

l <- rep(seq_along(lst), sapply(lst, length))
m <- l[match(d, unlist(lst))]
DF$code <- ifelse(is.na(m), "", paste0("G", m))
DF
##          d code
## 1      key   G1
## 2  sumatra   G2
## 3     band     
## 4   cattle     
## 5     camp     
## 6     sled   G3
## 7     page   G4
## 8     wire     
## 9      key   G1
## 10    card   G3
## 11     cap   G2
## 12    page   G4

Answer 4 (score: 0)

I assume you are starting with codes like this:

 MyCode <- c("G1", "G2","G3", "G4", "G1", "G3", "G2", "G4")

But you need to know which rows to put them in. Try this:

df$code<-NA
df[df$df %in% unlist(lst),]$code<-MyCode

The unlist() part converts your list into a vector. The %in% part selects every row where df$df matches something in lst. Rows without a match keep NA in df$code.
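
The assignment above only works if MyCode has exactly one entry per matching row, in row order. Here is a small sketch of that check, using the data from the question (the name hit is illustrative):

```r
lst <- list(c("key", "parking", "velvet"), c("sumatra", "cap"),
            c("sled", "card"), c("notice", "piece", "page"))
df <- data.frame(df = c("key", "sumatra", "band", "cattle", "camp", "sled",
                        "page", "wire", "key", "card", "cap", "page"),
                 stringsAsFactors = FALSE)
MyCode <- c("G1", "G2", "G3", "G4", "G1", "G3", "G2", "G4")

# TRUE for rows whose word appears anywhere in the list
hit <- df$df %in% unlist(lst)

# The replacement must supply one code per TRUE row
stopifnot(sum(hit) == length(MyCode))

# Start with NA everywhere, then fill only the matching rows
df$code <- NA_character_
df$code[hit] <- MyCode
```

Because MyCode has to be written out by hand, this approach is only practical when the codes are already known; the other answers derive them from lst.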