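I have a vector of file paths and a data frame that gives the "group" of each file, keyed by an ID that appears somewhere in the file name.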
files <- c("data/backup/LATEST/20181514.X1235",
"data/backup/LATEST/X1255+20181514",
"data/backup/LATEST/20181514-X1237",
"data/backup/LATEST/20181514-E1235",
"data/backup/LATEST/20181514F1235",
"data/backup/LATEST/M32_-X6635__20181514",
"data/backup/LATEST/20181514-X1205",
"data/backup/LATEST/l-A1230.20181514-XX")
groups <- data.frame(
ID = c("X1235","X1255","A1230","K93430",
"LOP0343","J3490","X1205","X6635",
"F1235","E1235","X1237"),
Group = c("A","A","A",
"B","A","A",
"B","B","B",
"B","A")
)
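As the final result, I want a data frame with one column containing the full file path from files and a second column showing its group.
How can I achieve this?
Desired result: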
filepath group
1 data/backup/LATEST/20181514.X1235 A
2 data/backup/LATEST/X1255+20181514 A
3 data/backup/LATEST/20181514-X1237 A
4 data/backup/LATEST/20181514-E1235 B
5 data/backup/LATEST/20181514F1235 B
6 data/backup/LATEST/M32_-X6635__20181514 B
7 data/backup/LATEST/20181514-X1205 B
8 data/backup/LATEST/l-A1230.20181514-XX A
Answer 0 (score: 2)
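Here is a solution using stringr::str_detect:
library(stringr)
df <- data.frame(files, stringsAsFactors = FALSE)
# For one row x, return the Group of every ID that occurs in the file path
strdet <- function(x){
  groups[str_detect(x, groups$ID), 'Group']
}
apply(df, 1, strdet)
[1] "A" "A" "A" "B" "B" "B" "B" "A"
PS: I created df with stringsAsFactors = FALSE so that the file paths are stored as character strings rather than factors.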
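strdet() returns exactly one group only when exactly one ID occurs in the path; with zero or several matches the results no longer line up one-to-one with the files, and apply() may return a list instead of a character vector. A small base-R check for this, assuming the IDs are plain literal substrings (hence fixed = TRUE), counts the matching IDs per path before doing the lookup:

```r
files <- c("data/backup/LATEST/20181514.X1235",
           "data/backup/LATEST/X1255+20181514",
           "data/backup/LATEST/20181514-X1237",
           "data/backup/LATEST/20181514-E1235",
           "data/backup/LATEST/20181514F1235",
           "data/backup/LATEST/M32_-X6635__20181514",
           "data/backup/LATEST/20181514-X1205",
           "data/backup/LATEST/l-A1230.20181514-XX")
groups <- data.frame(
  ID = c("X1235", "X1255", "A1230", "K93430", "LOP0343", "J3490",
         "X1205", "X6635", "F1235", "E1235", "X1237"),
  Group = c("A", "A", "A", "B", "A", "A", "B", "B", "B", "B", "A"),
  stringsAsFactors = FALSE
)

# Logical matrix: one row per path, one column per ID,
# TRUE where that ID occurs literally (fixed = TRUE) in the path.
hits <- vapply(groups$ID, grepl, logical(length(files)),
               x = files, fixed = TRUE)

# Number of IDs found in each path; anything other than 1 needs a closer look.
n_hits <- rowSums(hits)
files[n_hits != 1]
```

For the example data every path contains exactly one ID, so the last line returns character(0).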
Answer 1 (score: 0)
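Using base R, you can build the group vector by checking, for each file, which IDs occur in its path:
group_vec <- vapply(files, function(f) {
  # Group of every ID that occurs literally (fixed = TRUE) in this path
  hit <- as.character(groups$Group)[vapply(as.character(groups$ID), grepl, logical(1), x = f, fixed = TRUE)]
  if (length(hit) == 0) NA_character_ else hit[1]  # NA if no ID matches; first match wins
}, character(1), USE.NAMES = FALSE)
data.frame(files = files, group = group_vec)
files group
data/backup/LATEST/20181514.X1235 A
data/backup/LATEST/X1255+20181514 A
data/backup/LATEST/20181514-X1237 A
data/backup/LATEST/20181514-E1235 B
data/backup/LATEST/20181514F1235 B
data/backup/LATEST/M32_-X6635__20181514 B
data/backup/LATEST/20181514-X1205 B
data/backup/LATEST/l-A1230.20181514-XX A
Is this what you are looking for?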
Answer 2 (score: 0)
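If you can assume how the ID strings are built (one capital letter followed by four digits), you can extract the ID from each path with the tidyverse and join on it:
library(dplyr)
library(stringr)
data.frame(file = files, stringsAsFactors = FALSE) %>%
  mutate(ID = str_extract(file, "[A-Z]\\d{4}")) %>%
  left_join(groups, by = "ID")
I added stringsAsFactors = FALSE when creating groups, to avoid the warning left_join() gives when joining a character ID column to a factor one.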
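For comparison, the same extract-then-look-up idea also works in base R. A minimal sketch under the same assumption that an ID is one capital letter followed by four digits: regexpr() and regmatches() pull the first such token out of each path, and match() finds its row in groups. Paths without such a token, or whose token is not in groups$ID, get NA.

```r
files <- c("data/backup/LATEST/20181514.X1235",
           "data/backup/LATEST/X1255+20181514",
           "data/backup/LATEST/20181514-X1237",
           "data/backup/LATEST/20181514-E1235",
           "data/backup/LATEST/20181514F1235",
           "data/backup/LATEST/M32_-X6635__20181514",
           "data/backup/LATEST/20181514-X1205",
           "data/backup/LATEST/l-A1230.20181514-XX")
groups <- data.frame(
  ID = c("X1235", "X1255", "A1230", "K93430", "LOP0343", "J3490",
         "X1205", "X6635", "F1235", "E1235", "X1237"),
  Group = c("A", "A", "A", "B", "A", "A", "B", "B", "B", "B", "A"),
  stringsAsFactors = FALSE
)

# regexpr() gives the position of the first match in each path (-1 if none).
m <- regexpr("[A-Z][0-9]{4}", files)

# regmatches() returns only the matched substrings, so fill them into an
# NA vector at the matching positions to keep IDs aligned with files.
id <- rep(NA_character_, length(files))
id[m != -1] <- regmatches(files, m)

# match() returns the row of groups whose ID equals the extracted token (or NA).
result <- data.frame(filepath = files,
                     ID = id,
                     group = groups$Group[match(id, groups$ID)],
                     stringsAsFactors = FALSE)
print(result)
```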
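If you cannot make that assumption, fuzzyjoin can match each path against every ID directly:
library(dplyr)
library(stringr)
library(fuzzyjoin)
data.frame(file = files, stringsAsFactors = FALSE) %>%
  fuzzy_left_join(groups, by = c(file = "ID"), match_fun = str_detect)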
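fuzzy_left_join() with match_fun = str_detect keeps every (file, ID) pair for which str_detect(file, ID) is TRUE, and keeps unmatched files with NA. To make that behavior concrete, here is a base-R sketch of the same join logic (not how fuzzyjoin is implemented internally): build all file/ID pairs, keep those where the ID occurs in the path, then left-join the result back onto the file list. Unlike str_detect, it uses fixed = TRUE, assuming the IDs should be matched as literal text rather than as regular expressions.

```r
files <- c("data/backup/LATEST/20181514.X1235",
           "data/backup/LATEST/X1255+20181514",
           "data/backup/LATEST/20181514-X1237",
           "data/backup/LATEST/20181514-E1235",
           "data/backup/LATEST/20181514F1235",
           "data/backup/LATEST/M32_-X6635__20181514",
           "data/backup/LATEST/20181514-X1205",
           "data/backup/LATEST/l-A1230.20181514-XX")
groups <- data.frame(
  ID = c("X1235", "X1255", "A1230", "K93430", "LOP0343", "J3490",
         "X1205", "X6635", "F1235", "E1235", "X1237"),
  Group = c("A", "A", "A", "B", "A", "A", "B", "B", "B", "B", "A"),
  stringsAsFactors = FALSE
)

# Every combination of file path and ID.
pairs <- expand.grid(file = files, ID = groups$ID, stringsAsFactors = FALSE)

# Keep the pairs where the ID occurs literally in the path.
keep <- mapply(grepl, pairs$ID, pairs$file, MoreArgs = list(fixed = TRUE))
matched <- merge(pairs[keep, ], groups, by = "ID")

# Left join back onto the file list (all.x = TRUE keeps unmatched files with NA),
# then restore the original file order using a position column.
result <- merge(data.frame(file = files, pos = seq_along(files),
                           stringsAsFactors = FALSE),
                matched, by = "file", all.x = TRUE)
result <- result[order(result$pos), c("file", "ID", "Group")]
print(result)
```

Because every pair is tested, this sketch performs length(files) × nrow(groups) comparisons, which is fine for small tables like these.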