I created an empty data frame like this:
id Alyr Crub Lala Brap Bole Spar Esal Aara Thas
1 XLOC_003940_TBH_1 NA NA NA NA NA NA NA NA NA
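I want to check whether the id and a column name match the parts of an input string, and if they do, replace the "NA" in that cell with a value. Here is an example: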
ex1 <- "Alyr_XLOC_003940_TBH_1_Ortholog_Known_Gene_Sense"
sp <- sub("([A-Za-z]+)_(XLOC_\\d+_TBH_1)_([A-Za-z_]+)","\\1", ex1)
gene <- sub("([A-Za-z]+)_(XLOC_\\d+_TBH_1)_([A-Za-z_]+)","\\2", ex1)
fun <- sub("([A-Za-z]+)_(XLOC_\\d+_TBH_1)_([A-Za-z_]+)","\\3", ex1)
Based on the example above, I would like to get something like this:
id Alyr Crub Lala Brap Bole Spar Esal Aara Thas
1 XLOC_003940_TBH_1 Ortholog_Known_Gene_Sense NF NF NF NF NF NF NF NF
I'm stuck here and can't figure out how to do this.
Answer 0 (score: 1)
Use matrix subsetting:
# Fill the id column, then write each value at its (row, matching column) position
df1$id <- gene
df1[cbind(seq_len(nrow(df1)), match(sp, names(df1)))] <- fun
See this answer for more on subsetting a data frame with a two-column matrix.
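As a minimal sketch of how two-column matrix indexing behaves, here is a toy case. The data frame `toy` and the index matrix `idx` are hypothetical names used only for illustration and are not part of the question's data. Each row of the index matrix is read as one (row, column) pair.

```r
# Hypothetical toy data frame, not taken from the question
toy <- data.frame(a = c(NA, NA), b = c(NA, NA), c = c(NA, NA))

# Each row of idx is one (row number, column number) pair
idx <- cbind(c(1, 2), c(2, 3))

# Assign "x" to row 1 / column 2 and "y" to row 2 / column 3
toy[idx] <- c("x", "y")
toy
```

In the answer's code, `seq_len(nrow(df1))` (or `1:nrow(df1)`) supplies the row numbers and `match(sp, names(df1))` supplies the column numbers, so the same mechanism applies.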
## Example
nms <- scan(what="character", text="id Alyr Crub Lala Brap Bole Spar Esal Aara Thas")
df1 <- as.data.frame(matrix(NA, 3, 10))
names(df1) <- nms
df1
# id Alyr Crub Lala Brap Bole Spar Esal Aara Thas
#1 NA NA NA NA NA NA NA NA NA NA
#2 NA NA NA NA NA NA NA NA NA NA
#3 NA NA NA NA NA NA NA NA NA NA
ex1 <- c("Alyr_XLOC_003940_TBH_1_Ortholog_Gene",
"Lala_XLOC_1234_TBH_1_Lalala_Gene",
"Thas_XLOC_5678_TBH_1_Thasthas_Gene")
sp <- sub("([A-Za-z]+)_(XLOC_\\d+_TBH_1)_([A-Za-z_]+)","\\1", ex1)
gene <- sub("([A-Za-z]+)_(XLOC_\\d+_TBH_1)_([A-Za-z_]+)","\\2", ex1)
fun <- sub("([A-Za-z]+)_(XLOC_\\d+_TBH_1)_([A-Za-z_]+)","\\3", ex1)
# Fill the id column, then write each value at its (row, matching column) position
df1$id <- gene
df1[cbind(seq_len(nrow(df1)), match(sp, names(df1)))] <- fun
df1
# id Alyr Crub Lala Brap Bole Spar Esal Aara Thas
# 1 XLOC_003940_TBH_1 Ortholog_Gene NA <NA> NA NA NA NA NA <NA>
# 2 XLOC_1234_TBH_1 <NA> NA Lalala_Gene NA NA NA NA NA <NA>
# 3 XLOC_5678_TBH_1 <NA> NA <NA> NA NA NA NA NA Thasthas_Gene
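The desired output in the question shows "NF" rather than NA in the unmatched cells. One possible way to get there, sketched on a smaller hypothetical data frame `res` (a cut-down stand-in for the result above, not the answer's own code), is to overwrite every remaining NA in a final step:

```r
# Hypothetical cut-down version of the result, for illustration only
res <- data.frame(id   = c("XLOC_003940_TBH_1", "XLOC_1234_TBH_1"),
                  Alyr = c("Ortholog_Gene", NA),
                  Crub = c(NA, NA),
                  Lala = c(NA, "Lalala_Gene"),
                  stringsAsFactors = FALSE)

# is.na() on a data frame returns a logical matrix, which can be used
# directly to replace every missing cell with "NF"
res[is.na(res)] <- "NF"
res
```

Applied to the full example, the same line would be `df1[is.na(df1)] <- "NF"`, run after the values have been filled in. Note that this turns every affected column into a character column.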