Question

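I recently started working with R, and I'm trying to find a way to solve the following problem:

I have a data.frame with several columns. One of them contains file names, which hold all the information I need. Example: "13_07_26_SpeciesA_Genotype22_Column1Row2"

I want to create new columns using the information in the name. For example, a genotype column containing "22", a row column containing "2", and so on.

I can do this one value at a time using grepl and gsub, as follows: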

files <- c("13_12_26_Species_Genotype22_Column1Row2",
           "15_12_26_Species_Genotype01_Column2Row5")
weights <- c(20, 40)
spreadsheet <- data.frame(files, weights)
GT22 <- grepl("Genotype22", spreadsheet$files)
spreadsheet$GT <- gsub("TRUE", "22", GT22)

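But I have to check more than 1000 genotypes, etc., across many files from different dates. So I tried comparing against a vector of all possible genotypes, e.g.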

gt.list <- paste("Genotype", 01:1000, sep="")

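with the spreadsheet$files column, using functions such as match() or apply(). But I couldn't get it to run. The genotypes are not ordered, so I want to compare each cell of the "files" column with all the entries in my vector and then write all the matches (... 22, 01, ...) into a new column. I could then rewrite this function for the other pieces of information.

I would appreciate any help!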

Answer 1

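# Split each file name on "_" and bind the pieces into columns X1..X6
DF <- data.frame(
  do.call(rbind, strsplit(files, '_', fixed=TRUE)),
  weights,
  stringsAsFactors=FALSE)
# "Genotype" is 8 characters long, so the number starts at position 9
DF$GT <- substr(DF[,5], 9, nchar(DF[,5]))
# Split "Column1Row2" on "Row" and keep the part after it
DF$Row <- do.call(rbind, strsplit(DF[,6], 'Row', fixed=TRUE))[,2]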

#   X1 X2 X3      X4         X5          X6 weights GT Row
# 1 13 12 26 Species Genotype22 Column1Row2      20 22   2
# 2 15 12 26 Species Genotype01 Column2Row5      40 01   5

I'm not a regex person.
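
For comparison, here is a minimal regex-based sketch of the same extraction. It is not part of the answer above; it assumes every file name contains the literal keywords "Genotype", "Column" and "Row", each followed directly by digits, and the new column names (GT, Column, Row) are chosen only for illustration.

```r
files <- c("13_12_26_Species_Genotype22_Column1Row2",
           "15_12_26_Species_Genotype01_Column2Row5")
weights <- c(20, 40)
spreadsheet <- data.frame(files, weights, stringsAsFactors = FALSE)

# Capture the digits that follow each keyword; "\\1" is the captured group.
# Assumption: each keyword occurs once per file name and is followed by digits.
spreadsheet$GT     <- sub(".*Genotype([0-9]+).*", "\\1", spreadsheet$files)
spreadsheet$Column <- sub(".*Column([0-9]+).*",   "\\1", spreadsheet$files)
spreadsheet$Row    <- sub(".*Row([0-9]+).*",      "\\1", spreadsheet$files)

spreadsheet[, c("weights", "GT", "Column", "Row")]
```

The results are character strings, so a leading zero such as "01" is kept. One caveat: when a pattern does not match, sub() returns the input string unchanged, so file names that lack a keyword would need a separate check (for example with grepl()).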

Getting information from one column into another