Comparing names in a data.frame

Time: 2017-09-04 20:45:17

Tags: r

The raw data I import into R has 3 columns: (I) Name, (II) Statistic, (III) #Cells.

The names given in the Name column are verbose, for example:

01Sep17 Trm diffn_Tube_001.fcs/Lymphocytes/Live/CD8a subset/integrin B7 subset    
01Sep17 Trm diffn_Tube_003.fcs/Lymphocytes/Live/CD4 subset/CD103 subset
01Sep17 Trm diffn_Tube_004.fcs/Lymphocytes/Live/CD4 subset/CD73 subset

(table not shown as there are several hundred rows). 

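To simplify this column, I would like to compare the names and remove the part that is shared between all samples (treating the tube number in Tube_0*.* as a wildcard). For example, the 3 names above should become: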

CD8a subset/integrin B7 subset
CD4 subset/CD103 subset
CD4 subset/CD73 subset

Any suggestions on how to achieve this? I do not want to store

01Sep17 Trm diffn_Tube_0*.*.fcs/Lymphocytes/Live

in a variable and then use

as.data.frame(sapply(NameofDataFrame,gsub,pattern=VariableName,replacement=""))

because the exact names change between experiments.

2 Answers:

Answer 0: (score: 2)

You can use gsub to delete everything up to and including "Live/":

gsub(".*/Live/", "", x)

[1] "CD8a subset/integrin B7 subset" "CD4 subset/CD103 subset"       
[3] "CD4 subset/CD73 subset"

Sample data:

x <- c(
"01Sep17 Trm diffn_Tube_001.fcs/Lymphocytes/Live/CD8a subset/integrin B7 subset",
"01Sep17 Trm diffn_Tube_003.fcs/Lymphocytes/Live/CD4 subset/CD103 subset",
"01Sep17 Trm diffn_Tube_004.fcs/Lymphocytes/Live/CD4 subset/CD73 subset")
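
If the gate name "Live" itself changes between experiments, one variant is to drop a fixed number of leading path components instead of matching a literal name. The following is only a sketch: `strip_levels` is a hypothetical helper name, and it assumes every name starts with the same number of "/"-separated levels (three here: the .fcs file, Lymphocytes, Live) before the part of interest.

```r
# Hypothetical helper: drop the first n "/"-separated components of each name.
# Assumes every element of x has at least n components before the part to keep.
strip_levels <- function(x, n = 3) {
  # Build a pattern such as "^(?:[^/]*/){3}": n runs of non-slash text, each ending in "/"
  pattern <- sprintf("^(?:[^/]*/){%d}", as.integer(n))
  sub(pattern, "", x, perl = TRUE)
}

x <- c(
  "01Sep17 Trm diffn_Tube_001.fcs/Lymphocytes/Live/CD8a subset/integrin B7 subset",
  "01Sep17 Trm diffn_Tube_003.fcs/Lymphocytes/Live/CD4 subset/CD103 subset",
  "01Sep17 Trm diffn_Tube_004.fcs/Lymphocytes/Live/CD4 subset/CD73 subset")

strip_levels(x)
```

On the sample data this gives the same three names as the gsub call, but the value of n still has to match the gating depth of each experiment.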

Answer 1: (score: 0)

Using stringr:

library(stringr)

x <- c(
  "01Sep17 Trm diffn_Tube_001.fcs/Lymphocytes/Live/CD8a subset/integrin B7 subset",
  "01Sep17 Trm diffn_Tube_003.fcs/Lymphocytes/Live/CD4 subset/CD103 subset",
  "01Sep17 Trm diffn_Tube_004.fcs/Lymphocytes/Live/CD4 subset/CD73 subset")

str_match(x, '.*/Live/(.*)')[,2]
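
Both patterns hard-code "Live", while the question asks for whatever part is shared between samples to be removed. A rough base R sketch of that idea follows. `strip_shared` is a hypothetical helper, and the data frame `df` with a `Name` column is an assumption based on the question's description. It assumes the first component is always the per-tube .fcs file name and that shared gates appear as whole leading "/"-separated components.

```r
# Hypothetical helper: remove the file name and any leading gates shared by all names.
strip_shared <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  # Drop the first component (the .fcs file name, which differs per tube)
  parts <- lapply(parts, function(p) p[-1])
  # Count how many leading components are identical across every name
  n_min <- min(lengths(parts))
  n_shared <- 0
  for (i in seq_len(n_min)) {
    level_i <- vapply(parts, `[`, character(1), i)
    if (length(unique(level_i)) > 1) break
    n_shared <- i
  }
  # Rebuild each name from the components after the shared ones
  vapply(parts,
         function(p) paste(p[seq_along(p) > n_shared], collapse = "/"),
         character(1))
}

x <- c(
  "01Sep17 Trm diffn_Tube_001.fcs/Lymphocytes/Live/CD8a subset/integrin B7 subset",
  "01Sep17 Trm diffn_Tube_003.fcs/Lymphocytes/Live/CD4 subset/CD103 subset",
  "01Sep17 Trm diffn_Tube_004.fcs/Lymphocytes/Live/CD4 subset/CD73 subset")

# Assumed layout: a data frame with a Name column, as described in the question
df <- data.frame(Name = x, stringsAsFactors = FALSE)
df$Name <- strip_shared(df$Name)
df$Name
```

With the sample data, "Lymphocytes" and "Live" are detected as shared and removed, leaving the three short names from the question. If every name in the column were identical, the result would be empty strings, so a real script would probably want a guard for that case.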