The raw data I import into R has 3 columns: (I) Name, (II) Statistic, (III) #Cells
The names given in the Name column are verbose, for example:
01Sep17 Trm diffn_Tube_001.fcs/Lymphocytes/Live/CD8a subset/integrin B7 subset
01Sep17 Trm diffn_Tube_003.fcs/Lymphocytes/Live/CD4 subset/CD103 subset
01Sep17 Trm diffn_Tube_004.fcs/Lymphocytes/Live/CD4 subset/CD73 subset
(table not shown as there are several hundred rows).
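To simplify this column, I would like to compare the names and remove the part that is shared between every sample (treating Tube_0*.* as a wildcard). For example, the 3 names above should become: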
CD8a subset/integrin B7 subset
CD4 subset/CD103 subset
CD4 subset/CD73 subset
Any suggestions on how to achieve this? I would prefer not to store
01Sep17 Trm diffn_Tube_0*.*.fcs/Lymphocytes/Live
in a variable and then use
as.data.frame(sapply(NameofDataFrame,gsub,pattern=VariableName,replacement=""))
because the exact names change between experiments.
Answer 0 (score: 2)
You can use gsub:
# Remove everything up to and including "/Live/"
gsub(".*/Live/", "", x)
[1] "CD8a subset/integrin B7 subset" "CD4 subset/CD103 subset"
[3] "CD4 subset/CD73 subset"
Sample data:
x <- c(
"01Sep17 Trm diffn_Tube_001.fcs/Lymphocytes/Live/CD8a subset/integrin B7 subset",
"01Sep17 Trm diffn_Tube_003.fcs/Lymphocytes/Live/CD4 subset/CD103 subset",
"01Sep17 Trm diffn_Tube_004.fcs/Lymphocytes/Live/CD4 subset/CD73 subset")
Answer 1 (score: 0)
Using stringr:
library(stringr)
x <- c(
"01Sep17 Trm diffn_Tube_001.fcs/Lymphocytes/Live/CD8a subset/integrin B7 subset",
"01Sep17 Trm diffn_Tube_003.fcs/Lymphocytes/Live/CD4 subset/CD103 subset",
"01Sep17 Trm diffn_Tube_004.fcs/Lymphocytes/Live/CD4 subset/CD73 subset")
str_match(x, '.*/Live/(.*)')[,2]
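Both answers above rely on "/Live/" appearing in every name. Since the question says the exact names change between experiments, here is a minimal base R sketch that avoids hard-coding any gate name. It assumes the first path component is always the per-sample .fcs file name and that this file name contains no "/". The helper `strip_shared_prefix` and the data frame `df` are hypothetical names introduced only for illustration.

```r
# Hypothetical helper: drop the per-sample file name, then drop the leading
# path components that are identical across all names.
strip_shared_prefix <- function(x) {
  # Remove everything up to and including the first "/" (the .fcs file name)
  paths <- sub("^[^/]*/", "", x)
  parts <- strsplit(paths, "/", fixed = TRUE)
  n <- min(lengths(parts))
  # Count how many leading components are shared by every name
  k <- 0
  while (k < n &&
         length(unique(vapply(parts, function(p) p[k + 1], character(1)))) == 1) {
    k <- k + 1
  }
  # Keep only the components after the shared ones and re-join them
  vapply(parts, function(p) paste(p[seq_along(p) > k], collapse = "/"),
         character(1))
}

x <- c(
  "01Sep17 Trm diffn_Tube_001.fcs/Lymphocytes/Live/CD8a subset/integrin B7 subset",
  "01Sep17 Trm diffn_Tube_003.fcs/Lymphocytes/Live/CD4 subset/CD103 subset",
  "01Sep17 Trm diffn_Tube_004.fcs/Lymphocytes/Live/CD4 subset/CD73 subset")

# Hypothetical data frame standing in for the imported table
df <- data.frame(Name = x, stringsAsFactors = FALSE)
df$Name <- strip_shared_prefix(df$Name)
print(df$Name)
```

One caveat with this approach: it removes every leading component that happens to be shared, so if all rows in a given table start with "CD4 subset", that component would be stripped as well. When the last shared gate is known in advance, anchoring on it as in the answers above is the more predictable choice.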