Question

I have a column in a data frame, and I want to remove the part of each string before the 5th "." separator, as well as the final ".txt". I don't know how to do this.

Input:

jhu-usc.edu_GBM.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1481-05.txt
jhu-usc.edu_BCD.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1482-05.txt
jhu-usc.edu_LGG.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1483-05.txt
jhu-usc.edu_LUAD.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1484-05.txt
jhu-usc.edu_LUAD.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1485-05.txt
jhu-usc.edu_BRCA.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1486-05.txt
jhu-usc.edu_GBM.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1487-05.txt
jhu-usc.edu_PRCA.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1488-05.txt

Expected output:

TCGA-06-5415-01A-01D-1481-05
TCGA-06-5415-01A-01D-1482-05
TCGA-06-5415-01A-01D-1483-05
TCGA-06-5415-01A-01D-1484-05
TCGA-06-5415-01A-01D-1485-05
TCGA-06-5415-01A-01D-1486-05
TCGA-06-5415-01A-01D-1487-05
TCGA-06-5415-01A-01D-1488-05

I tried: sapply(strsplit(as.character(df$V1), "."), '[', 1:5)

Please advise. Thanks.

Answer 1

Assuming the text follows a fixed format (each sample ID starts with "TCGA" and contains no "."):

sub(".*(TCGA[^.]+).*", "\\1", str1)
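As a quick check, here is a minimal sketch that applies this pattern to two of the file names from the question. `str1` is assumed to be a character vector holding the column values (for example `as.character(df$V1)`); the name `res` is only illustrative.

```r
# str1 is assumed to hold the file names, e.g. as.character(df$V1)
str1 <- c(
  "jhu-usc.edu_GBM.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1481-05.txt",
  "jhu-usc.edu_BCD.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1482-05.txt"
)

# Capture "TCGA" plus everything up to the next "." and keep only that group;
# the leading and trailing ".*" consume the rest of the string
res <- sub(".*(TCGA[^.]+).*", "\\1", str1)
print(res)
```

This relies on every ID starting with the literal text "TCGA"; strings without it would be returned unchanged by `sub`.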

Answer 2

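If they all end with .txt, you can keep only the last "."-separated field before the extension (the dot in ".txt" is escaped and the pattern anchored so it matches only a literal extension at the end):

sub(".+\\.([^.]+)\\.txt$", "\\1", as.character(df$V1))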
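For comparison, the sketch below runs a regex of this kind (with the dot before `txt` escaped) next to a splitting approach similar to the attempt in the question. It assumes the sample ID is always the field immediately before `txt`; the names `x`, `ids_regex`, and `ids_split` are illustrative only.

```r
# Two sample file names from the question
x <- c(
  "jhu-usc.edu_LGG.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1483-05.txt",
  "jhu-usc.edu_LUAD.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1484-05.txt"
)

# Regex approach: keep the last dot-free field before a literal ".txt"
ids_regex <- sub(".+\\.([^.]+)\\.txt$", "\\1", x)

# Split approach: fixed = TRUE treats "." as a literal separator,
# not as the regex wildcard that matches any character
parts <- strsplit(x, ".", fixed = TRUE)

# The ID is the second-to-last piece, just before "txt"
ids_split <- vapply(parts, function(p) p[length(p) - 1], character(1))

print(ids_regex)
print(ids_split)
```

Without `fixed = TRUE` (or an escaped `"\\."`), `strsplit` interprets "." as "any character", which is likely why the original attempt did not return the expected pieces.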

R: remove part of a string before a delimiter