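How can I split the text below? It contains comma-separated values, but some of the inner values also contain commas. However, we know that every group starts with the pattern GO:XX.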
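GO:0048193, BP, Golgi vesicle transport, GO:0030198, BP, extracellular matrix organization, GO:0006903, BP, vesicle targeting, GO:0043062, BP, extracellular structure organization, GO:0048199, BP, vesicle targeting, to, from or within Golgi, GO:0031012, CC, extracellular matrix, GO:0062023, CC, collagen-containing extracellular matrix, GO:0005581, CC, collagen trimer, GO:0044420, CC, extracellular matrix component, GO:0030020, MF, extracellular matrix structural constituent conferring tensile strength, GO:0005201, MF, extracellular matrix structural constituent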
I used this regular expression pattern, but it does not work for values that contain several commas (such as GO:0048199):
let myRegexp = /(GO:[0-9]+), (BP|MF|CC), ([^,]+)/g;
let raw = "GO:0048193, BP, Golgi vesicle transport, GO:0030198, BP, extracellular matrix organization, GO:0006903, BP, vesicle targeting, GO:0043062, BP, extracellular structure organization, GO:0048199, BP, vesicle targeting, to, from or within Golgi, GO:0031012, CC, extracellular matrix, GO:0062023, CC, collagen-containing extracellular matrix, GO:0005581, CC, collagen trimer, GO:0044420, CC, extracellular matrix component, GO:0030020, MF, extracellular matrix structural constituent conferring tensile strength, GO:0005201, MF, extracellular matrix structural constituent"
let match = myRegexp.exec(raw);
while (match != null) {
  console.log(match[0].trim());
  match = myRegexp.exec(raw);
}
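Maybe I could split the data on the pattern GO:[0-9]+, but then I cannot capture the GO ID. Capturing all the data would take two steps, which makes for ugly code. Is there a better solution?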
Answer 0 (score: 3)
You can use a lookahead:
GO:\d+.*?(?=,\s+GO:|$)
In JS, this could be:
let myRegexp = /GO:\d+.*?(?=,\s+GO:|$)/g;
let raw = "GO:0048193, BP, Golgi vesicle transport, GO:0030198, BP, extracellular matrix organization, GO:0006903, BP, vesicle targeting, GO:0043062, BP, extracellular structure organization, GO:0048199, BP, vesicle targeting, to, from or within Golgi, GO:0031012, CC, extracellular matrix, GO:0062023, CC, collagen-containing extracellular matrix, GO:0005581, CC, collagen trimer, GO:0044420, CC, extracellular matrix component, GO:0030020, MF, extracellular matrix structural constituent conferring tensile strength, GO:0005201, MF, extracellular matrix structural constituent"
let match = myRegexp.exec(raw);
while (match != null) {
  console.log(match[0].trim());
  match = myRegexp.exec(raw);
}
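If the individual fields are needed rather than the whole record, the same lookahead idea can be combined with capture groups. The following is only a sketch under some assumptions: the sample is shortened from the question's data, the category is taken to be one of BP, MF or CC as in the question's own pattern, and `parseGoTerms` is a hypothetical helper name.

```javascript
// Shortened sample; the second record has a description that contains commas.
const sample =
  "GO:0006903, BP, vesicle targeting, " +
  "GO:0048199, BP, vesicle targeting, to, from or within Golgi, " +
  "GO:0031012, CC, extracellular matrix";

// ID and category are captured explicitly. The description is matched lazily
// and stops right before the next ", GO:<digits>," or at the end of the string.
const recordPattern =
  /(?<id>GO:\d+),\s*(?<category>BP|MF|CC),\s*(?<description>.*?)(?=,\s*GO:\d+,|$)/g;

// Hypothetical helper: turns the flat string into an array of objects.
function parseGoTerms(text) {
  return Array.from(text.matchAll(recordPattern), (m) => ({
    id: m.groups.id,
    category: m.groups.category,
    description: m.groups.description.trim(),
  }));
}

const terms = parseGoTerms(sample);
console.log(terms);
```

Named groups and `matchAll` need a reasonably recent JavaScript engine; on older ones, numbered groups with the `exec` loop above should work the same way.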
Answer 1 (score: 2)
You can split the string using a positive lookahead.
let raw = "GO:0048193, BP, Golgi vesicle transport, GO:0030198, BP, extracellular matrix organization, GO:0006903, BP, vesicle targeting, GO:0043062, BP, extracellular structure organization, GO:0048199, BP, vesicle targeting, to, from or within Golgi, GO:0031012, CC, extracellular matrix, GO:0062023, CC, collagen-containing extracellular matrix, GO:0005581, CC, collagen trimer, GO:0044420, CC, extracellular matrix component, GO:0030020, MF, extracellular matrix structural constituent conferring tensile strength, GO:0005201, MF, extracellular matrix structural constituent",
result = raw.split(/,\s+(?=GO:\d+,)/);
console.log(result);
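Once the string is split into records, each record can be broken into its fields with a second, plain split. This is a sketch, assuming the first two fields never contain commas; the variable names are illustrative, and the description is rejoined with ", " so any irregular spacing after inner commas is normalized.

```javascript
// Shortened sample; the second record has a description that contains commas.
const text =
  "GO:0006903, BP, vesicle targeting, " +
  "GO:0048199, BP, vesicle targeting, to, from or within Golgi, " +
  "GO:0031012, CC, extracellular matrix";

const records = text
  // Split only at commas that are directly followed by a new "GO:<digits>," ID.
  .split(/,\s*(?=GO:\d+,)/)
  .map((record) => {
    // The first two fields are ID and category; everything else is the description.
    const [id, category, ...rest] = record.split(/,\s*/);
    return { id, category, description: rest.join(", ") };
  });

console.log(records);
```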
Answer 2 (score: 0)
const input = 'GO:0048193, BP, Golgi vesicle transport, GO:0030198, BP, extracellular matrix organization, GO:0006903, BP, vesicle targeting, GO:0043062, BP, extracellular structure organization, GO:0048199, BP, vesicle targeting, to, from or within Golgi, GO:0031012, CC, extracellular matrix, GO:0062023, CC, collagen-containing extracellular matrix, GO:0005581, CC, collagen trimer, GO:0044420, CC, extracellular matrix component, GO:0030020, MF, extracellular matrix structural constituent conferring tensile strength, GO:0005201, MF, extracellular matrix structural constituent'
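// Assumes every ID starts with "GO:00". split() consumes that prefix, so it is re-added,
// and the trailing ", " left on each piece is stripped.
const result = input.split('GO:00').slice(1).map(x => ('GO:00' + x).replace(/,\s*$/, ''))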
console.log(result)