Splitting a string whose values contain multiple commas

Time: 2021-05-11 14:00:12

Tags: javascript regex

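How can I split the text below? It contains comma-separated values, but some of the inner values also contain commas. We do know that each group starts with the GO:XX pattern.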

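GO:0048193, BP, Golgi vesicle transport, GO:0030198, BP, extracellular matrix organization, GO:0006903, BP, vesicle targeting, GO:0043062, BP, extracellular structure organization, GO:0048199, BP, vesicle targeting, to, from or within Golgi, GO:0031012, CC, extracellular matrix, GO:0062023, CC, collagen-containing extracellular matrix, GO:0005581, CC, collagen trimer, GO:0044420, CC, extracellular matrix component, GO:0030020, MF, extracellular matrix structural constituent conferring tensile strength, GO:0005201, MF, extracellular matrix structural constituent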

I used this regex pattern, but it does not work for values that contain several commas (such as GO:0048199):

let myRegexp = /(GO:[0-9]+), (BP|MF|CC), ([^,]+)/g;
let raw = "GO:0048193, BP, Golgi vesicle transport, GO:0030198, BP, extracellular matrix organization, GO:0006903, BP, vesicle targeting, GO:0043062, BP, extracellular structure organization, GO:0048199, BP, vesicle targeting, to, from or within Golgi, GO:0031012, CC, extracellular matrix, GO:0062023, CC, collagen-containing extracellular matrix, GO:0005581, CC, collagen trimer, GO:0044420, CC, extracellular matrix component, GO:0030020, MF, extracellular matrix structural constituent conferring tensile strength, GO:0005201, MF, extracellular matrix structural constituent"
let match = myRegexp.exec(raw);
while (match != null) {
      console.log(match[0].trim());
      match = myRegexp.exec(raw);
}

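Maybe I could split the data on the pattern GO:[0-9]+, but then I cannot capture the GO ID. It would take two steps to capture all the data, which makes for ugly code. Is there a better solution?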

3 answers:

Answer 0 (score: 3)

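You can use a lookahead, so that each match runs lazily from a GO ID up to (but not including) the comma before the next GO ID, or to the end of the string: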

GO:\d+.*?(?=,\s+GO:|$)

See a demo on regex101.com.


In JavaScript, this could be:

let myRegexp = /GO:\d+.*?(?=,\s+GO:|$)/g;
let raw = "GO:0048193, BP, Golgi vesicle transport, GO:0030198, BP, extracellular matrix organization, GO:0006903, BP, vesicle targeting, GO:0043062, BP, extracellular structure organization, GO:0048199, BP, vesicle targeting, to, from or within Golgi, GO:0031012, CC, extracellular matrix, GO:0062023, CC, collagen-containing extracellular matrix, GO:0005581, CC, collagen trimer, GO:0044420, CC, extracellular matrix component, GO:0030020, MF, extracellular matrix structural constituent conferring tensile strength, GO:0005201, MF, extracellular matrix structural constituent"
let match = myRegexp.exec(raw);
while (match != null) {
      console.log(match[0].trim());
      match = myRegexp.exec(raw);
}
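
Since the question also wants the GO ID captured in the same step, the lookahead pattern can be extended with capture groups. The following is only a sketch under some assumptions: the group names `id`, `namespace`, and `name` are made up for illustration, the sample string is a shortened version of the question's data, and `String.prototype.matchAll` with named groups requires a reasonably modern JavaScript engine.

```javascript
// Shortened sample in the same format as the question's data.
const sample =
  "GO:0048199, BP, vesicle targeting, to, from or within Golgi, " +
  "GO:0031012, CC, extracellular matrix";

// Named groups capture the three fields. The lazy `name` group stops at the
// lookahead, i.e. just before the next ", GO:<digits>" or at the end of the string.
// The group names are hypothetical, chosen only for this example.
const recordPattern =
  /(?<id>GO:\d+),\s*(?<namespace>BP|MF|CC),\s*(?<name>.*?)(?=,\s*GO:\d+|$)/g;

// Copy each match's groups into a plain object.
const records = Array.from(sample.matchAll(recordPattern), (m) => ({ ...m.groups }));

console.log(records);
```

With this approach the inner commas stay inside `name`, because only a comma followed by a new GO ID ends a record.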

Answer 1 (score: 2)

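You can split the string with a positive lookahead. The separator is only a comma and whitespace that are followed by a GO ID and another comma, so commas inside a value are left alone.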

let raw = "GO:0048193, BP, Golgi vesicle transport, GO:0030198, BP, extracellular matrix organization, GO:0006903, BP, vesicle targeting, GO:0043062, BP, extracellular structure organization, GO:0048199, BP, vesicle targeting, to, from or within Golgi, GO:0031012, CC, extracellular matrix, GO:0062023, CC, collagen-containing extracellular matrix, GO:0005581, CC, collagen trimer, GO:0044420, CC, extracellular matrix component, GO:0030020, MF, extracellular matrix structural constituent conferring tensile strength, GO:0005201, MF, extracellular matrix structural constituent",
    result = raw.split(/,\s+(?=GO:\d+,)/);

console.log(result);
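
If the individual fields are needed as well, each record produced by the split can be broken up further. This is a rough sketch, not part of the original answer: it assumes the first two commas in a record always separate the ID and the BP/MF/CC code, and the property names `id`, `namespace`, and `name` are invented for the example. Note that rejoining with ", " normalizes the spacing after commas inside the name.

```javascript
// Shortened sample in the same format as the question's data.
const sample =
  "GO:0006903, BP, vesicle targeting, " +
  "GO:0048199, BP, vesicle targeting, to, from or within Golgi";

const parsed = sample.split(/,\s+(?=GO:\d+,)/).map((record) => {
  // Only the first two commas separate fields; any later parts belong to the name.
  const [id, namespace, ...nameParts] = record.split(/,\s*/);
  return { id, namespace, name: nameParts.join(", ") };
});

console.log(parsed);
```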

Answer 2 (score: 0)

const input = 'GO:0048193, BP, Golgi vesicle transport, GO:0030198, BP, extracellular matrix organization, GO:0006903, BP, vesicle targeting, GO:0043062, BP, extracellular structure organization, GO:0048199, BP, vesicle targeting, to, from or within Golgi, GO:0031012, CC, extracellular matrix, GO:0062023, CC, collagen-containing extracellular matrix, GO:0005581, CC, collagen trimer, GO:0044420, CC, extracellular matrix component, GO:0030020, MF, extracellular matrix structural constituent conferring tensile strength, GO:0005201, MF, extracellular matrix structural constituent'

const result = input.split('GO:00').slice(1).map(x => 'GO:00' + x)

console.log(result)
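
One thing to watch with the plain-string split above: every record except the last keeps the ", " that preceded the next ID, and the approach relies on every ID starting with "GO:00" and on that text never appearing inside a name. A minimal sketch of trimming the leftover separator, using a shortened sample and the made-up variable names `rough` and `cleaned`:

```javascript
// Shortened sample in the same format as the question's data.
const sample =
  "GO:0048193, BP, Golgi vesicle transport, " +
  "GO:0030198, BP, extracellular matrix organization";

// Same technique as above: split on the literal prefix and put it back.
const rough = sample.split("GO:00").slice(1).map((x) => "GO:00" + x);

// Strip the trailing comma and whitespace that the split leaves behind.
const cleaned = rough.map((record) => record.replace(/,\s*$/, ""));

console.log(cleaned);
```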