Question

I scraped a large number of posts from a novel site, and they use this abbreviation for volume and chapter: v5c91. So here we have Volume 5 and Chapter 91.

Here are some examples of titles:

$string = 'hello v2c19 lorem';
$string = 'hello v2 c19 lorem';
$string = 'hello c19 lorem';
$string = 'v8 hello c19 lorem';
$string = 'hello lorem v01';

What regex can I use to pull the volume and chapter out of these examples, so that I end up with something like v8c19?

Answer 1

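To avoid matching titles where v{num} or c{num} merely appears inside another word (for example nav123), I think you want something like this: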

(\bc\d+)|\bv\d+(c\d+) will capture the chapter, and (\bv\d+)|\bc\d+(v\d+) will capture the volume.

Edit: To capture partial chapters such as c2.5, replace \d+ with a pattern that also accepts a fractional part: (?:[0-9]*[.])?[0-9]+
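As a minimal sketch of that substitution, the chapter pattern can be built from a string so the number part is written once. The helper name chapterOf and the sample titles are illustrative assumptions, not part of the original answer.

```javascript
// Number with an optional fractional part, as given in the edit above.
const FLOAT = '(?:[0-9]*[.])?[0-9]+';

// Chapter pattern with \d+ swapped for FLOAT.
// Backslashes are doubled because the pattern is built from a template string.
const chapterRE = new RegExp(`(\\bc${FLOAT})|\\bv${FLOAT}(c${FLOAT})`);

// chapterOf is a hypothetical helper name used only for this sketch.
const chapterOf = (title) => {
  const m = chapterRE.exec(title);
  return m ? (m[1] || m[2]) : null;
};

console.log(chapterOf('hello c2.5 lorem'));   // 'c2.5'
console.log(chapterOf('hello v3c2.5 lorem')); // 'c2.5'
console.log(chapterOf('hello c19 lorem'));    // 'c19'
```

Note that this pattern also accepts a bare fractional part such as c.5, which may or may not be desirable for a given site.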

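It looks for a word boundary followed by the letter (c or v) and then digits; or, in a case like v1c3, it looks for the correct prefix followed by the match.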
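To make the two alternatives concrete, here is a small sketch using the integer-only patterns. The helper whichGroup is a hypothetical name introduced for illustration; it reports which capture group produced the match.

```javascript
// Integer-only versions of the two patterns from the answer.
const chapterRE = /(\bc\d+)|\bv\d+(c\d+)/;
const volumeRE = /(\bv\d+)|\bc\d+(v\d+)/;

// Group 1 fires when the token starts at a word boundary;
// group 2 fires when it directly follows the other token (as in v2c19).
const whichGroup = (str, regex) => {
  const m = regex.exec(str);
  if (!m) return null;
  return m[1] !== undefined
    ? { group: 1, value: m[1] }
    : { group: 2, value: m[2] };
};

console.log(whichGroup('hello v2 c19 lorem', chapterRE)); // { group: 1, value: 'c19' }
console.log(whichGroup('hello v2c19 lorem', chapterRE));  // { group: 2, value: 'c19' }
console.log(whichGroup('hello v2c19 lorem', volumeRE));   // { group: 1, value: 'v2' }
console.log(whichGroup('hello noch123pter', chapterRE));  // null
```

The last call returns null because the c in noch123pter is neither at a word boundary nor preceded by a volume token.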

Here are some examples:

const inputs = [
  'hello v2c19 lorem',
  'hello v2.5 c19 lorem',
  'hello c19 lorem',
  'v8 hello c19 lorem',
  'hello lorem c01',
  'novolume nav123',
  'hello noch123pter',
];

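// Return the captured token: group 1 when it stands alone,
// group 2 when it directly follows the other token (as in v2c19).
const find = (str, regex) => {
  let res = null;
  const match = regex.exec(str);
  if (match) {
    res = match[1] || match[2];
  }
  return res;
};
// A number with an optional fractional part, e.g. 19 or 2.5.
const FLOAT = `(?:[0-9]*[.])?[0-9]+`;
// Backslashes are doubled because the patterns are built from template strings.
const vRE = new RegExp(`(\\bv${FLOAT})|\\bc${FLOAT}(v${FLOAT})`);
const cRE = new RegExp(`(\\bc${FLOAT})|\\bv${FLOAT}(c${FLOAT})`);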
const output = inputs.map((title) => {
  const chapter = find(title, cRE);
  const volume = find(title, vRE);
  return {
    title,
    chapter,
    volume
  };
});

console.log(output);

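These could be combined into a single regex covering every combination (chapter only, volume only, chapter space volume, volume chapter, and so on), but that quickly becomes confusing, and these two regexes are simple enough.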
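Keeping the two regexes separate still makes it easy to produce the v8c19 form the question asks for. The following is a sketch only; toTag is a hypothetical helper, and returning null when neither part is found is an assumption about the desired behavior.

```javascript
const FLOAT = '(?:[0-9]*[.])?[0-9]+';
const vRE = new RegExp(`(\\bv${FLOAT})|\\bc${FLOAT}(v${FLOAT})`);
const cRE = new RegExp(`(\\bc${FLOAT})|\\bv${FLOAT}(c${FLOAT})`);

// Same lookup as in the answer: take whichever capture group matched.
const find = (str, regex) => {
  const m = regex.exec(str);
  return m ? (m[1] || m[2]) : null;
};

// toTag (hypothetical) joins volume and chapter, skipping whichever is missing.
const toTag = (title) => {
  const tag = [find(title, vRE), find(title, cRE)].filter(Boolean).join('');
  return tag || null;
};

console.log(toTag('v8 hello c19 lorem')); // 'v8c19'
console.log(toTag('hello v2 c19 lorem')); // 'v2c19'
console.log(toTag('hello c19 lorem'));    // 'c19'
console.log(toTag('hello lorem v01'));    // 'v01'
console.log(toTag('novolume nav123'));    // null
```

Leading zeros are preserved as written in the title (v01 stays v01); normalizing them would be an extra step.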
