Question

I'm trying to extract some data from the following samples:

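Name 789, 10-mill 12-27b
Manufacturer XY-2822, 10-mill, 17-25b
Other Manufacturer 16b Part
Another Manufacturer FER M9000, 11-mill, 11-40
18b Part
Manufacturer 11-31, 10-mill
Manufacturer 1x or 2x; Max sizes 1x (34b), 2x (38/24b)
Manufacturer REC6 15/18/26b. Square.
Producer FC-40 11-13-16-19-22-25-27-30-34b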

I'd like my results to be, respectively:

12,27
17,25
16
11,40
18
11-31
34,38,24 (optional; it's fine if only the latter two are returned)
15,18,26
11,13,16,19,22,25,27,30,34

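I'd be happy to do this in multiple passes using an expression grammar, but I don't think that would really help.

I haven't been able to use lookahead and lookbehind to get the data while excluding things like "11-mill" and "XY-2822". What I've found is that I can exclude those matches, but doing so ends up truncating good results elsewhere.

What's the best way to approach this?

My current regex is /(?:(\d+)[b\b\/-])([b\d\b]*)[^a-z]/i

It captures the letter 'b' (which is fine), but it doesn't capture 34b in the last example.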

Answer 1

I'm not sure what your exact requirements/format are, but you can try:

/(?:\G(?!^)[-\/]|^(?:.*[^\d\/-])?)\K\d++(?![-\/]\D)/

http://rubular.com/r/WJqcCNe2pr

Details:

# two possible starts:
(?: # next occurrences
    \G    # anchor for the position after the previous match
    (?!^) # not at the start of the line
    [-\/]
  | # first occurrence
    ^
    (?:.*[^\d\/-])? # (note the greedy quantifier here,
                    #  to obtain the last result of the line)
)

\K # discards characters matched before from the whole match
\d++ # several digits with a possessive quantifier to forbid backtracking
(?![-\/]\D) # not followed by a hyphen or a slash and then a non-digit
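
To try the pattern outside Rubular, here is a minimal Ruby sketch. It assumes Ruby 2.0 or later (for \K) and that each line is scanned on its own. The names SIZE_PATTERN, SAMPLE_LINES and extract_sizes are made up for illustration, and the sample lines assume the hyphenated "10-mill" spelling and "38/24b" written without spaces.

```ruby
# Pattern copied verbatim from the answer.
SIZE_PATTERN = /(?:\G(?!^)[-\/]|^(?:.*[^\d\/-])?)\K\d++(?![-\/]\D)/

# Sample lines adapted from the question (spelling and spacing are assumptions).
SAMPLE_LINES = [
  "Name 789, 10-mill 12-27b",
  "Manufacturer XY-2822, 10-mill, 17-25b",
  "Other Manufacturer 16b Part",
  "Another Manufacturer FER M9000, 11-mill, 11-40",
  "18b Part",
  "Manufacturer 11-31, 10-mill",
  "Manufacturer 1x or 2x; Max sizes 1x (34b), 2x (38/24b)",
  "Manufacturer REC6 15/18/26b. Square.",
  "Producer FC-40 11-13-16-19-22-25-27-30-34b"
]

# String#scan restarts each search where the previous match ended, so the \G
# branch chains the hyphen- or slash-separated numbers that follow the first hit.
# The pattern has no capture groups, so scan returns an array of strings.
def extract_sizes(line)
  line.scan(SIZE_PATTERN)
end

SAMPLE_LINES.each do |line|
  puts "#{line} => #{extract_sizes(line).join(',')}"
end
```

Because the second branch is anchored at the start of the line and uses a greedy .*, only the last group of numbers on a line is returned. On the "1x or 2x" line that means 34 is skipped and only 38 and 24 come back, which the question says is acceptable.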

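You can improve the pattern by replacing (?:.*[^\d\/-])? with [^-\d\/\n]*+(?>[-\d\/]+[^-\d\/\n]+)* (remove the \n if you work line by line). The goal of this change is to limit backtracking: it then happens atomic group by atomic group instead of character by character as in the first version.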
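
A quick way to check that this substitution does not change the results is to run both variants side by side. The Ruby sketch below does that on a few lines adapted from the question; ORIGINAL_PATTERN, IMPROVED_PATTERN and CHECK_LINES are illustrative names, the \n has been dropped from the negated classes because each line is scanned separately, and the "10-mill" and "38/24b" spellings are assumptions.

```ruby
# First version: greedy .* that backtracks one character at a time.
ORIGINAL_PATTERN = /(?:\G(?!^)[-\/]|^(?:.*[^\d\/-])?)\K\d++(?![-\/]\D)/

# Variant with the suggested replacement (line-by-line form, no \n in the classes).
# Backtracking now gives back one atomic group, i.e. one "number run + filler" chunk.
IMPROVED_PATTERN = /(?:\G(?!^)[-\/]|^[^-\d\/]*+(?>[-\d\/]+[^-\d\/]+)*)\K\d++(?![-\/]\D)/

# Lines adapted from the question (spelling and spacing are assumptions).
CHECK_LINES = [
  "Name 789, 10-mill 12-27b",
  "Manufacturer XY-2822, 10-mill, 17-25b",
  "Other Manufacturer 16b Part",
  "Another Manufacturer FER M9000, 11-mill, 11-40",
  "18b Part",
  "Manufacturer 11-31, 10-mill",
  "Manufacturer 1x or 2x; Max sizes 1x (34b), 2x (38/24b)",
  "Manufacturer REC6 15/18/26b. Square.",
  "Producer FC-40 11-13-16-19-22-25-27-30-34b"
]

CHECK_LINES.each do |line|
  before = line.scan(ORIGINAL_PATTERN)
  after  = line.scan(IMPROVED_PATTERN)
  status = before == after ? "same" : "DIFFERENT"
  puts "#{status}: #{after.join(',')}"
end
```

This only compares output on the sample lines; it does not measure how much backtracking is saved, which would need a benchmark on realistic input.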

You could perhaps also replace the negative lookahead with this positive lookahead: (?=[-\/]\d|b|$)

Another version here.

Answer 2

Maybe this:

(?<=\d-)\d+|\d+(?=-\d+)|\d+(?=(?:\/\d+)*b)

https://regex101.com/r/nR3eS9/1
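
For comparison, here is a small Ruby sketch of this alternation-based pattern. The names ALTERNATION_PATTERN and extract_with_alternation are hypothetical, and the sample inputs assume the "10-mill" and "38/24b" spellings.

```ruby
# Three alternatives, tried left to right at each position:
#   (?<=\d-)\d+          digits right after "digit-"
#   \d+(?=-\d+)          digits right before "-digits"
#   \d+(?=(?:\/\d+)*b)   digits followed by optional "/digits" groups and a "b"
ALTERNATION_PATTERN = /(?<=\d-)\d+|\d+(?=-\d+)|\d+(?=(?:\/\d+)*b)/

# Hypothetical helper: returns every number the pattern accepts on one line.
def extract_with_alternation(line)
  line.scan(ALTERNATION_PATTERN)
end

[
  "Name 789, 10-mill 12-27b",
  "Manufacturer 1x or 2x; Max sizes 1x (34b), 2x (38/24b)",
  "Manufacturer REC6 15/18/26b. Square."
].each do |line|
  puts "#{line} => #{extract_with_alternation(line).join(',')}"
end
```

Unlike the first answer, this pattern is not tied to the last group on the line, so it also picks up 34 from "(34b)". The trade-off is that any digits-hyphen-digits pair matches; "XY-2822" and "FC-40" are skipped only because their hyphen follows a letter.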

Complex regex, PEG, or multiple passes?
