Question

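I am trying to find all tokens that look like abc_rty, abc_45, abc09_23k, abc09-K34 or 4535. A token should not start with _ or -, and it should not start with a digit unless it consists entirely of digits.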

I am not making any progress, and have even lost the progress I had made. This is what I have right now:

r'(?<!0-9)[(a-zA-Z)+]_(?=a-zA-Z0-9)|(?<!0-9)[(a-zA-Z)+]-(?=a-zA-Z0-9)\w+'

To make the question clearer, here is an example. If I have a string like this:

    D923-44 43 uou 08*) %%5 89ANB -iopu9 _M89 _97N hi_hello

then it should accept

    D923-44 and 43 and uou and hi_hello

and should ignore

    08*) %%5 89ANB -iopu9 _M89 _97N

I may have missed some cases, but I think this text should be enough. Apologies if it is not.

Answer 1

This seems to meet the requirements:

import re

regex = re.compile(r"""
    (?<!\S)   # Assert there is no non-whitespace before the current character
    (?:       # Start of non-capturing group:
     [^\W\d_] # Match either a letter
     [\w-]*   # followed by any number of the allowed characters
    |         # or
     \d+      # match a string of digits.
    )         # End of group
    (?!\S)    # Assert there is no non-whitespace after the current character""",
    re.VERBOSE)

See it on regex101.com.
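
To check the pattern against the sample string from the question, here is a minimal sketch. It uses the same pattern as above, written on one line without the comments; the variable names `text` and `tokens` are only there for illustration.

```python
import re

# Same pattern as in the answer, without the verbose-mode comments.
regex = re.compile(r"(?<!\S)(?:[^\W\d_][\w-]*|\d+)(?!\S)")

# Sample string taken from the question.
text = "D923-44 43 uou 08*) %%5 89ANB -iopu9 _M89 _97N hi_hello"

# The pattern has no capturing groups, so findall returns the full matches.
tokens = regex.findall(text)
print(tokens)  # ['D923-44', '43', 'uou', 'hi_hello']
```

The lookarounds `(?<!\S)` and `(?!\S)` are what force a match to span a whole whitespace-delimited token, which is why `89ANB` and `_M89` are skipped rather than partially matched.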

Answer 2

^(\d+|[A-Za-z][\w_-]*)$

Regular expression visualization

Split the line on whitespace, then run each resulting token through this regex to filter.

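^ marks the start of the string (here, each token after splitting)
\d means a digit [0-9]
+ means one or more
| means OR
[A-Za-z] means the first character must be a letter
[\w_-]* means the letter can be followed by any number of alphanumeric, _ or - characters, or by nothing at all
$ marks the end of the string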

The flow of the regex is shown in the diagram I provided, which goes some way towards explaining how it works.

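In short, it checks whether the token is all digits, or whether it starts with a letter (upper or lower case) followed by any alphanumeric, _ or - characters up to the end of the string.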
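
The split-then-filter approach described above could look roughly like this. It is only a sketch: the helper name `extract_tokens` and the constant `TOKEN_RE` are made up for the example and are not part of the original answer.

```python
import re

# Pattern copied from the answer: all digits, or a letter followed by
# any number of word characters, underscores or hyphens.
TOKEN_RE = re.compile(r"^(\d+|[A-Za-z][\w_-]*)$")

def extract_tokens(line):
    """Split on whitespace and keep only the pieces the pattern accepts."""
    return [tok for tok in line.split() if TOKEN_RE.match(tok)]

sample = "D923-44 43 uou 08*) %%5 89ANB -iopu9 _M89 _97N hi_hello"
print(extract_tokens(sample))  # ['D923-44', '43', 'uou', 'hi_hello']
```

Since `\w` already covers `_`, the character class could be shortened to `[\w-]`; it is kept here exactly as written in the answer.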

Python regex to extract tokens