Question

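Is there a way to use a regular expression that excludes a character at the start of a word, but still captures that character when it sits in the middle of a word?

Example: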

string="i would like to exlucde :HOkd but not JI:jklj "

I know that

re.findall(r'[^:]\w+', string)

will find all the words and exclude the :, but I want to keep the : unless it is at the start of a word, i.e. find JI:jklj but not :HOkd.
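For reference, here is a small sketch of what that attempt returns on the sample string. The variable name `attempt` is only illustrative. Because `[^:]` consumes any single character other than `:`, it also swallows the leading space of each word, and it still picks up `HOkd` and `jklj` once the colon has been skipped.

```python
import re

string = "i would like to exlucde :HOkd but not JI:jklj "

# [^:] matches any ONE character that is not ':' (a space qualifies),
# and \w+ then needs at least one word character after it.
attempt = re.findall(r"[^:]\w+", string)
print(attempt)
# [' would', ' like', ' to', ' exlucde', 'HOkd', ' but', ' not', ' JI', 'jklj']
```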

Answer 1

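[Updated based on the OP's comment] \w does not include :, so the part that captures the rest of the word has been extended. Use a negative lookbehind together with a word boundary to check that the word does not start with :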

\b(?<!:)\w[\w:]+

Demo

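\b is a word boundary, so that a : in the middle of a word is never tested by the negative lookbehind
(?<!:) avoids matching words that start with :
\w[\w:]+ starts with an alphanumeric character, followed by alphanumeric characters or :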
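The pattern above can be checked against the question's sample string with a short sketch like the following. The variable names and the `*` variant are illustrative additions, not part of the original answer. One thing it shows is that `\w[\w:]+` needs at least two characters, so the one-letter word `i` is skipped; replacing `+` with `*` appears to be enough to keep it.

```python
import re

s = "i would like to exlucde :HOkd but not JI:jklj "

# Pattern from the answer: \w followed by ONE or more of [\w:],
# so every match is at least two characters long.
two_plus = re.findall(r"\b(?<!:)\w[\w:]+", s)

# Variant (an assumption, not from the answer): * instead of +,
# so single-letter words such as 'i' are matched too.
one_plus = re.findall(r"\b(?<!:)\w[\w:]*", s)

print(two_plus)  # ['would', 'like', 'to', 'exlucde', 'but', 'not', 'JI:jklj']
print(one_plus)  # ['i', 'would', 'like', 'to', 'exlucde', 'but', 'not', 'JI:jklj']
```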

Answer 2

Output:

['i', 'would', 'like', 'to', 'exlucde', 'but', 'not', 'JI:jklj']

Answer 3

See regex in use here

                        # Below, a represents one or more word characters
(?<!:)\b\w+(?::\w+)?    # Accepts formats a or a:a
(?<!:)\b\w+(?::\w+)*    # Same as above but allows a:a:a
(?<!:)\b[\w:]+\b        # Similar to above but allows a:a:a and a::a

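(?<!:) is a negative lookbehind ensuring that what precedes is not :
\b asserts the position at a word boundary
\w+ matches one or more word characters
(?::\w+)? optionally matches a colon followed by one or more word characters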

See code in use here

import re

r = re.compile(r"(?<!:)\b\w+(?::\w+)?")
s = "i would like to exlucde :HOkd but not JI:jklj "

print(r.findall(s))
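To make the difference between the three variants listed above concrete, here is a minimal sketch that runs each one on two short inputs. The names `p1`, `p2`, `p3` and the test strings are made up for illustration.

```python
import re

p1 = re.compile(r"(?<!:)\b\w+(?::\w+)?")  # a or a:a
p2 = re.compile(r"(?<!:)\b\w+(?::\w+)*")  # also a:a:a
p3 = re.compile(r"(?<!:)\b[\w:]+\b")      # also a::a

# With "a:b:c", the first variant stops after one colon group; the
# trailing "c" is then rejected because it is preceded by ':'.
print(p1.findall("a:b:c"))  # ['a:b']
print(p2.findall("a:b:c"))  # ['a:b:c']

# With "a::b", only the character-class variant accepts the double colon.
print(p2.findall("a::b"))   # ['a']
print(p3.findall("a::b"))   # ['a::b']
```

Which variant fits depends on whether inputs such as `a:b:c` or `a::b` should count as a single word.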

Answer 4

You don't need a regular expression for this (simplified?) example; a list comprehension will do:

string = "i would like to exlucde :HOkd but not JI:jklj " 

filtered = " ".join(
    [word 
    for word in string.split() 
    if not word.startswith(':')])
print(filtered)

This yields

i would like to exlucde but not JI:jklj

Regular expression that excludes a character at the start of a word