Question

I would like to remove a space, an opening parenthesis, and every character that comes after it. For example:

hello (hi) -> hello

hello(hi) -> hello

hello (hi) bonjour -> hello

(hi) hello bonjour -> (hi) hello bonjour

(hi)_hello -> (hi)_hello

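I have managed to remove the space and the parenthesized part, but I cannot stop the removal when the parenthesis is at the beginning of the string.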

re.sub(r"\s*\(.+", "", "hello(hi)")       # 'hello'
re.sub(r"\s*\(.+", "", "(hi)_hello")      # '', NOT desirable
re.sub(r"\w+\s*\(.+", "", "hello(hi)")    # '', NOT desirable
re.sub(r"\w+\s*\(.+", "", "(hi)_hello")   # '(hi)_hello'

I have also read some documentation on negative lookahead, but so far I have not been able to make it work.

Any help is appreciated.

Answer 1

You can use a regex with a negative lookbehind.

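import re

cases = [
    'hello (hi)',
    'hello(hi)',
    'hello (hi) bonjour',
    '(hi) hello bonjour',
    '(hi)_hello'
]

# Remove optional whitespace, "(" and everything after it,
# unless the match would begin at the very start of the string.
print([re.sub(r'(?<!^)\s*\(.*', '', s) for s in cases])
# ['hello', 'hello', 'hello', '(hi) hello bonjour', '(hi)_hello']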

Details

(?<!   # negative lookbehind
^      # (do not) match the start of line
)     
\s*    # 0 or more spaces
\(     # literal parenthesis
.*     # match 0 or more characters (greedy)

Answer 2

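You need a negative lookbehind: (?<!^). (?<!...) is a negative lookbehind. It means: do not match if ... appears immediately before the rest of the match.

In this case you do not want to match at the start of the string, so ... becomes ^. That is: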

re.sub(r"(?<!^)\s*\(.+", "", "(hi)_hello") # '(hi)_hello'

If there is only whitespace between the start of the line and the first parenthesis, it will still replace the text:

re.sub(r"(?<!^)\s*\(.+", "", "  (hi)_hello") # ' '
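
If leading whitespace before the parenthesis should also count as "starts with it" (an assumption; the question does not say), one possible variation is a positive lookbehind that requires a non-whitespace character right before the match. This is only a sketch, and the names PAREN_AFTER_TEXT and strip_paren_suffix are hypothetical.

```python
import re

# (?<=\S) requires a non-whitespace character immediately before the match,
# so a parenthesis preceded only by whitespace (or by nothing) is left alone.
PAREN_AFTER_TEXT = re.compile(r"(?<=\S)\s*\(.*")


def strip_paren_suffix(text):
    """Hypothetical helper: drop '(...' only when some text precedes it."""
    return PAREN_AFTER_TEXT.sub("", text)


print(repr(strip_paren_suffix("  (hi)_hello")))        # '  (hi)_hello'
print(repr(strip_paren_suffix("hello (hi) bonjour")))  # 'hello'
```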

Answer 3

I don't know whether you have to use a regex, but since you are using Python it could also be done like this:

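lines = ["(hi) hello", "hello (hi)", "hello (hi) hello"]

for line in lines:
    # Split on the literal "(hi)"; result[0] is the text before it.
    result = line.split("(hi)")
    if result[0] == "":
        # The line starts with "(hi)": keep it unchanged.
        print(line)
    else:
        # Keep only the text before "(hi)" (a trailing space is not stripped).
        print(result[0])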
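
The same idea can be sketched for any parenthesized text rather than the literal "(hi)" by using str.partition. This is an illustrative variation, not the answer's code: the function name cut_at_paren is made up, and stripping the trailing space is an assumption based on the expected output in the question.

```python
def cut_at_paren(line):
    """Hypothetical helper: keep the text before the first '(' unless nothing precedes it."""
    head, sep, _ = line.partition("(")
    # No parenthesis at all, or only whitespace before it: leave the line alone.
    if not sep or not head.strip():
        return line
    # Otherwise keep the text before the parenthesis, without the trailing space.
    return head.rstrip()


for line in ["(hi) hello", "hello (hi)", "hello (hi) hello"]:
    print(cut_at_paren(line))
```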

Match string when pattern exists, except if it starts with it
